
🤖 Autonomous Crypto Trading AI Agent

A deep reinforcement learning-based autonomous cryptocurrency trading agent using Soft Actor-Critic (SAC) with multi-timeframe OHLCV analysis.

Architecture

crypto_trading_ai/
├── README.md
├── requirements.txt
├── config/
│   └── config.yaml              # All hyperparameters & settings
├── data/
│   ├── downloader.py            # OHLCV data download (ccxt)
│   ├── features.py              # Technical indicators & feature engineering
│   └── normalizer.py            # Normalization pipeline [-1, 1]
├── env/
│   ├── trading_env.py           # Gymnasium trading environment
│   └── multi_timeframe.py       # Multi-timeframe data alignment
├── agent/
│   ├── networks.py              # Actor & Critic neural networks (CNN-LSTM)
│   ├── sac.py                   # SAC algorithm implementation
│   └── replay_buffer.py         # Prioritized experience replay
├── train.py                     # Training entry point
├── evaluate.py                  # Backtesting & evaluation
└── utils/
    ├── logger.py                # Training logger & TensorBoard
    └── helpers.py               # Utility functions

Key Design Decisions (Research-Backed)

Algorithm: SAC (Soft Actor-Critic)

  • Why: Best for continuous action spaces (position size, leverage, direction)
  • Entropy regularization: Automatic exploration-exploitation balance via learnable alpha
  • Off-policy: Sample efficient with prioritized experience replay
  • Reference: Haarnoja et al. 2018, FinRL (Yang et al. 2020, arXiv:2011.09607)

State Space (Observation)

  • Primary timeframe: 15-minute candles
  • Higher timeframes: 1H, 4H resampled and aligned (no future data leakage)
  • Lookback horizon: 64 candles = agent sees last 16 hours of 15m data
  • Features per candle: 21 (OHLCV derivatives + 14 technical indicators)
  • Total observation: (3 timeframes × 64 lookback × 21 features) + 5 portfolio state = 4,037 dimensions
  • All features normalized to [-1, 1] using adaptive rolling z-score
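To make the dimension count above concrete, here is a minimal sketch of how the flat observation could be assembled; `build_observation` is a hypothetical helper, not the repo's actual API.

```python
import numpy as np

# Dimensions taken from the bullets above.
N_TIMEFRAMES, LOOKBACK, N_FEATURES, PORTFOLIO_DIM = 3, 64, 21, 5

def build_observation(tf_windows, portfolio_state):
    # Flatten each (LOOKBACK, N_FEATURES) window, then append portfolio state.
    market = np.concatenate([w.reshape(-1) for w in tf_windows])
    return np.concatenate([market, portfolio_state]).astype(np.float32)

windows = [np.zeros((LOOKBACK, N_FEATURES)) for _ in range(N_TIMEFRAMES)]
obs = build_observation(windows, np.zeros(PORTFOLIO_DIM))
# 3 * 64 * 21 + 5 = 4037 dimensions
```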

Feature Engineering (21 features per candle)

| Category | Features |
|---|---|
| Price returns | returns, log_returns, high_low_range, open_close_range, upper_shadow, lower_shadow |
| Volume | volume_ratio (log ratio to rolling mean) |
| Trend | SMA(10, 30), EMA(10, 30) as % distance from close |
| Momentum | RSI(14), Stochastic K/D, MACD/Signal/Histogram |
| Volatility | Bollinger Bands (upper/lower relative), ATR(14) as % of close |
| Volume | OBV rate of change |
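As an illustration, a few of these features can be computed from raw candles with pandas. This is a sketch only: `candle_features` is a hypothetical helper (not the actual `features.py` API), and the RSI here uses a simple rolling mean rather than Wilder smoothing.

```python
import numpy as np
import pandas as pd

def candle_features(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=df.index)
    out["returns"] = df["close"].pct_change()
    out["log_returns"] = np.log(df["close"]).diff()
    out["high_low_range"] = (df["high"] - df["low"]) / df["close"]
    # RSI(14): average gains vs. average losses over the last 14 candles.
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100.0 * gain / (gain + loss)  # equals 100 - 100 / (1 + RS)
    return out

rng = np.random.default_rng(0)
close = 100.0 * np.exp(np.cumsum(rng.normal(0, 0.01, 200)))
ohlc = pd.DataFrame({"close": close, "high": close * 1.01, "low": close * 0.99})
feats = candle_features(ohlc).dropna()
```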

Action Space (Continuous, 3D)

| Dimension | Range | Meaning |
|---|---|---|
| direction | [-1, 1] | -1 = full short, 0 = neutral, 1 = full long |
| position_size | [0, 1] | Fraction of capital to allocate |
| leverage | [1x, 10x] | Leverage multiplier |
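A squashed Gaussian policy outputs values in [-1, 1]^3, so the last two dimensions have to be rescaled to their ranges. The affine rescaling below is an assumption, and `map_action` is an illustrative helper, not the repo's API.

```python
import numpy as np

MAX_LEVERAGE = 10.0

def map_action(raw):
    raw = np.clip(np.asarray(raw, dtype=float), -1.0, 1.0)
    direction = raw[0]                                   # stays in [-1, 1]
    position_size = (raw[1] + 1.0) / 2.0                 # [-1, 1] -> [0, 1]
    leverage = 1.0 + (raw[2] + 1.0) / 2.0 * (MAX_LEVERAGE - 1.0)  # [1, 10]
    return direction, position_size, leverage

direction, size, lev = map_action([0.5, 0.0, -1.0])
```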

Neural Network Architecture: CNN-LSTM

  1. 1D CNN (shared weights across timeframes): Extracts local candle patterns
    • 3 layers: 32 → 64 → 128 channels, kernel_size=3, LayerNorm + GELU
  2. Adaptive pooling: Reduces sequence to 16 timesteps
  3. LSTM: Captures temporal dependencies (256 hidden, 2 layers)
  4. Actor MLP: 256 → 128 → mean + log_std (squashed Gaussian)
  5. Twin Critic MLP: 256 → 128 → Q-value (min of two critics)
  • Total: ~1.55M parameters

Reward Function (Composite)

  • Log return Γ— 100 (captures compounding, scaled for learning)
  • Rolling Sharpe ratio component (risk-adjusted performance)
  • Trade penalty (discourages overtrading)
  • Drawdown penalty (penalizes drawdowns > 10%)
  • Hold bonus (encourages holding profitable positions)
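The first four components above can be sketched as a single scalar. This is a toy illustration: the weights, the 0.10 drawdown threshold application, and `composite_reward` itself are assumptions, not the repo's actual reward code.

```python
import numpy as np

def composite_reward(equity_prev, equity_now, peak_equity, traded, recent_returns):
    r = 100.0 * np.log(equity_now / equity_prev)               # scaled log return
    recent_returns = np.asarray(recent_returns, dtype=float)
    if recent_returns.size > 1 and recent_returns.std() > 0:
        r += 0.1 * recent_returns.mean() / recent_returns.std()  # rolling Sharpe term
    if traded:
        r -= 0.05                                               # trade penalty
    drawdown = 1.0 - equity_now / peak_equity
    if drawdown > 0.10:
        r -= 10.0 * (drawdown - 0.10)                           # drawdown penalty
    return r
```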

Trading Environment

  • Gymnasium-compatible with proper reset()/step() API
  • Realistic: commission (0.04%), slippage (0.01%), liquidation
  • Episode terminates on 30% drawdown or 95% capital loss
  • Random episode start for diverse training experiences
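The termination rule can be stated in a few lines; the thresholds come from the bullets above, but `should_terminate` is an illustrative helper, not the actual `trading_env.py` method.

```python
def should_terminate(equity, peak_equity, initial_capital):
    drawdown = 1.0 - equity / peak_equity          # drawdown from episode peak
    capital_loss = 1.0 - equity / initial_capital  # loss vs. starting capital
    return drawdown >= 0.30 or capital_loss >= 0.95
```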

Quick Start

# Clone and install
pip install -r requirements.txt

# Option A: Download real data from Binance
python -m crypto_trading_ai.data.downloader --config config/config.yaml

# Option B: Generate synthetic data for testing
python -m crypto_trading_ai.data.downloader --config config/config.yaml --synthetic

# Train (auto-generates synthetic data if no CSV files found)
cd crypto_trading_ai
python train.py --synthetic

# Evaluate / Backtest
python evaluate.py --synthetic --plot

Training on GPU (recommended)

# The training is designed for GPU acceleration
# On a T4 GPU: ~300 steps/s with default config
# On CPU: ~30 steps/s with reduced config
python train.py --config config/config.yaml

Configuration

Edit config/config.yaml to customize everything:

| Section | Key Parameters |
|---|---|
| Data | symbol, exchange, timeframes, train/val/test splits |
| Features | lookback_window, indicators list |
| Environment | initial_capital, max_leverage, commission, reward type |
| Agent | CNN/LSTM dims, SAC hyperparams (lr, gamma, tau, alpha) |
| Training | total_timesteps, eval/save frequency |

Hyperparameter Tuning Tips

  • Lookback window: 32-128 (64 is default, good balance)
  • Learning rate: 1e-4 to 5e-4 (3e-4 default)
  • Batch size: 128-512 (256 default)
  • Buffer size: 200K-1M (500K default)
  • Max leverage: Set based on risk tolerance (default 10x)

Evaluation Metrics

After training, the evaluation generates:

  • Equity curve plot with profit/loss coloring
  • Position history (long/short over time)
  • Drawdown chart
  • Action distribution histograms
  • Detailed trade log (CSV)
  • Summary metrics: Total Return, Sharpe Ratio, Max Drawdown, Win Rate, Profit Factor
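The summary metrics can be derived from the equity curve alone. A minimal sketch, assuming per-bar Sharpe annualized at 4 × 24 × 365 fifteen-minute bars per year (an assumption, not the repo's convention):

```python
import numpy as np

def summary_metrics(equity):
    equity = np.asarray(equity, dtype=float)
    rets = np.diff(equity) / equity[:-1]          # per-bar simple returns
    peak = np.maximum.accumulate(equity)          # running equity peak
    sharpe = 0.0
    if rets.std() > 0:
        sharpe = rets.mean() / rets.std() * np.sqrt(4 * 24 * 365)
    return {
        "total_return": equity[-1] / equity[0] - 1.0,
        "sharpe": sharpe,
        "max_drawdown": float(np.max(1.0 - equity / peak)),
        "win_rate": float(np.mean(rets > 0)),
    }

m = summary_metrics([100.0, 110.0, 99.0, 120.0])
```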

Technical Details

Normalization Pipeline

Raw OHLCV → Price-relative features → Rolling z-score → Clip to [-1, 1]
  • Prices converted to returns/ratios (no raw prices in observation)
  • RSI/Stochastic: [0, 100] → [-1, 1] via linear rescale
  • MACD/Volume/SMA: Rolling z-score over 256 periods, then z/3 and clip
  • Normalization stats saved for inference consistency
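The rolling z-score step described above (z over a 256-period window, divided by 3, clipped to [-1, 1]) can be sketched in pandas; `rolling_znorm` is a hypothetical helper, not the actual `normalizer.py` API.

```python
import numpy as np
import pandas as pd

def rolling_znorm(s: pd.Series, window: int = 256) -> pd.Series:
    mean = s.rolling(window, min_periods=window).mean()
    std = s.rolling(window, min_periods=window).std()
    z = (s - mean) / std.replace(0.0, np.nan)  # avoid division by zero
    return (z / 3.0).clip(-1.0, 1.0)

series = pd.Series(np.random.default_rng(1).normal(size=500))
norm = rolling_znorm(series).dropna()
```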

Multi-Timeframe Alignment

  • Higher timeframe candles aligned using searchsorted (O(n log n))
  • Only completed candles used (no future data leakage)
  • Each timeframe has its own lookback window, right-padded with zeros
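The searchsorted alignment can be sketched as follows: for each base (15m) timestamp, pick the last higher-timeframe candle that has already closed, so no future data leaks in. `align_htf` is an illustrative helper, not the actual `multi_timeframe.py` API.

```python
import numpy as np

def align_htf(base_ts, htf_open_ts, htf_period):
    # A higher-timeframe candle opened at t is only complete at t + htf_period.
    close_ts = np.asarray(htf_open_ts) + htf_period
    # Index of the last HTF candle whose close is <= the base timestamp.
    idx = np.searchsorted(close_ts, np.asarray(base_ts), side="right") - 1
    return idx  # -1 means no completed HTF candle exists yet

hourly_opens = np.array([0, 3600, 7200])     # 1H candle open times (seconds)
base = np.array([900, 3600, 4500, 7200])     # 15m timestamps
idx = align_htf(base, hourly_opens, 3600)
```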

SAC Implementation Details

  • Auto entropy tuning: Target entropy = -dim(action_space)
  • Twin critics: Min of two Q-networks to reduce overestimation
  • Prioritized replay: TD-error based priorities with importance sampling
  • Gradient clipping: Max norm = 1.0 for stability
  • Soft target updates: τ = 0.005 (Polyak averaging)
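The Polyak soft target update is a one-liner per parameter; in this sketch plain numpy arrays stand in for the critic parameter tensors.

```python
import numpy as np

TAU = 0.005

def soft_update(target_params, online_params, tau=TAU):
    # target <- tau * online + (1 - tau) * target, parameter by parameter
    return [tau * p + (1.0 - tau) * t for t, p in zip(target_params, online_params)]

target = [np.array([0.0])]
online = [np.array([1.0])]
updated = soft_update(target, online)
```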

License

MIT
