
🤖 Autonomous Crypto Trading AI Agent

A deep reinforcement learning-based autonomous cryptocurrency trading agent using Soft Actor-Critic (SAC) with multi-timeframe OHLCV analysis.

Architecture

crypto_trading_ai/
├── README.md
├── requirements.txt
├── config/
│   └── config.yaml              # All hyperparameters & settings
├── data/
│   ├── downloader.py            # OHLCV data download (ccxt)
│   ├── features.py              # Technical indicators & feature engineering
│   └── normalizer.py            # Normalization pipeline [-1, 1]
├── env/
│   ├── trading_env.py           # Gymnasium trading environment
│   └── multi_timeframe.py       # Multi-timeframe data alignment
├── agent/
│   ├── networks.py              # Actor & Critic neural networks (CNN-LSTM)
│   ├── sac.py                   # SAC algorithm implementation
│   └── replay_buffer.py         # Prioritized experience replay
├── train.py                     # Training entry point
├── evaluate.py                  # Backtesting & evaluation
└── utils/
    ├── logger.py                # Training logger & TensorBoard
    └── helpers.py               # Utility functions

Key Design Decisions (Research-Backed)

Algorithm: SAC (Soft Actor-Critic)

  • Why: Best for continuous action spaces (position size, leverage, direction)
  • Entropy regularization: Automatic exploration-exploitation balance via learnable alpha
  • Off-policy: Sample efficient with prioritized experience replay
  • Reference: Haarnoja et al. 2018, FinRL (Yang et al. 2020, arXiv:2011.09607)

State Space (Observation)

  • Primary timeframe: 15-minute candles
  • Higher timeframes: 1H, 4H resampled and aligned (no future data leakage)
  • Lookback horizon: 64 candles = agent sees last 16 hours of 15m data
  • Features per candle: 21 (OHLCV derivatives + 14 technical indicators)
  • Total observation: (3 timeframes × 64 lookback × 21 features) + 5 portfolio state = 4,037 dimensions
  • All features normalized to [-1, 1] using adaptive rolling z-score
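To make the dimension count above concrete, here is a minimal sketch of how the flat observation could be assembled; `build_observation` is a hypothetical helper, not the repo's actual API.

```python
import numpy as np

# Dimensions taken from the bullets above.
N_TIMEFRAMES, LOOKBACK, N_FEATURES, PORTFOLIO_DIM = 3, 64, 21, 5

def build_observation(tf_windows, portfolio_state):
    # Flatten each (LOOKBACK, N_FEATURES) window, then append portfolio state.
    market = np.concatenate([w.reshape(-1) for w in tf_windows])
    return np.concatenate([market, portfolio_state]).astype(np.float32)

windows = [np.zeros((LOOKBACK, N_FEATURES)) for _ in range(N_TIMEFRAMES)]
obs = build_observation(windows, np.zeros(PORTFOLIO_DIM))
# 3 * 64 * 21 + 5 = 4037 dimensions
```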

Feature Engineering (21 features per candle)

| Category | Features |
|---|---|
| Price returns | returns, log_returns, high_low_range, open_close_range, upper_shadow, lower_shadow |
| Volume | volume_ratio (log ratio to rolling mean) |
| Trend | SMA(10, 30), EMA(10, 30) as % distance from close |
| Momentum | RSI(14), Stochastic K/D, MACD/Signal/Histogram |
| Volatility | Bollinger Bands (upper/lower relative), ATR(14) as % of close |
| Volume | OBV rate of change |
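As an illustration, a few of these features can be computed from raw candles with pandas. This is a sketch only: `candle_features` is a hypothetical helper (not the actual `features.py` API), and the RSI here uses a simple rolling mean rather than Wilder smoothing.

```python
import numpy as np
import pandas as pd

def candle_features(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=df.index)
    out["returns"] = df["close"].pct_change()
    out["log_returns"] = np.log(df["close"]).diff()
    out["high_low_range"] = (df["high"] - df["low"]) / df["close"]
    # RSI(14): average gains vs. average losses over the last 14 candles.
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100.0 * gain / (gain + loss)  # equals 100 - 100 / (1 + RS)
    return out

rng = np.random.default_rng(0)
close = 100.0 * np.exp(np.cumsum(rng.normal(0, 0.01, 200)))
ohlc = pd.DataFrame({"close": close, "high": close * 1.01, "low": close * 0.99})
feats = candle_features(ohlc).dropna()
```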

Action Space (Continuous, 3D)

| Dimension | Range | Meaning |
|---|---|---|
| direction | [-1, 1] | -1 = full short, 0 = neutral, 1 = full long |
| position_size | [0, 1] | Fraction of capital to allocate |
| leverage | [1x, 10x] | Leverage multiplier |
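A squashed Gaussian policy outputs values in [-1, 1]^3, so the last two dimensions have to be rescaled to their ranges. The affine rescaling below is an assumption, and `map_action` is an illustrative helper, not the repo's API.

```python
import numpy as np

MAX_LEVERAGE = 10.0

def map_action(raw):
    raw = np.clip(np.asarray(raw, dtype=float), -1.0, 1.0)
    direction = raw[0]                                   # stays in [-1, 1]
    position_size = (raw[1] + 1.0) / 2.0                 # [-1, 1] -> [0, 1]
    leverage = 1.0 + (raw[2] + 1.0) / 2.0 * (MAX_LEVERAGE - 1.0)  # [1, 10]
    return direction, position_size, leverage

direction, size, lev = map_action([0.5, 0.0, -1.0])
```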

Neural Network Architecture: CNN-LSTM

  1. 1D CNN (shared weights across timeframes): Extracts local candle patterns
    • 3 layers: 32 → 64 → 128 channels, kernel_size=3, LayerNorm + GELU
  2. Adaptive pooling: Reduces sequence to 16 timesteps
  3. LSTM: Captures temporal dependencies (256 hidden, 2 layers)
  4. Actor MLP: 256 → 128 → mean + log_std (squashed Gaussian)
  5. Twin Critic MLP: 256 → 128 → Q-value (min of two critics)
  • Total: ~1.55M parameters

Reward Function (Composite)

  • Log return Γ— 100 (captures compounding, scaled for learning)
  • Rolling Sharpe ratio component (risk-adjusted performance)
  • Trade penalty (discourages overtrading)
  • Drawdown penalty (penalizes drawdowns > 10%)
  • Hold bonus (encourages holding profitable positions)
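The first four components above can be sketched as a single scalar. This is a toy illustration: the weights, the 0.10 drawdown threshold application, and `composite_reward` itself are assumptions, not the repo's actual reward code.

```python
import numpy as np

def composite_reward(equity_prev, equity_now, peak_equity, traded, recent_returns):
    r = 100.0 * np.log(equity_now / equity_prev)               # scaled log return
    recent_returns = np.asarray(recent_returns, dtype=float)
    if recent_returns.size > 1 and recent_returns.std() > 0:
        r += 0.1 * recent_returns.mean() / recent_returns.std()  # rolling Sharpe term
    if traded:
        r -= 0.05                                               # trade penalty
    drawdown = 1.0 - equity_now / peak_equity
    if drawdown > 0.10:
        r -= 10.0 * (drawdown - 0.10)                           # drawdown penalty
    return r
```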

Trading Environment

  • Gymnasium-compatible with proper reset()/step() API
  • Realistic: commission (0.04%), slippage (0.01%), liquidation
  • Episode terminates on 30% drawdown or 95% capital loss
  • Random episode start for diverse training experiences
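The termination rule can be stated in a few lines; the thresholds come from the bullets above, but `should_terminate` is an illustrative helper, not the actual `trading_env.py` method.

```python
def should_terminate(equity, peak_equity, initial_capital):
    drawdown = 1.0 - equity / peak_equity          # drawdown from episode peak
    capital_loss = 1.0 - equity / initial_capital  # loss vs. starting capital
    return drawdown >= 0.30 or capital_loss >= 0.95
```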

Quick Start

# Clone and install
pip install -r requirements.txt

# Option A: Download real data from Binance
python -m crypto_trading_ai.data.downloader --config config/config.yaml

# Option B: Generate synthetic data for testing
python -m crypto_trading_ai.data.downloader --config config/config.yaml --synthetic

# Train (auto-generates synthetic data if no CSV files found)
cd crypto_trading_ai
python train.py --synthetic

# Evaluate / Backtest
python evaluate.py --synthetic --plot

Training on GPU (recommended)

# The training is designed for GPU acceleration
# On a T4 GPU: ~300 steps/s with default config
# On CPU: ~30 steps/s with reduced config
python train.py --config config/config.yaml

Configuration

Edit config/config.yaml to customize everything:

| Section | Key Parameters |
|---|---|
| Data | symbol, exchange, timeframes, train/val/test splits |
| Features | lookback_window, indicators list |
| Environment | initial_capital, max_leverage, commission, reward type |
| Agent | CNN/LSTM dims, SAC hyperparams (lr, gamma, tau, alpha) |
| Training | total_timesteps, eval/save frequency |

Hyperparameter Tuning Tips

  • Lookback window: 32-128 (64 is default, good balance)
  • Learning rate: 1e-4 to 5e-4 (3e-4 default)
  • Batch size: 128-512 (256 default)
  • Buffer size: 200K-1M (500K default)
  • Max leverage: Set based on risk tolerance (default 10x)

Evaluation Metrics

After training, the evaluation generates:

  • Equity curve plot with profit/loss coloring
  • Position history (long/short over time)
  • Drawdown chart
  • Action distribution histograms
  • Detailed trade log (CSV)
  • Summary metrics: Total Return, Sharpe Ratio, Max Drawdown, Win Rate, Profit Factor
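The summary metrics can be derived from the equity curve alone. A minimal sketch, assuming per-bar Sharpe annualized at 4 × 24 × 365 fifteen-minute bars per year (an assumption, not the repo's convention):

```python
import numpy as np

def summary_metrics(equity):
    equity = np.asarray(equity, dtype=float)
    rets = np.diff(equity) / equity[:-1]          # per-bar simple returns
    peak = np.maximum.accumulate(equity)          # running equity peak
    sharpe = 0.0
    if rets.std() > 0:
        sharpe = rets.mean() / rets.std() * np.sqrt(4 * 24 * 365)
    return {
        "total_return": equity[-1] / equity[0] - 1.0,
        "sharpe": sharpe,
        "max_drawdown": float(np.max(1.0 - equity / peak)),
        "win_rate": float(np.mean(rets > 0)),
    }

m = summary_metrics([100.0, 110.0, 99.0, 120.0])
```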

Technical Details

Normalization Pipeline

Raw OHLCV → Price-relative features → Rolling z-score → Clip to [-1, 1]
  • Prices converted to returns/ratios (no raw prices in observation)
  • RSI/Stochastic: [0, 100] → [-1, 1] via linear rescale
  • MACD/Volume/SMA: Rolling z-score over 256 periods, then z/3 and clip
  • Normalization stats saved for inference consistency
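The rolling z-score step described above (z over a 256-period window, divided by 3, clipped to [-1, 1]) can be sketched in pandas; `rolling_znorm` is a hypothetical helper, not the actual `normalizer.py` API.

```python
import numpy as np
import pandas as pd

def rolling_znorm(s: pd.Series, window: int = 256) -> pd.Series:
    mean = s.rolling(window, min_periods=window).mean()
    std = s.rolling(window, min_periods=window).std()
    z = (s - mean) / std.replace(0.0, np.nan)  # avoid division by zero
    return (z / 3.0).clip(-1.0, 1.0)

series = pd.Series(np.random.default_rng(1).normal(size=500))
norm = rolling_znorm(series).dropna()
```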

Multi-Timeframe Alignment

  • Higher timeframe candles aligned using searchsorted (O(n log n))
  • Only completed candles used (no future data leakage)
  • Each timeframe has its own lookback window, right-padded with zeros
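The searchsorted alignment can be sketched as follows: for each base (15m) timestamp, pick the last higher-timeframe candle that has already closed, so no future data leaks in. `align_htf` is an illustrative helper, not the actual `multi_timeframe.py` API.

```python
import numpy as np

def align_htf(base_ts, htf_open_ts, htf_period):
    # A higher-timeframe candle opened at t is only complete at t + htf_period.
    close_ts = np.asarray(htf_open_ts) + htf_period
    # Index of the last HTF candle whose close is <= the base timestamp.
    idx = np.searchsorted(close_ts, np.asarray(base_ts), side="right") - 1
    return idx  # -1 means no completed HTF candle exists yet

hourly_opens = np.array([0, 3600, 7200])     # 1H candle open times (seconds)
base = np.array([900, 3600, 4500, 7200])     # 15m timestamps
idx = align_htf(base, hourly_opens, 3600)
```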

SAC Implementation Details

  • Auto entropy tuning: Target entropy = -dim(action_space)
  • Twin critics: Min of two Q-networks to reduce overestimation
  • Prioritized replay: TD-error based priorities with importance sampling
  • Gradient clipping: Max norm = 1.0 for stability
  • Soft target updates: τ = 0.005 (Polyak averaging)
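The Polyak soft target update is a one-liner per parameter; in this sketch plain numpy arrays stand in for the critic parameter tensors.

```python
import numpy as np

TAU = 0.005

def soft_update(target_params, online_params, tau=TAU):
    # target <- tau * online + (1 - tau) * target, parameter by parameter
    return [tau * p + (1.0 - tau) * t for t, p in zip(target_params, online_params)]

target = [np.array([0.0])]
online = [np.array([1.0])]
updated = soft_update(target, online)
```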

License

MIT
