HuggingFaceH4/ultrachat_200k
To set up SpecForge for training Eagle3 models:
https://docs.sglang.ai/SpecForge/get_started/install.html
This model was trained on a 1,000-sample subset of the UltraChat 200k dataset. The dataset was prepared in two steps using SpecForge:
python scripts/prepare_data.py --dataset ultrachat
python create_sample.py --input cache/dataset/ultrachat_train.jsonl --output cache/dataset/ultrachat_1k_sample_train.jsonl --size 1000
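The source of `create_sample.py` is not shown here, so the following is only a minimal sketch of what such a subsampling step could look like, assuming it draws records uniformly at random from a JSONL file with one record per line. The function name `sample_jsonl` and the `seed` parameter are hypothetical and are not taken from the actual script.

```python
import json
import random
from pathlib import Path


def sample_jsonl(input_path, output_path, size, seed=0):
    """Write `size` randomly chosen records from one JSONL file to another.

    Hypothetical helper: the real create_sample.py may select records differently.
    """
    # Read every non-empty line; each line is expected to hold one JSON record.
    with open(input_path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]

    if size > len(lines):
        raise ValueError(
            f"requested {size} samples but only {len(lines)} records are available"
        )

    # A fixed seed makes the subset reproducible (an assumption, not a documented flag).
    rng = random.Random(seed)
    chosen = rng.sample(lines, size)

    # Fail early if a selected line is not valid JSON.
    for line in chosen:
        json.loads(line)

    # Create the output directory if needed, then write one record per line.
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w", encoding="utf-8") as f:
        for line in chosen:
            f.write(line + "\n")

    return len(chosen)
```

Under these assumptions, `sample_jsonl("cache/dataset/ultrachat_train.jsonl", "cache/dataset/ultrachat_1k_sample_train.jsonl", 1000)` would produce a file with the same shape as the one used for training, though not necessarily the same 1,000 records.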
The model was trained using the following command:
torchrun --standalone --nproc_per_node=1 scripts/train_eagle3_online.py \
--target-model-path Qwen/Qwen3-30B-A3B \
--draft-model-config configs/qwen3-30B-A3B-eagle3.json \
--train-data-path cache/dataset/ultrachat_1k_sample_train.jsonl \
--output-dir out/qwen3-30b-a3b-eagle3-ultra-1k-sample \
--num-epochs 1 \
--batch-size 1 \
--learning-rate 1e-4 \
--max-length 1024 \
--chat-template qwen \
--cache-dir cache \
--embedding-key model.embed_tokens.weight \
--tp-size 1 \
--ttt-length 7
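As a rough sanity check on the size of this run, the number of optimizer steps can be estimated from the flags above. This is a back-of-the-envelope sketch, not SpecForge's own logic: it assumes `--batch-size` is per data-parallel replica, that the number of replicas is `nproc_per_node / tp_size`, that there is no gradient accumulation, and that no samples are dropped (for example by the `--max-length` limit). The function name is hypothetical.

```python
import math


def estimate_optimizer_steps(num_samples, batch_size, num_epochs,
                             nproc_per_node=1, tp_size=1):
    """Estimate optimizer steps under the assumptions stated above."""
    # Assumed: processes not used for tensor parallelism act as data-parallel replicas.
    dp_size = nproc_per_node // tp_size
    samples_per_step = batch_size * dp_size
    steps_per_epoch = math.ceil(num_samples / samples_per_step)
    return steps_per_epoch * num_epochs


# Values taken from the data preparation step and the training command above.
print(estimate_optimizer_steps(num_samples=1000, batch_size=1, num_epochs=1))  # prints 1000
```

Under these assumptions, the run performs about 1,000 updates of the draft model, one per training sample.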
To reproduce this training:
python scripts/prepare_data.py --dataset ultrachat