genv3pair1NoGT_1.5B_cdpo_lm1_ebs32_lr5e-07_beta0.4_epoch2.0_42

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0851
  • Rewards/chosen: 3.6477
  • Rewards/rejected: 0.0
  • Rewards/accuracies: 0.9750
  • Rewards/margins: 3.6477
  • Logps/rejected: -32.4202
  • Logps/chosen: -21.0129
  • Logits/rejected: -3.1284
  • Logits/chosen: -3.1480
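
For a quick sanity check, the checkpoint can be loaded with the standard transformers API. Below is a minimal sketch (the repo id is taken from the model name above; the prompt is purely illustrative):

```python
# Minimal inference sketch; assumes the checkpoint is hosted at this repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_lm1_ebs32_lr5e-07_beta0.4_epoch2.0_42"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Qwen2.5-Instruct checkpoints expect the chat template.
messages = [{"role": "user", "content": "Solve: what is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```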

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4 (per device)
  • eval_batch_size: 4 (per device)
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2.0
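
The effective batch size is 4 per device × 8 GPUs = 32, matching total_train_batch_size (the `ebs32` in the run name). The run name also indicates cDPO (DPO with smoothed preference labels, for noisy pairs) with β = 0.4. For reference, a minimal sketch of that objective; the label_smoothing value here is illustrative and not read from this run:

```python
import torch
import torch.nn.functional as F

def cdpo_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    ref_chosen_logps: torch.Tensor,
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.4,
    label_smoothing: float = 0.1,  # illustrative; the run's actual value is not recorded here
) -> torch.Tensor:
    """DPO objective with smoothed preference labels (cDPO).

    Inputs are per-example sums of token log-probs over the full responses.
    label_smoothing = 0 recovers plain DPO.
    """
    logits = beta * (
        (policy_chosen_logps - ref_chosen_logps)
        - (policy_rejected_logps - ref_rejected_logps)
    )
    loss = (
        -F.logsigmoid(logits) * (1 - label_smoothing)
        - F.logsigmoid(-logits) * label_smoothing
    )
    return loss.mean()
```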

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6588        | 0.1117 | 20   | 0.6311          | 0.1456         | 0.0              | 0.8500             | 0.1456          | -41.3269       | -29.7683     | -2.2472         | -2.3913       |
| 0.2837        | 0.2235 | 40   | 0.2656          | 1.2679         | 0.0              | 1.0                | 1.2679          | -38.2536       | -26.9624     | -2.4756         | -2.5889       |
| 0.1233        | 0.3352 | 60   | 0.1276          | 2.5406         | 0.0              | 0.9750             | 2.5406          | -34.8732       | -23.7808     | -2.8206         | -2.8916       |
| 0.0962        | 0.4469 | 80   | 0.1117          | 2.9076         | 0.0              | 0.9750             | 2.9076          | -34.0203       | -22.8632     | -2.9288         | -2.9836       |
| 0.0553        | 0.5587 | 100  | 0.1016          | 3.0559         | 0.0              | 0.9750             | 3.0559          | -33.6358       | -22.4924     | -2.9561         | -3.0071       |
| 0.0726        | 0.6704 | 120  | 0.0958          | 3.2040         | 0.0              | 0.9750             | 3.2040          | -33.4311       | -22.1222     | -2.9964         | -3.0403       |
| 0.1258        | 0.7821 | 140  | 0.0888          | 3.3241         | 0.0              | 0.9750             | 3.3241          | -33.1697       | -21.8221     | -3.0271         | -3.0653       |
| 0.0893        | 0.8939 | 160  | 0.0879          | 3.3264         | 0.0              | 0.9750             | 3.3264          | -33.0388       | -21.8162     | -3.0308         | -3.0702       |
| 0.0533        | 1.0056 | 180  | 0.0877          | 3.3835         | 0.0              | 1.0                | 3.3835          | -32.9710       | -21.6734     | -3.0340         | -3.0697       |
| 0.0703        | 1.1173 | 200  | 0.0863          | 3.4674         | 0.0              | 0.9750             | 3.4674          | -32.7157       | -21.4637     | -3.0695         | -3.1021       |
| 0.055         | 1.2291 | 220  | 0.0840          | 3.5212         | 0.0              | 0.9750             | 3.5212          | -32.5450       | -21.3292     | -3.0958         | -3.1239       |
| 0.0541        | 1.3408 | 240  | 0.0855          | 3.5919         | 0.0              | 0.9750             | 3.5919          | -32.4004       | -21.1524     | -3.1054         | -3.1280       |
| 0.0488        | 1.4525 | 260  | 0.0843          | 3.6084         | 0.0              | 0.9750             | 3.6084          | -32.4565       | -21.1114     | -3.1164         | -3.1373       |
| 0.0412        | 1.5642 | 280  | 0.0830          | 3.6169         | 0.0              | 0.9750             | 3.6169          | -32.4072       | -21.0901     | -3.1194         | -3.1392       |
| 0.1167        | 1.6760 | 300  | 0.0859          | 3.6349         | 0.0              | 1.0                | 3.6349          | -32.4206       | -21.0449     | -3.1284         | -3.1490       |
| 0.0815        | 1.7877 | 320  | 0.0836          | 3.6527         | 0.0              | 1.0                | 3.6527          | -32.3812       | -21.0004     | -3.1288         | -3.1496       |
| 0.0689        | 1.8994 | 340  | 0.0849          | 3.6484         | 0.0              | 0.9750             | 3.6484          | -32.4245       | -21.0113     | -3.1296         | -3.1497       |
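
The reward columns above follow the common DPO logging convention (e.g. TRL's DPOTrainer): a response's implicit reward is β times its policy-versus-reference log-probability gap, and Rewards/margins is simply chosen minus rejected. A sketch, assuming that convention:

```python
beta = 0.4

def implicit_reward(policy_logp: float, ref_logp: float) -> float:
    # "Rewards/chosen" / "Rewards/rejected": beta-scaled log-prob gap
    # between the trained policy and the frozen reference model.
    return beta * (policy_logp - ref_logp)

# "Rewards/margins" = Rewards/chosen - Rewards/rejected,
# e.g. 3.6484 - 0.0 = 3.6484 in the final evaluation row above.
```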

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.20.3