genv3pair1NoGT_1.5B_cdpo_lm1_ebs32_lr5e-07_beta0.4_epoch2.0_42

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0851
  • Rewards/chosen: 3.6477
  • Rewards/rejected: 0.0
  • Rewards/accuracies: 0.9750
  • Rewards/margins: 3.6477
  • Logps/rejected: -32.4202
  • Logps/chosen: -21.0129
  • Logits/rejected: -3.1284
  • Logits/chosen: -3.1480
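
For a quick sanity check, the checkpoint can be loaded with the standard transformers API. Below is a minimal sketch (the repo id is taken from the model name above; the prompt is purely illustrative):

```python
# Minimal inference sketch; assumes the checkpoint is hosted at this repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_lm1_ebs32_lr5e-07_beta0.4_epoch2.0_42"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Qwen2.5-Instruct checkpoints expect the chat template.
messages = [{"role": "user", "content": "Solve: what is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```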

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4 (per device)
  • eval_batch_size: 4 (per device)
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2.0
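
The effective batch size is 4 per device × 8 GPUs = 32, matching total_train_batch_size (the `ebs32` in the run name). The run name also indicates cDPO (DPO with smoothed preference labels, for noisy pairs) with β = 0.4. For reference, a minimal sketch of that objective; the label_smoothing value here is illustrative and not read from this run:

```python
import torch
import torch.nn.functional as F

def cdpo_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    ref_chosen_logps: torch.Tensor,
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.4,
    label_smoothing: float = 0.1,  # illustrative; the run's actual value is not recorded here
) -> torch.Tensor:
    """DPO objective with smoothed preference labels (cDPO).

    Inputs are per-example sums of token log-probs over the full responses.
    label_smoothing = 0 recovers plain DPO.
    """
    logits = beta * (
        (policy_chosen_logps - ref_chosen_logps)
        - (policy_rejected_logps - ref_rejected_logps)
    )
    loss = (
        -F.logsigmoid(logits) * (1 - label_smoothing)
        - F.logsigmoid(-logits) * label_smoothing
    )
    return loss.mean()
```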

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6588        | 0.1117 | 20   | 0.6311          | 0.1456         | 0.0              | 0.8500             | 0.1456          | -41.3269       | -29.7683     | -2.2472         | -2.3913       |
| 0.2837        | 0.2235 | 40   | 0.2656          | 1.2679         | 0.0              | 1.0                | 1.2679          | -38.2536       | -26.9624     | -2.4756         | -2.5889       |
| 0.1233        | 0.3352 | 60   | 0.1276          | 2.5406         | 0.0              | 0.9750             | 2.5406          | -34.8732       | -23.7808     | -2.8206         | -2.8916       |
| 0.0962        | 0.4469 | 80   | 0.1117          | 2.9076         | 0.0              | 0.9750             | 2.9076          | -34.0203       | -22.8632     | -2.9288         | -2.9836       |
| 0.0553        | 0.5587 | 100  | 0.1016          | 3.0559         | 0.0              | 0.9750             | 3.0559          | -33.6358       | -22.4924     | -2.9561         | -3.0071       |
| 0.0726        | 0.6704 | 120  | 0.0958          | 3.2040         | 0.0              | 0.9750             | 3.2040          | -33.4311       | -22.1222     | -2.9964         | -3.0403       |
| 0.1258        | 0.7821 | 140  | 0.0888          | 3.3241         | 0.0              | 0.9750             | 3.3241          | -33.1697       | -21.8221     | -3.0271         | -3.0653       |
| 0.0893        | 0.8939 | 160  | 0.0879          | 3.3264         | 0.0              | 0.9750             | 3.3264          | -33.0388       | -21.8162     | -3.0308         | -3.0702       |
| 0.0533        | 1.0056 | 180  | 0.0877          | 3.3835         | 0.0              | 1.0                | 3.3835          | -32.9710       | -21.6734     | -3.0340         | -3.0697       |
| 0.0703        | 1.1173 | 200  | 0.0863          | 3.4674         | 0.0              | 0.9750             | 3.4674          | -32.7157       | -21.4637     | -3.0695         | -3.1021       |
| 0.055         | 1.2291 | 220  | 0.0840          | 3.5212         | 0.0              | 0.9750             | 3.5212          | -32.5450       | -21.3292     | -3.0958         | -3.1239       |
| 0.0541        | 1.3408 | 240  | 0.0855          | 3.5919         | 0.0              | 0.9750             | 3.5919          | -32.4004       | -21.1524     | -3.1054         | -3.1280       |
| 0.0488        | 1.4525 | 260  | 0.0843          | 3.6084         | 0.0              | 0.9750             | 3.6084          | -32.4565       | -21.1114     | -3.1164         | -3.1373       |
| 0.0412        | 1.5642 | 280  | 0.0830          | 3.6169         | 0.0              | 0.9750             | 3.6169          | -32.4072       | -21.0901     | -3.1194         | -3.1392       |
| 0.1167        | 1.6760 | 300  | 0.0859          | 3.6349         | 0.0              | 1.0                | 3.6349          | -32.4206       | -21.0449     | -3.1284         | -3.1490       |
| 0.0815        | 1.7877 | 320  | 0.0836          | 3.6527         | 0.0              | 1.0                | 3.6527          | -32.3812       | -21.0004     | -3.1288         | -3.1496       |
| 0.0689        | 1.8994 | 340  | 0.0849          | 3.6484         | 0.0              | 0.9750             | 3.6484          | -32.4245       | -21.0113     | -3.1296         | -3.1497       |
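
The reward columns above follow the common DPO logging convention (e.g. TRL's DPOTrainer): a response's implicit reward is β times its policy-versus-reference log-probability gap, and Rewards/margins is simply chosen minus rejected. A sketch, assuming that convention:

```python
beta = 0.4

def implicit_reward(policy_logp: float, ref_logp: float) -> float:
    # "Rewards/chosen" / "Rewards/rejected": beta-scaled log-prob gap
    # between the trained policy and the frozen reference model.
    return beta * (policy_logp - ref_logp)

# "Rewards/margins" = Rewards/chosen - Rewards/rejected,
# e.g. 3.6484 - 0.0 = 3.6484 in the final evaluation row above.
```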

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.20.3