felixZzz/a0109-base-drgrpo-miniRL_mu_best-step_250 Text Generation • 31B • Updated about 7 hours ago • 11
felixZzz/a0109-base-drgrpo-miniRL_mu_best-step_200 Text Generation • 31B • Updated about 8 hours ago • 8
felixZzz/a0109-base-drgrpo-miniRL_mu_R3-step_265 Text Generation • 31B • Updated about 8 hours ago • 4
felixZzz/a0109-base-drgrpo-miniRL_mu_R3-step_200 Text Generation • 31B • Updated about 9 hours ago • 9
felixZzz/a0109-base-drgrpo-KLmask_mu_R3-step_240 Text Generation • 31B • Updated about 9 hours ago • 12
felixZzz/a0109-base-drgrpo-KLmask_mu_R3-step_200 Text Generation • 31B • Updated about 10 hours ago • 12
felixZzz/a0109-base-drgrpo-TVmask_mu_R3-step_260 Text Generation • 31B • Updated about 10 hours ago • 5
felixZzz/a0109-base-drgrpo-TVmask_mu_R3-step_200 Text Generation • 31B • Updated about 11 hours ago • 14
felixZzz/a0109-base-drgrpo-top20_TVmask_mu-clip0.2-step_215 Text Generation • 31B • Updated about 11 hours ago • 8
felixZzz/a0109-base-drgrpo-top20_KLmask_mu-step_210 Text Generation • 31B • Updated about 13 hours ago • 4
felixZzz/a0109-base-drgrpo-top20_KLmask_mu-step_200 Text Generation • 31B • Updated about 13 hours ago • 2
felixZzz/a0109-base-drgrpo-miniRL_mu-step_215 Text Generation • 31B • Updated about 14 hours ago • 12
felixZzz/a0109-base-drgrpo-top20_TVmask_mu-clip0.2-step_200 Text Generation • 31B • Updated about 16 hours ago • 3
felixZzz/qwen3_4b_math_rl_history_step1to56-epoch4-step1568 Text Generation • 4B • Updated 25 days ago • 7
felixZzz/qwen3_4b_math_rl_history_step1to56-epoch3-step9408 Text Generation • 4B • Updated 26 days ago • 6
felixZzz/qwen3_4b_math_rl_history_step1to56-onlyturn56-epoch4-len8k-step15680 Text Generation • 4B • Updated 26 days ago • 7
felixZzz/qwen3_4b_math_rl_history_step1to56-onlyturn56-epoch4-step7840 Text Generation • 4B • Updated 26 days ago • 8
felixZzz/qwen3_4b_math_rl_history_step1to56-epoch4-step1300 Text Generation • 4B • Updated Dec 5, 2025