XYX
xuyd16
AI & ML interests
None yet
Recent Activity
authored a paper about 13 hours ago
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
submitted a paper about 18 hours ago
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
Organizations
None yet