rlvr-weak-supervision
Models from "When Can LLMs Learn to Reason with Weak Supervision?" (Llama-3.2-3B with continual pre-training and Thinking SFT).
pavelslab-nyu/Llama-3.2-3B-ThinkSFT • 3B • Updated 26 days ago • 27
pavelslab-nyu/Llama-3.2-3B-CPT-Math-ThinkSFT • 3B • Updated 26 days ago • 26
pavelslab-nyu/Llama-3.2-3B-CPT-Math • 3B • Updated 26 days ago • 19
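The listed checkpoints can probably be loaded like any other causal language model on the Hub. The sketch below is a minimal example, not the authors' documented usage. It assumes the repositories are public and stored in the standard Hugging Face `transformers` format, and that `transformers` and PyTorch are installed. The repository ids are copied from the listing, while the dictionary keys and the `load` and `generate` helpers are hypothetical names introduced for illustration.

```python
"""Sketch: loading one of the listed checkpoints with Hugging Face transformers."""

# Repository ids copied from the listing; the short keys are hypothetical labels.
MODEL_IDS = {
    "think_sft": "pavelslab-nyu/Llama-3.2-3B-ThinkSFT",
    "cpt_math_think_sft": "pavelslab-nyu/Llama-3.2-3B-CPT-Math-ThinkSFT",
    "cpt_math": "pavelslab-nyu/Llama-3.2-3B-CPT-Math",
}


def load(variant: str):
    """Return (tokenizer, model) for a variant key in MODEL_IDS.

    Assumption: each repository is loadable with the Auto* classes.
    """
    # Imported lazily so the id table can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = MODEL_IDS[variant]
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


def generate(variant: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion for `prompt` and return the decoded text."""
    tokenizer, model = load(variant)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("cpt_math_think_sft", "What is 12 * 7?")` would download the weights on first use. Whether the Thinking SFT variants expect a particular chat template or prompt format is not stated in the listing, so check each model card before relying on the raw prompt shown here.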