Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
Paper: arXiv:2503.13360
The TVC models are 72B-parameter models based on Qwen2-VL-72B-Instruct, with a context window of 8K tokens.
@article{sun2024mitigating,
title={Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning},
author={Sun, Hai-Long and Sun, Zhun and Peng, Houwen and Ye, Han-Jia},
journal={arXiv preprint arXiv:2503.13360},
year={2025}
}