# DR-Venus-4B-SFT
DR-Venus-4B-SFT is a 4B deep research agent obtained by fine-tuning Qwen/Qwen3-4B-Thinking-2507 on cleaned open-data agent trajectories. It is the supervised initialization checkpoint of DR-Venus and is designed to establish stable long-horizon agentic behavior, including reasoning, tool use, evidence collection, and final answer synthesis.
Instead of relying on proprietary traces, DR-Venus-4B-SFT is trained entirely on open REDSearcher trajectories after environment alignment, structural cleaning, correctness filtering, and turn-aware resampling.
## What This Model Is For
This checkpoint is intended for:
- deep research agents with long-horizon tool use
- open-domain information seeking with `search` and `visit` tools
- initializing stronger agentic checkpoints before RL
- deployment in the official DR-Venus inference pipeline
It is not primarily optimized for:
- plain chat without tool use
- generic instruction-following benchmarks
- short-context QA without external retrieval
## Model Details
- Base model: Qwen/Qwen3-4B-Thinking-2507
- Model type: long-context reasoning model for tool-augmented deep research
- Training stage: agentic supervised fine-tuning
- Training framework: `verl`
- Tool setting: `search` + `visit`
- Maximum training length: 200K
- Intended domain: long-horizon web research and evidence-grounded question answering
## How DR-Venus Builds SFT Data
DR-Venus-4B-SFT is trained on cleaned trajectories built from open REDSearcher SFT trajectories:
- Raw trajectories are converted into the same interaction format used by the DR-Venus inference pipeline.
- Tool calls are standardized so that training and deployment share the same `search`/`visit` protocol.
- Disallowed tools and duplicate tool-call turns are removed.
- Structurally valid trajectories are filtered by final-answer correctness.
- Long-horizon trajectories are upweighted through turn-aware resampling.
This pipeline is designed to improve both data quality and effective data utilization for a small deep research agent.
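The cleaning and filtering steps above can be sketched as a single pass over raw trajectories. All field names (`turns`, `tool`, `arguments`, `final_answer`, `gold`) and the correctness judge are illustrative assumptions for this sketch, not the actual REDSearcher schema:

```python
# Illustrative sketch of the DR-Venus data-cleaning steps; field names
# and the correctness judge are assumptions, not the real schema.
ALLOWED_TOOLS = {"search", "visit"}  # assumed tool whitelist

def clean_trajectory(traj):
    """Return a structurally cleaned trajectory, or None if it must be dropped."""
    seen_calls = set()
    kept_turns = []
    for turn in traj["turns"]:
        tool = turn.get("tool")
        if tool is not None:
            if tool not in ALLOWED_TOOLS:
                return None              # disallowed tool -> drop whole trajectory
            key = (tool, turn["arguments"])
            if key in seen_calls:
                continue                 # duplicate tool-call turn -> remove turn
            seen_calls.add(key)
        kept_turns.append(turn)
    return {**traj, "turns": kept_turns}

def build_sft_pool(raw, is_correct):
    """Structural cleaning followed by final-answer correctness filtering."""
    cleaned = [t for t in (clean_trajectory(r) for r in raw) if t is not None]
    return [t for t in cleaned if is_correct(t["final_answer"], t["gold"])]
```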
## Training Data
This model is trained from cleaned open-data supervision constructed from:
- REDSearcher SFT trajectories
- a tool environment aligned to the DR-Venus inference pipeline
In the current paper instantiation, this process yields:
- 10,001 raw trajectories
- 9,365 correctness-filtered trajectories
- 18,745 final SFT training instances after resampling
For more details, please refer to the DR-Venus GitHub repository.
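The reported counts roughly double from 9,365 filtered trajectories to 18,745 training instances, consistent with upweighting long-horizon trajectories by repetition. The sketch below is one hedged reading of turn-aware resampling; the turn-count thresholds and repeat factors are invented for illustration and are not the paper's actual scheme:

```python
# Hypothetical turn-aware resampling: upweight long-horizon trajectories
# by repeating them. Thresholds and repeat counts are illustrative only.
def resample(trajectories):
    out = []
    for traj in trajectories:
        n_turns = len(traj["turns"])
        if n_turns >= 30:        # very long horizon -> repeat 3x (assumed)
            repeats = 3
        elif n_turns >= 10:      # moderately long -> repeat 2x (assumed)
            repeats = 2
        else:
            repeats = 1          # short trajectories kept once
        out.extend([traj] * repeats)
    return out
```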
## Training Recipe
The SFT checkpoint is trained with the following setup reported in the current paper draft:
- epochs: 1
- global batch size: 32
- micro batch size per GPU: 1
- learning rate: 1e-5
- maximum training length: 200K
- sequence parallel size: 8
- training framework: `verl` FSDP trainer
- supervision format: multi-turn agent trajectories with assistant-token loss masking
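Assistant-token loss masking can be sketched as below: labels are set to -100 (the ignore index in common cross-entropy implementations) everywhere except assistant spans. The `role`/`content` message schema and the `tokenize` callable are assumptions for illustration; verl's actual masking utilities may differ:

```python
IGNORE_INDEX = -100  # ignored by cross-entropy loss in most training stacks

def build_labels(messages, tokenize):
    """Concatenate a multi-turn trajectory and mask out non-assistant tokens.

    `tokenize` maps a string to a list of token ids; message dicts with
    'role'/'content' keys are an assumed schema, not verl's real format.
    """
    input_ids, labels = [], []
    for msg in messages:
        ids = tokenize(msg["content"])
        input_ids.extend(ids)
        if msg["role"] == "assistant":
            labels.extend(ids)                        # supervise assistant tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask user/tool tokens
    return input_ids, labels
```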
## Evaluation Summary
DR-Venus-4B-SFT establishes a strong 4B baseline on multiple deep research benchmarks.
### Results Against Open Models Under 9B
| Model | BrowseComp | BrowseComp-ZH | GAIA (Text-Only) | xBench-DS-2505 | xBench-DS-2510 | DeepSearchQA |
|---|---|---|---|---|---|---|
| DeepDive-9B-SFT | 5.6 | 15.7 | -- | 35.0 | -- | -- |
| DeepDive-9B-RL | 6.3 | 15.1 | -- | 38.0 | -- | -- |
| WebSailor-7B | 6.7 | 14.2 | 37.9 | 34.3 | -- | -- |
| OffSeeker-8B-SFT | 10.6 | 24.2 | 47.6 | 48.0 | -- | -- |
| OffSeeker-8B-DPO | 12.8 | 26.6 | 51.5 | 49.0 | -- | -- |
| WebExplorer-8B-RL | 15.7 | 32.0 | 50.0 | 53.7 | 23.0 | 17.8 |
| AgentCPM-Explore-4B | 24.1 | 29.1 | 63.9 | 70.0 | 34.0 | 32.8 |
| DR-Venus-4B-SFT | 26.8 | 35.7 | 65.4 | 69.0 | 35.3 | 37.7 |
| DR-Venus-4B-RL | 29.1 | 37.7 | 64.4 | 74.7 | 40.7 | 39.6 |
Among open models under 9B, DR-Venus-4B-SFT is already highly competitive and outperforms previously reported small agents on most tracked benchmarks. It also serves as the initialization checkpoint used for DR-Venus-4B-RL.
## Usage
This checkpoint is intended to be used with the official DR-Venus inference pipeline, which provides the expected system prompt, tool protocol, and long-horizon rollout loop.
```shell
git clone https://github.com/inclusionAI/DR-Venus
cd DR-Venus/Inference
pip install -r requirements.txt
# then configure the model path in run_demo.sh or run_web_demo.sh
bash run_demo.sh
```
If you use this checkpoint outside the official DR-Venus codebase, make sure your runtime matches the DR-Venus tool schema and message formatting for `search`, `visit`, `<tool_call>`, and `<tool_response>`.
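If you re-implement the rollout loop yourself, the message shapes might look roughly like the sketch below. The JSON payload inside `<tool_call>` is an assumption modeled on common Qwen-style tool protocols, so verify the exact fields against the DR-Venus codebase before relying on them:

```python
import json

def format_tool_call(name, arguments):
    """Wrap a tool invocation in <tool_call> tags (assumed Qwen-style JSON)."""
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"<tool_call>\n{payload}\n</tool_call>"

def format_tool_response(result):
    """Wrap the tool's output in <tool_response> tags for the next turn."""
    return f"<tool_response>\n{result}\n</tool_response>"

def parse_tool_call(text):
    """Extract the first <tool_call> payload from a completion, or None."""
    start = text.find("<tool_call>")
    end = text.find("</tool_call>")
    if start == -1 or end == -1:
        return None
    return json.loads(text[start + len("<tool_call>"):end])
```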
## License and Release Notes
Please verify license compatibility with:
- the upstream base model
- the released training data
- the external tools and benchmarks used in your downstream setup
This section can be updated later with the final project-specific license statement.
## Citation
If you use this checkpoint, please cite the DR-Venus project.
```bibtex
@article{venus2026drvenus,
  title={DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data},
  author={Venus Team and Dai, Sunhao and Deng, Yong and Lin, Jinzhen and Song, Yusheng and Wang, Guoqing and Wu, Xiaofeng and Zhou, Yuqi and Yang, Shuo and Ying, Zhenzhe and Zhang, Zhanwei and Meng, Changhua and Wang, Weiqiang},
  journal={arXiv preprint arXiv:2604.19859},
  year={2026}
}
```
## Links
- GitHub: https://github.com/inclusionAI/DR-Venus
- SFT code: https://github.com/inclusionAI/DR-Venus/tree/master/SFT
- Inference code: https://github.com/inclusionAI/DR-Venus/tree/master/Inference
- SFT model: https://huggingface.co/inclusionAI/DR-Venus-4B-SFT
- RL model: https://huggingface.co/inclusionAI/DR-Venus-4B-RL
- Collection: https://huggingface.co/collections/inclusionAI/dr-venus