# DR-Venus-4B-SFT
DR-Venus-4B-SFT is a 4B deep research agent obtained by fine-tuning Qwen/Qwen3-4B-Thinking-2507 on cleaned open-data agent trajectories. It is the supervised initialization checkpoint of DR-Venus and is designed to establish stable long-horizon agentic behavior, including reasoning, tool use, evidence collection, and final answer synthesis.
Instead of relying on proprietary traces, DR-Venus-4B-SFT is trained entirely on open REDSearcher trajectories after environment alignment, structural cleaning, correctness filtering, and turn-aware resampling.
## What This Model Is For
This checkpoint is intended for:
- deep research agents with long-horizon tool use
- open-domain information seeking with `search` and `visit` tools
- initializing stronger agentic checkpoints before RL
- deployment in the official DR-Venus inference pipeline
It is not primarily optimized for:
- plain chat without tool use
- generic instruction-following benchmarks
- short-context QA without external retrieval
## Model Details
- Base model: Qwen/Qwen3-4B-Thinking-2507
- Model type: long-context reasoning model for tool-augmented deep research
- Training stage: agentic supervised fine-tuning
- Training framework: `verl`
- Tool setting: `search` + `visit`
- Maximum training length: 200K
- Intended domain: long-horizon web research and evidence-grounded question answering
## How DR-Venus Builds SFT Data
DR-Venus-4B-SFT is trained on cleaned trajectories built from open REDSearcher SFT trajectories:
- Raw trajectories are converted into the same interaction format used by the DR-Venus inference pipeline.
- Tool calls are standardized so that training and deployment share the same `search`/`visit` protocol.
- Disallowed tools and duplicate tool-call turns are removed.
- Structurally valid trajectories are filtered by final-answer correctness.
- Long-horizon trajectories are upweighted through turn-aware resampling.
This pipeline is designed to improve both data quality and effective data utilization for a small deep research agent.
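The cleaning and filtering steps above can be sketched as a single pass over raw trajectories. All field names (`turns`, `tool`, `arguments`, `final_answer`, `gold`) and the correctness judge are illustrative assumptions for this sketch, not the actual REDSearcher schema:

```python
# Illustrative sketch of the DR-Venus data-cleaning steps; field names
# and the correctness judge are assumptions, not the real schema.
ALLOWED_TOOLS = {"search", "visit"}  # assumed tool whitelist

def clean_trajectory(traj):
    """Return a structurally cleaned trajectory, or None if it must be dropped."""
    seen_calls = set()
    kept_turns = []
    for turn in traj["turns"]:
        tool = turn.get("tool")
        if tool is not None:
            if tool not in ALLOWED_TOOLS:
                return None              # disallowed tool -> drop whole trajectory
            key = (tool, turn["arguments"])
            if key in seen_calls:
                continue                 # duplicate tool-call turn -> remove turn
            seen_calls.add(key)
        kept_turns.append(turn)
    return {**traj, "turns": kept_turns}

def build_sft_pool(raw, is_correct):
    """Structural cleaning followed by final-answer correctness filtering."""
    cleaned = [t for t in (clean_trajectory(r) for r in raw) if t is not None]
    return [t for t in cleaned if is_correct(t["final_answer"], t["gold"])]
```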
## Training Data
This model is trained from cleaned open-data supervision constructed from:
- REDSearcher SFT trajectories
- a tool environment aligned to the DR-Venus inference pipeline
In the current paper instantiation, this process yields:
- 10,001 raw trajectories
- 9,365 correctness-filtered trajectories
- 18,745 final SFT training instances after resampling
For more details, please refer to the DR-Venus GitHub repository.
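The reported counts roughly double from 9,365 filtered trajectories to 18,745 training instances, consistent with upweighting long-horizon trajectories by repetition. The sketch below is one hedged reading of turn-aware resampling; the turn-count thresholds and repeat factors are invented for illustration and are not the paper's actual scheme:

```python
# Hypothetical turn-aware resampling: upweight long-horizon trajectories
# by repeating them. Thresholds and repeat counts are illustrative only.
def resample(trajectories):
    out = []
    for traj in trajectories:
        n_turns = len(traj["turns"])
        if n_turns >= 30:        # very long horizon -> repeat 3x (assumed)
            repeats = 3
        elif n_turns >= 10:      # moderately long -> repeat 2x (assumed)
            repeats = 2
        else:
            repeats = 1          # short trajectories kept once
        out.extend([traj] * repeats)
    return out
```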
## Training Recipe
The SFT checkpoint is trained with the following setup reported in the current paper draft:
- epochs: 1
- global batch size: 32
- micro batch size per GPU: 1
- learning rate: 1e-5
- maximum training length: 200K
- sequence parallel size: 8
- training framework: `verl` FSDP trainer
- supervision format: multi-turn agent trajectories with assistant-token loss masking
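Assistant-token loss masking can be sketched as below: labels are set to -100 (the ignore index in common cross-entropy implementations) everywhere except assistant spans. The `role`/`content` message schema and the `tokenize` callable are assumptions for illustration; verl's actual masking utilities may differ:

```python
IGNORE_INDEX = -100  # ignored by cross-entropy loss in most training stacks

def build_labels(messages, tokenize):
    """Concatenate a multi-turn trajectory and mask out non-assistant tokens.

    `tokenize` maps a string to a list of token ids; message dicts with
    'role'/'content' keys are an assumed schema, not verl's real format.
    """
    input_ids, labels = [], []
    for msg in messages:
        ids = tokenize(msg["content"])
        input_ids.extend(ids)
        if msg["role"] == "assistant":
            labels.extend(ids)                        # supervise assistant tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask user/tool tokens
    return input_ids, labels
```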
## Evaluation Summary
DR-Venus-4B-SFT establishes a strong 4B baseline on multiple deep research benchmarks.
### Results Against Open Models Under 9B
| Model | BrowseComp | BrowseComp-ZH | GAIA (Text-Only) | xBench-DS-2505 | xBench-DS-2510 | DeepSearchQA |
|---|---|---|---|---|---|---|
| DeepDive-9B-SFT | 5.6 | 15.7 | -- | 35.0 | -- | -- |
| DeepDive-9B-RL | 6.3 | 15.1 | -- | 38.0 | -- | -- |
| WebSailor-7B | 6.7 | 14.2 | 37.9 | 34.3 | -- | -- |
| OffSeeker-8B-SFT | 10.6 | 24.2 | 47.6 | 48.0 | -- | -- |
| OffSeeker-8B-DPO | 12.8 | 26.6 | 51.5 | 49.0 | -- | -- |
| WebExplorer-8B-RL | 15.7 | 32.0 | 50.0 | 53.7 | 23.0 | 17.8 |
| AgentCPM-Explore-4B | 24.1 | 29.1 | 63.9 | 70.0 | 34.0 | 32.8 |
| DR-Venus-4B-SFT | 26.8 | 35.7 | 65.4 | 69.0 | 35.3 | 37.7 |
| DR-Venus-4B-RL | 29.1 | 37.7 | 64.4 | 74.7 | 40.7 | 39.6 |
Among open models under 9B, DR-Venus-4B-SFT is already highly competitive and outperforms previously reported small agents on most tracked benchmarks. It also serves as the initialization checkpoint used for DR-Venus-4B-RL.
## Usage
This checkpoint is intended to be used with the official DR-Venus inference pipeline, which provides the expected system prompt, tool protocol, and long-horizon rollout loop.
```shell
git clone https://github.com/inclusionAI/DR-Venus
cd DR-Venus/Inference
pip install -r requirements.txt
# then configure the model path in run_demo.sh or run_web_demo.sh
bash run_demo.sh
```
If you use this checkpoint outside the official DR-Venus codebase, make sure your runtime matches the DR-Venus tool schema and message formatting for `search`, `visit`, `<tool_call>`, and `<tool_response>`.
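If you re-implement the rollout loop yourself, the message shapes might look roughly like the sketch below. The JSON payload inside `<tool_call>` is an assumption modeled on common Qwen-style tool protocols, so verify the exact fields against the DR-Venus codebase before relying on them:

```python
import json

def format_tool_call(name, arguments):
    """Wrap a tool invocation in <tool_call> tags (assumed Qwen-style JSON)."""
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"<tool_call>\n{payload}\n</tool_call>"

def format_tool_response(result):
    """Wrap the tool's output in <tool_response> tags for the next turn."""
    return f"<tool_response>\n{result}\n</tool_response>"

def parse_tool_call(text):
    """Extract the first <tool_call> payload from a completion, or None."""
    start = text.find("<tool_call>")
    end = text.find("</tool_call>")
    if start == -1 or end == -1:
        return None
    return json.loads(text[start + len("<tool_call>"):end])
```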
## License and Release Notes
Please verify license compatibility with:
- the upstream base model
- the released training data
- the external tools and benchmarks used in your downstream setup
This section can be updated later with the final project-specific license statement.
## Citation
If you use this checkpoint, please cite the DR-Venus project.
```bibtex
@article{venus2026drvenus,
  title={DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data},
  author={Venus Team and Dai, Sunhao and Deng, Yong and Lin, Jinzhen and Song, Yusheng and Wang, Guoqing and Wu, Xiaofeng and Zhou, Yuqi and Yang, Shuo and Ying, Zhenzhe and Zhang, Zhanwei and Meng, Changhua and Wang, Weiqiang},
  journal={arXiv preprint arXiv:2604.19859},
  year={2026}
}
```
## Links
- GitHub: https://github.com/inclusionAI/DR-Venus
- SFT code: https://github.com/inclusionAI/DR-Venus/tree/master/SFT
- Inference code: https://github.com/inclusionAI/DR-Venus/tree/master/Inference
- SFT model: https://huggingface.co/inclusionAI/DR-Venus-4B-SFT
- RL model: https://huggingface.co/inclusionAI/DR-Venus-4B-RL
- Collection: https://huggingface.co/collections/inclusionAI/dr-venus