
DR-Venus-4B-SFT

DR-Venus-4B-SFT is a 4B deep research agent obtained by fine-tuning Qwen/Qwen3-4B-Thinking-2507 on cleaned open-data agent trajectories. It is the supervised initialization checkpoint of DR-Venus and is designed to establish stable long-horizon agentic behavior, including reasoning, tool use, evidence collection, and final answer synthesis.

Instead of relying on proprietary traces, DR-Venus-4B-SFT is trained entirely on open REDSearcher trajectories after environment alignment, structural cleaning, correctness filtering, and turn-aware resampling.

What This Model Is For

This checkpoint is intended for:

  • deep research agents with long-horizon tool use
  • open-domain information seeking with search and visit tools
  • initializing stronger agentic checkpoints before RL
  • deployment in the official DR-Venus inference pipeline

It is not primarily optimized for:

  • plain chat without tool use
  • generic instruction-following benchmarks
  • short-context QA without external retrieval

Model Details

  • Base model: Qwen/Qwen3-4B-Thinking-2507
  • Model type: long-context reasoning model for tool-augmented deep research
  • Training stage: agentic supervised fine-tuning
  • Training framework: verl
  • Tool setting: search + visit
  • Maximum training length: 200K tokens
  • Intended domain: long-horizon web research and evidence-grounded question answering

How DR-Venus Builds SFT Data

DR-Venus-4B-SFT is trained on open REDSearcher SFT trajectories that pass through the following cleaning pipeline:

  1. Raw trajectories are converted into the same interaction format used by the DR-Venus inference pipeline.
  2. Tool calls are standardized so that training and deployment share the same search / visit protocol.
  3. Disallowed tools and duplicate tool-call turns are removed.
  4. Structurally valid trajectories are filtered by final-answer correctness.
  5. Long-horizon trajectories are upweighted through turn-aware resampling.

This pipeline is designed to improve both data quality and effective data utilization for a small deep research agent.
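The cleaning steps above can be sketched roughly as follows. This is an illustrative sketch only: the field names, the exact-match correctness check, and the duplication-based resampling are assumptions, not the actual DR-Venus implementation.

```python
# Hypothetical sketch of the trajectory-cleaning pipeline described above.
# Field names ("turns", "tool_call", "final_answer", ...) are illustrative
# assumptions, not the real DR-Venus data schema.

ALLOWED_TOOLS = {"search", "visit"}

def clean_trajectory(traj):
    """Drop trajectories using disallowed tools; drop duplicate tool-call turns."""
    cleaned, prev_call = [], None
    for turn in traj["turns"]:
        call = turn.get("tool_call")
        if call is not None:
            if call["name"] not in ALLOWED_TOOLS:
                return None          # disallowed tool -> discard whole trajectory
            if call == prev_call:
                continue             # consecutive duplicate tool call -> drop turn
            prev_call = call
        cleaned.append(turn)
    return {**traj, "turns": cleaned}

def is_correct(traj):
    """Final-answer correctness filter (exact match as a stand-in judge)."""
    return traj["final_answer"] == traj["gold_answer"]

def turn_aware_resample(trajs, long_horizon_turns=10):
    """Upweight long-horizon trajectories (here: simple duplication)."""
    out = []
    for t in trajs:
        out.append(t)
        if len(t["turns"]) >= long_horizon_turns:
            out.append(t)            # long trajectories appear twice
    return out

def build_sft_data(raw):
    cleaned = [c for c in (clean_trajectory(t) for t in raw) if c]
    correct = [t for t in cleaned if is_correct(t)]
    return turn_aware_resample(correct)
```

The resampling threshold and duplication factor are placeholders; the paper's turn-aware scheme may weight trajectories differently.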

Training Data

This model is trained on cleaned open-data supervision constructed from the open REDSearcher SFT trajectories via the pipeline described above.

In the current paper instantiation, this process yields:

  • 10,001 raw trajectories
  • 9,365 correctness-filtered trajectories
  • 18,745 final SFT training instances after resampling

For more details, please refer to the DR-Venus GitHub repository.

Training Recipe

The SFT checkpoint is trained with the following setup reported in the current paper draft:

  • epochs: 1
  • global batch size: 32
  • micro batch size per GPU: 1
  • learning rate: 1e-5
  • maximum training length: 200K tokens
  • sequence parallel size: 8
  • training framework: verl FSDP trainer
  • supervision format: multi-turn agent trajectories with assistant-token loss masking
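As a rough illustration of the last point, assistant-token loss masking can be sketched as below. The role names and the -100 ignore index (the common PyTorch/Hugging Face convention) are assumptions, not the exact verl implementation.

```python
# Minimal sketch of assistant-token loss masking for multi-turn trajectories:
# the loss is computed only on assistant tokens, while user and tool tokens
# are masked out with IGNORE_INDEX (-100, the PyTorch convention).

IGNORE_INDEX = -100

def build_labels(turns, tokenize):
    """Concatenate turn token ids; supervise only assistant tokens."""
    input_ids, labels = [], []
    for turn in turns:
        ids = tokenize(turn["content"])
        input_ids.extend(ids)
        if turn["role"] == "assistant":
            labels.extend(ids)                        # loss on assistant tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask user/tool tokens
    return input_ids, labels
```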

Evaluation Summary

DR-Venus-4B-SFT establishes a strong 4B baseline on multiple deep research benchmarks.

Results Against Open Models Under 9B

| Model | BrowseComp | BrowseComp-ZH | GAIA (Text-Only) | xBench-DS-2505 | xBench-DS-2510 | DeepSearchQA |
|---|---|---|---|---|---|---|
| DeepDive-9B-SFT | 5.6 | 15.7 | -- | 35.0 | -- | -- |
| DeepDive-9B-RL | 6.3 | 15.1 | -- | 38.0 | -- | -- |
| WebSailor-7B | 6.7 | 14.2 | 37.9 | 34.3 | -- | -- |
| OffSeeker-8B-SFT | 10.6 | 24.2 | 47.6 | 48.0 | -- | -- |
| OffSeeker-8B-DPO | 12.8 | 26.6 | 51.5 | 49.0 | -- | -- |
| WebExplorer-8B-RL | 15.7 | 32.0 | 50.0 | 53.7 | 23.0 | 17.8 |
| AgentCPM-Explore-4B | 24.1 | 29.1 | 63.9 | 70.0 | 34.0 | 32.8 |
| DR-Venus-4B-SFT | 26.8 | 35.7 | 65.4 | 69.0 | 35.3 | 37.7 |
| DR-Venus-4B-RL | 29.1 | 37.7 | 64.4 | 74.7 | 40.7 | 39.6 |

Among open models under 9B, DR-Venus-4B-SFT is already highly competitive and outperforms previously reported small agents on most tracked benchmarks. It also serves as the initialization checkpoint used for DR-Venus-4B-RL.

Usage

This checkpoint is intended to be used with the official DR-Venus inference pipeline, which provides the expected system prompt, tool protocol, and long-horizon rollout loop.

git clone https://github.com/inclusionAI/DR-Venus
cd DR-Venus/Inference
pip install -r requirements.txt
# then configure the model path in run_demo.sh or run_web_demo.sh
bash run_demo.sh

If you use this checkpoint outside the official DR-Venus codebase, make sure your runtime matches the DR-Venus tool schema and message formatting, including the search and visit tools and the <tool_call> / <tool_response> message tags.
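For illustration, a minimal formatter for such messages might look like the sketch below. The JSON payload shape ({"name": ..., "arguments": ...}) follows the common Qwen-style tool-calling convention and is an assumption, not the verified DR-Venus wire format.

```python
import json

# Hypothetical helpers for the <tool_call>/<tool_response> message format.
# The JSON-in-tags payload shape is an assumed Qwen-style convention.

def format_tool_call(name, arguments):
    """Wrap a tool invocation in <tool_call> tags with a JSON payload."""
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"<tool_call>\n{payload}\n</tool_call>"

def format_tool_response(result):
    """Wrap a tool's raw output in <tool_response> tags."""
    return f"<tool_response>\n{result}\n</tool_response>"

msg = format_tool_call("search", {"query": "DR-Venus deep research agent"})
```

Before relying on such a formatter, compare it against the message construction in the official DR-Venus inference pipeline.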

License and Release Notes

Please verify license compatibility with:

  • the upstream base model
  • the released training data
  • the external tools and benchmarks used in your downstream setup

This section can be updated later with the final project-specific license statement.

Citation

If you use this checkpoint, please cite the DR-Venus project.

@article{venus2026drvenus,
  title={DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data},
  author={Venus Team and Dai, Sunhao and Deng, Yong and Lin, Jinzhen and Song, Yusheng and Wang, Guoqing and Wu, Xiaofeng and Zhou, Yuqi and Yang, Shuo and Ying, Zhenzhe and Zhang, Zhanwei and Meng, Changhua and Wang, Weiqiang},
  journal={arXiv preprint arXiv:2604.19859},
  year={2026}
}
