---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
- darwin
- darwin-v8
- opus-distilled
- qwen3.5
- reasoning
- korean
- claude-opus
- lora-merged
language:
- en
- ko
- zh
- ja
pipeline_tag: text-generation
library_name: transformers
---

# 🧠 Darwin-2B-Opus

**The 2B lightweight model of the Darwin V8 series**

A Qwen3.5-2B-based model infused with the reasoning style of Claude Opus 4.5/4.6 and Sonnet 4.6.

---

## 🧬 Pedigree

- 👨 **Father (Base)**: [`Qwen/Qwen3.5-2B`](https://huggingface.co/Qwen/Qwen3.5-2B)
- 👩 **Mother (LoRA Adapter)**: [`FINAL-Bench/Darwin-2B-Opus-LoRA`](https://huggingface.co/FINAL-Bench/Darwin-2B-Opus-LoRA)
- 👶 **Child (This model)**: `FINAL-Bench/Darwin-2B-Opus`, a merged, full-weight standalone model

---

## 🏆 Darwin V8 Series Information

| Item | Value |
|------|-------|
| Model size | 2.3B parameters |
| Architecture | Qwen3.5 (hybrid attention) |
| Training method | SFT with LoRA (all-linear, rank=16) |
| Training data | 9,762 samples (Claude Opus/Sonnet + Korean reasoning) |
| Training time | 29 minutes (8×B200 GPU) |
| Final loss | 0.837 |
| Token accuracy | 76.6% |

### 📊 Benchmark (GPQA Diamond, 198 questions)

- **Accuracy**: 37.37% (74/198)
- **Accuracy among responses with a successfully extracted answer**: 50.7%

---

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/Darwin-2B-Opus"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Korean prompt: "The 2024 Korean minimum hourly wage is 9,860 won.
# What is the pay for 40 hours per week × 4 weeks?"
messages = [
    {"role": "user", "content": "2024년 한국 최저시급 9,860원이다. 주 40시간 × 4주 임금은?"}
]

# Render the chat template to text, then tokenize and move to the model's device.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) for a deterministic answer.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

---

## 🧬 Darwin V8 Training Pipeline

```
[Qwen/Qwen3.5-2B] ──── Base model (frozen)
    +
[9,762 Claude Opus/Sonnet + Korean reasoning samples]
    ↓
[SFT Training]
    - LoRA (all-linear, r=16, α=32)
    - Learning rate: 2e-4 (V8 rule: ×10 FullFT)
    - 2 epochs, bf16, 8×B200 DDP
    - Loss: 0.991 → 0.837 (-15%)
    - Token accuracy: 73.9% → 76.6% (+2.7 pp)
    ↓
[LoRA merge into base weights]
    ↓
[Darwin-2B-Opus] ← this model
```

---

## 📊 Training Data Composition

| Category | Samples | % | Source |
|----------|---------|---|--------|
| General Reasoning | 4,422 | 45% | Opus 4.5/4.6, Sonnet 4.6 |
| Math (English) | 1,960 | 20% | DeepSeek-v3.2 OpenR1-Math |
| Code (English) | 1,680 | 17% | DeepSeek-v3.2 CodeReasoning + GPT-5 Codex |
| Korean Thinking | 200 | 2% | Multilingual-Thinking-Korean |
| **Korean Math** | **1,500** | **15%** | orca-math-word-problems-korean |
| **Total (after filtering)** | **9,762** | 100% | - |

Percentages are rounded to whole numbers, so the individual rows sum to 99%.

---

## 🎯 Darwin V8 Design Philosophy

1. **LoRA Without Regret**: `all-linear` target modules, a learning rate 10× that of full fine-tuning, and rank 16 is sufficient.
2. **Response Distillation**: cost-efficient distillation from pre-generated Opus traces.
3. **Stronger Korean Reasoning**: Claude reasoning traces are used instead of simple KoAlpaca QA pairs.
4. **Merge-and-Deploy**: once the LoRA adapter is merged, the model deploys with no additional dependencies.

---

## 📝 Sample Test Results (5 problems)

| Type | Answer | Notes |
|------|:------:|-------|
| English math (train speed) | ✅ 80 km/h | Step-by-step solution in LaTeX |
| English logic (height comparison) | ✅ Carol | Transitivity stated explicitly |
| English code (primality test) | ✅ Correct | Docstring + complexity analysis |
| **Korean hourly-wage calculation** | ✅ **1,577,600 won** | Step-by-step explanation in Korean |
| **Korean simultaneous equations** | ✅ **1,200 won** | Standard solution + verification |

**5/5 correct**: all five sample problems, in both English and Korean, were answered correctly ⭐

---

## ⚠️ Limitations

- **Scale**: 2.3B parameters (the smallest model in the Darwin series)
- **GPQA Diamond**: 37.37% (lower than large models, but among the best at the 2B scale)
- **Long context**: trained with `max_length=4096`
- **Knowledge limits**: a 2B model has limited encyclopedic knowledge

---

## 🔗 Related Models

- 🧩 [`FINAL-Bench/Darwin-2B-Opus-LoRA`](https://huggingface.co/FINAL-Bench/Darwin-2B-Opus-LoRA): the **standalone LoRA adapter** for this model (67 MB)
- ⚡ [`FINAL-Bench/Darwin-2B-Opus-ONNX`](https://huggingface.co/FINAL-Bench/Darwin-2B-Opus-ONNX): **quantized ONNX version for browsers/WebGPU** (planned)

### 🏆 Darwin Series

- [`Darwin-31B-Opus`](https://huggingface.co/FINAL-Bench/Darwin-31B-Opus): GPQA 85.9%
- [`Darwin-27B-Opus`](https://huggingface.co/FINAL-Bench/Darwin-27B-Opus): GPQA 86.9%
- [`Darwin-9B-Opus`](https://huggingface.co/FINAL-Bench/Darwin-9B-Opus)
- [`Darwin-4B-Opus`](https://huggingface.co/FINAL-Bench/Darwin-4B-Opus)
- **Darwin-2B-Opus** (this model) ⭐ the lightest

---

## 🪪 License

- Base model: Apache 2.0 (Qwen)
- Training data: see the individual license of each dataset
- This model: Apache 2.0

---

## 🙏 Credits

- **Base**: Qwen team (Alibaba)
- **Teacher**: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
- **Dataset releases**: nohurry, TeichAI, kuotient, PoSTMEDIA
- **Training & Release**: **FINAL-Bench / VIDRAFT_LAB**

---

*Darwin V8 · Part of the evolutionary model series by FINAL-Bench*

This model is introduced in [Darwin Family](https://arxiv.org/abs/2605.14386).
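
The "LoRA merge into base weights" step of the training pipeline can be illustrated with a small, dependency-free sketch. It assumes the standard LoRA formulation, in which the merged weight is W + (α / r) · B · A; the card does not spell this formula out. The helper names (`matmul`, `merge_lora`) and the toy matrices are hypothetical and are not taken from the Darwin training code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix,
               alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A (assumed standard LoRA merge)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy sizes (hypothetical): out=2, in=3, rank=1.
# The card reports r=16 and alpha=32, i.e. the same scale alpha / r = 2.0.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen base weight
A = [[0.5, -1.0, 0.0]]                   # rank x in
B = [[2.0], [1.0]]                       # out x rank
scale = 2.0
merged = merge_lora(W, A, B, alpha=2.0, rank=1)

# One input column vector: the merged layer should reproduce
# "base output + scaled adapter output" from the unmerged model.
x = [[1.0], [2.0], [3.0]]
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + scale * a[0]] for b, a in zip(base_out, adapter_out)]

print(merged)
print(matmul(merged, x) == unmerged)
```

Because the update is folded into the weight matrix, the merged model needs no adapter at inference time, which is the point of the Merge-and-Deploy principle. With real checkpoints this fold-in is typically done with the PEFT library's `merge_and_unload()` rather than by hand.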
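
Several figures quoted in this card can be re-derived with simple arithmetic. The sketch below recomputes the hourly-wage answer from the quick-start prompt, the GPQA Diamond accuracy, and the training-data shares. The variable names are illustrative only; the input numbers are copied from the tables above.

```python
# Hourly-wage prompt: 9,860 won/hour, 40 hours/week, 4 weeks.
hourly_wage_krw = 9_860
hours_per_week = 40
weeks = 4
monthly_pay = hourly_wage_krw * hours_per_week * weeks

# GPQA Diamond: 74 correct answers out of 198 questions.
gpqa_accuracy = round(74 / 198 * 100, 2)

# Sample counts per category from the training-data table.
dataset = {
    "General Reasoning": 4_422,
    "Math (English)": 1_960,
    "Code (English)": 1_680,
    "Korean Thinking": 200,
    "Korean Math": 1_500,
}
total = sum(dataset.values())
# Shares rounded to whole percentages, as in the table.
shares = {name: round(count / total * 100) for name, count in dataset.items()}

print(f"{monthly_pay:,} won")
print(f"{gpqa_accuracy}%")
print(total, shares)
```

The rounded shares add up to 99 rather than 100, which is why the table's percentage column does not sum exactly to its 100% total.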