DiscoverLLM-svg-drawing-Llama-3.1-8B-Instruct

LoRA adapter fine-tuned from meta-llama/Llama-3.1-8B-Instruct for collaborative SVG illustration with the DiscoverLLM training framework (arXiv:2602.03429; see Citation). DiscoverLLM trains LLMs to help users figure out what they want: it treats intent discovery as the reward signal and optimizes the model against a simulator that maintains a latent intent hierarchy.

The adapter was trained with DPO on the kixlab/DiscoverLLM-multiturn-preferences dataset using TRL and PEFT.
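For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines of plain Python. This is the standard loss from Rafailov et al. (2023), not code from the DiscoverLLM repository. The function name, the example log-probabilities, and beta=0.1 are illustrative assumptions, since this card does not state the beta used in training.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    beta=0.1 is an illustrative value, not one taken from this card.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up log-probabilities: the policy prefers the chosen response more
# strongly than the reference does, so the loss falls below log(2).
loss_good = dpo_loss(-5.0, -12.0, -8.0, -9.0)
# The reverse preference gives a loss above log(2).
loss_bad = dpo_loss(-12.0, -5.0, -9.0, -8.0)
print(loss_good, loss_bad)
```

When the policy and reference agree exactly, the margin is zero and the loss equals log(2); training pushes the margin positive on the preferred responses.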

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id    = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "kixlab/DiscoverLLM-svg-drawing-Llama-3.1-8B-Instruct"

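# Load the tokenizer from the adapter repo and the base weights in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, adapter_id)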

messages = [{"role": "user", "content": "Help me draw an SVG illustration of a lighthouse at night."}]
# Format the conversation with the Llama 3.1 chat template and move it to the model's device.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))

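Note: the base model meta-llama/Llama-3.1-8B-Instruct is gated on the Hugging Face Hub. Accept its license on the model page and authenticate (for example with huggingface-cli login) before loading this adapter; otherwise the base weights will fail to download.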
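Replies from an SVG-drawing assistant will likely mix conversational text with SVG markup. This card does not specify the output format, so the following is only a sketch under that assumption: the helper extract_svgs and the sample reply are hypothetical, and the helper uses only the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

# Non-greedy match from an opening <svg ...> tag to the first closing </svg>.
SVG_PATTERN = re.compile(r"<svg\b.*?</svg>", re.DOTALL)


def extract_svgs(reply: str) -> list[str]:
    """Return the well-formed <svg>...</svg> fragments found in a reply."""
    fragments = []
    for match in SVG_PATTERN.finditer(reply):
        fragment = match.group(0)
        try:
            # Keep only fragments that parse as XML.
            ET.fromstring(fragment)
        except ET.ParseError:
            continue
        fragments.append(fragment)
    return fragments


# Hypothetical reply text, standing in for the decoded model output.
reply = (
    "Here is a first draft:\n"
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
    '<circle cx="5" cy="5" r="4"/></svg>\n'
    "Would you like warmer colors?"
)
svgs = extract_svgs(reply)
print(len(svgs))
```

Because the pattern stops at the first closing tag, drawings that nest one svg element inside another are cut short, fail to parse, and are skipped; a real pipeline might prefer a proper XML or HTML parser.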

Training details

Method: offline Direct Preference Optimization (DPO; Rafailov et al., 2023) via TRL. The algorithm is the standard one; DiscoverLLM's contribution is the simulator-derived reward.
Adapter: LoRA (r=32, alpha=64; all attention + MLP projections)
Framework versions: PEFT 0.18.0 / TRL 0.26.2 / Transformers 4.57.4 / PyTorch 2.9.0
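To give a sense of the adapter's size, the LoRA settings above can be turned into a rough trainable-parameter count. The sketch assumes that "all attention + MLP projections" means the seven linear layers q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj in each decoder block, with plain LoRA (no bias terms or extra trainable modules) and the usual alpha / r scaling; the card itself does not list the target modules. The layer dimensions are those of Llama-3.1-8B.

```python
# Llama-3.1-8B dimensions.
HIDDEN = 4096          # model width
INTERMEDIATE = 14336   # MLP width
NUM_LAYERS = 32        # decoder blocks
KV_DIM = 8 * 128       # 8 key/value heads * head dimension 128

# LoRA settings from this card.
R, ALPHA = 32, 64

# Assumed target modules: (input features, output features) per projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update B @ A

print(f"{total:,} trainable parameters, update scaled by {scaling}")
```

Under these assumptions the adapter holds roughly 84 million trainable parameters, about 1% of the 8B base model.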

Citation

@article{kim2026discoverllm,
  title={DiscoverLLM: From Executing Intents to Discovering Them},
  author={Kim, Tae Soo and Lee, Yoonjoo and Yu, Jaesang and Chung, John Joon Young and Kim, Juho},
  journal={arXiv preprint arXiv:2602.03429},
  year={2026}
}