Qwen3.5-0.8B DriveLM LoRA — proportional + lr=1e-4 variant (highest overall ROUGE-L)
A QLoRA adapter for Qwen/Qwen3.5-0.8B that combines the two best findings from the ablation series:
- Proportional sampling with min-floor: preserves DriveLM's natural within-category answer-pattern proportions (so the model learns the real prior on No-heavy prediction) while ensuring rare answer patterns and the behavior category get enough samples.
- lr=1e-4: half the PEFT default; the right setting for this task per our 3-point LR sweep.
This adapter wins overall ROUGE-L, perception, planning, and exact-match across the entire 6-config ablation. It trades behavior coverage for breadth — see "The trade-off" below.
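The proportional-with-floor idea described above can be sketched as follows. This is a minimal illustration, not the repository's sampler: the function name `proportional_with_floor`, the pattern names, and the natural counts are all hypothetical, and it assumes rare patterns may be upsampled with replacement to reach the floor.

```python
def proportional_with_floor(counts, budget, floor):
    """Split `budget` samples across answer patterns.

    Every pattern first receives `floor` samples; the rest of the budget is
    divided in proportion to the natural counts (largest-remainder rounding).
    Assumption: rare patterns can be upsampled to reach the floor.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one sample")
    if budget < floor * len(counts):
        raise ValueError("budget is too small to give every pattern the floor")
    remaining = budget - floor * len(counts)

    alloc, remainders = {}, {}
    for pattern, count in counts.items():
        share, rem = divmod(remaining * count, total)  # exact integer arithmetic
        alloc[pattern] = floor + share
        remainders[pattern] = rem

    # Hand out the samples lost to rounding, largest remainder first.
    leftover = budget - sum(alloc.values())
    for pattern in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[pattern] += 1
    return alloc


# Hypothetical natural counts for one category (not taken from DriveLM).
natural = {"no": 800, "yes": 150, "rare_a": 30, "rare_b": 20}
allocation = proportional_with_floor(natural, budget=250, floor=15)
print(allocation)
```

The dominant pattern keeps most of the budget, while the two rare patterns end up above the floor instead of nearly vanishing.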
Eval results (3,770-sample DriveLM front-arc, vLLM)
| Metric | Baseline | This adapter (prop + lr=1e-4) | Δ |
|---|---|---|---|
| ROUGE-1 | 0.166 | 0.627 | +0.461 |
| ROUGE-2 | 0.069 | 0.257 | +0.188 |
| ROUGE-L | 0.157 | 0.621 | +0.464 |
| Token-F1 | 0.117 | 0.602 | +0.485 |
| Exact match | 0.4% | 47.4% | +47.0 pp |
| Mean per-request latency | 1,420 ms | 1,858 ms | +438 ms |
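For readers unfamiliar with the metrics in this table, here is a rough sketch of how exact match, Token-F1, and ROUGE-L are commonly computed. It assumes lowercase whitespace tokenization and the standard F-measure definitions; the evaluation that produced the numbers above may normalize or tokenize differently.

```python
from collections import Counter


def tokens(text):
    # Assumption: lowercase + whitespace split; the real eval may differ.
    return text.lower().split()


def exact_match(pred, ref):
    return tokens(pred) == tokens(ref)


def token_f1(pred, ref):
    p, r = tokens(pred), tokens(ref)
    overlap = sum((Counter(p) & Counter(r)).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def rouge_l(pred, ref):
    p, r = tokens(pred), tokens(ref)
    # Longest common subsequence length via dynamic programming.
    prev = [0] * (len(r) + 1)
    for tok in p:
        cur = [0]
        for j, ref_tok in enumerate(r, start=1):
            cur.append(prev[j - 1] + 1 if tok == ref_tok else max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Token-F1 ignores word order, whereas ROUGE-L rewards tokens that appear in the same order as in the reference.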
Per question category (ROUGE-L)
| Category | N | Baseline | This adapter |
|---|---|---|---|
| perception | 1,738 | 0.217 | 0.625 ⭐ |
| prediction | 1,181 | 0.097 | 0.682 |
| planning | 813 | 0.107 | 0.543 ⭐ |
| behavior | 38 | 0.305 | 0.201 |
Best-of-series for perception and planning (⭐); prediction is a close second to nat 1e-4 (0.682 vs 0.696). Behavior is the trade-off (next section).
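As a sanity check, the per-category scores appear consistent with the overall ROUGE-L being a sample-weighted mean. The snippet below only re-derives the overall figures from the table values; it is not the evaluation code.

```python
# (N, baseline ROUGE-L, adapter ROUGE-L) per category, copied from the table above.
per_category = {
    "perception": (1738, 0.217, 0.625),
    "prediction": (1181, 0.097, 0.682),
    "planning": (813, 0.107, 0.543),
    "behavior": (38, 0.305, 0.201),
}

total_n = sum(n for n, _, _ in per_category.values())
baseline_overall = sum(n * b for n, b, _ in per_category.values()) / total_n
adapter_overall = sum(n * a for n, _, a in per_category.values()) / total_n

print(total_n, round(baseline_overall, 3), round(adapter_overall, 3))
```

Behavior contributes only 38 of 3,770 samples (about 1%), so even a large swing in that category barely moves the overall score.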
Position in the ablation series
| Config | Sampling | lr | Overall RL | Perception | Prediction | Planning | Behavior |
|---|---|---|---|---|---|---|---|
| nat 2e-4 | natural | 2e-4 | 0.541 | 0.489 | 0.659 | 0.502 | 0.036 |
| nat 1e-4 | natural | 1e-4 | 0.581 | 0.533 | 0.696 | 0.503 | 0.877 |
| nat 5e-4 | natural | 5e-4 | 0.540 | 0.513 | 0.617 | 0.509 | 0.022 |
| stratified | uniform | 2e-4 | 0.518 | 0.615 | 0.368 | 0.507 | 0.911 |
| prop 1e-4 (this) | proportional w/ floor | 1e-4 | 0.621 | 0.625 | 0.682 | 0.543 | 0.201 |
Different configs win different production targets:
- For behavior-heavy use cases (ego-status, predictability) → use nat 1e-4
- For overall quality + perception/prediction/planning → use this adapter (prop 1e-4)
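To make the selection concrete, the following sketch picks the top config per column from the ablation table. The numbers are copied from the table; the dictionary layout and column names are just an illustration.

```python
# Ablation rows copied from the table above:
# (overall ROUGE-L, perception, prediction, planning, behavior)
columns = ("overall", "perception", "prediction", "planning", "behavior")
ablation = {
    "nat 2e-4": (0.541, 0.489, 0.659, 0.502, 0.036),
    "nat 1e-4": (0.581, 0.533, 0.696, 0.503, 0.877),
    "nat 5e-4": (0.540, 0.513, 0.617, 0.509, 0.022),
    "stratified": (0.518, 0.615, 0.368, 0.507, 0.911),
    "prop 1e-4": (0.621, 0.625, 0.682, 0.543, 0.201),
}

# For each column, find the config with the highest score.
best = {
    col: max(ablation, key=lambda cfg: ablation[cfg][i])
    for i, col in enumerate(columns)
}
for col, cfg in best.items():
    print(f"{col:>10}: {cfg}")
```

By raw score, stratified edges out nat 1e-4 on behavior (0.911 vs 0.877), but it has the lowest overall score and weak prediction (0.368), which is presumably why nat 1e-4 is the recommendation for behavior-heavy use.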
The trade-off: why behavior is 0.201 here vs 0.877 for nat 1e-4
Proportional sampling injects all 38 behavior samples × 4 upsample = 152 instances into training — identical to the uniform-stratified variant. So the behavior gradient signal is the same.
The difference is in the competing other-category gradients. Proportional sampling preserves the natural answer-pattern distribution within perception/prediction/planning (e.g. prediction stays No-heavy at 85/15/40/110 instead of forced 50/50/50/100). This is harder to fit — the LoRA's r=8 capacity gets pulled toward the dominant patterns of the larger categories. The 152 behavior signals get partially crowded out.
A weighted variant with behavior upsample 8× or 12× would likely close the behavior gap while keeping the overall wins. That's the obvious next experiment.
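The arithmetic behind that proposal can be sketched as below. It assumes the other three categories stay at 250 samples each and that only the behavior upsample factor changes; whether the extra share actually closes the gap is untested.

```python
BEHAVIOR_SAMPLES = 38
OTHER_CATEGORY_SAMPLES = 3 * 250  # perception + prediction + planning


def training_mix(upsample):
    """Return (behavior instances, total instances, behavior share)."""
    behavior = BEHAVIOR_SAMPLES * upsample
    total = OTHER_CATEGORY_SAMPLES + behavior
    return behavior, total, behavior / total


for factor in (4, 8, 12):
    behavior, total, share = training_mix(factor)
    print(f"{factor}x: {behavior} behavior / {total} total ({share:.1%})")
```

At the current 4× factor, behavior makes up roughly 17% of the 902 training instances; 8× and 12× would raise that to roughly 29% and 38%.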
Training Details
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3.5-0.8B |
| Adapter type | QLoRA (NF4 4-bit base + LoRA r=8) |
| LoRA rank / alpha | 8 / 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Vision tower | Frozen |
| Sampling | Proportional within each category × answer-pattern, min-floor 15 |
| Training samples | 902 (250 perception + 250 prediction + 250 planning + 38 behavior × 4) |
| Camera mode | front-arc (3 cameras, ≤448 px long edge) |
| Epochs | 1 |
| Learning rate | 1e-4 |
| Effective batch size | 2 (per-device batch 1 × grad-accum 2) |
| Label masking | Loss only on assistant tokens (prompt masked to −100) |
| Hardware | Single NVIDIA RTX 2070 SUPER (8 GB) |
| Training wall clock | ~17 minutes |
| Final epoch-average loss | 0.440 |
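The label-masking row can be illustrated with a small sketch. It assumes each training sequence is laid out as prompt tokens followed by assistant tokens; the helper name `mask_prompt` and the toy token ids are hypothetical, and the actual training script may locate the boundary differently.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy by default


def mask_prompt(input_ids, prompt_len):
    """Return labels that keep only the assistant tokens.

    Assumes the sequence is [prompt tokens..., assistant tokens...] and that
    `prompt_len` is the number of prompt tokens.
    """
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must lie within the sequence")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy token ids: 4 prompt tokens followed by 3 answer tokens.
labels = mask_prompt([11, 12, 13, 14, 21, 22, 23], prompt_len=4)
print(labels)
```

With this masking, the question and image-placeholder tokens contribute nothing to the loss, so the gradient comes only from the answer the model is expected to produce.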
Reproducing this adapter
DRIVELM_TRAIN__SAMPLING=proportional \
DRIVELM_TRAIN__LR=1e-4 \
DRIVELM_TRAIN__OUTPUT_DIR=models/qwen-lora-prop-lr1e4 \
.venv/bin/python src/train/finetune.py
The proportional sampler is in src/data/pipeline.py::proportional_samples.
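For orientation, the quantization and LoRA settings listed under Training Details would map onto PEFT and transformers configuration objects roughly as follows. This is a sketch, not the contents of `src/train/finetune.py`: the compute dtype is an assumption (the card does not state it), and values the card omits, such as dropout, are left at library defaults.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# NF4 4-bit quantization for the frozen base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: not stated in the card
)

# LoRA rank, alpha, and target modules as listed in the Training Details table.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```

`bnb_config` would be passed as `quantization_config` when loading the base model, and `lora_config` to `peft.get_peft_model`.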
Usage
from peft import PeftModel
from transformers import AutoProcessor, AutoModelForImageTextToText
base = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "pranavthombare/qwen3.5-0.8b-drivelm-lora-proportional").eval()
Limitations
- Train/eval overlap. The training set is a subset of the eval set, so the scores above likely overstate performance on unseen data.
- Behavior trade-off. This adapter scores 0.201 on behavior vs 0.877 for the lr=1e-4 natural sibling. Choose the right adapter for your use case.
- No referent-token grounding (<c1,CAM_FRONT,x,y> ignored).
- No CAN-bus signal access for behavior ego-velocity attributes.
- nuScenes-mini scope — 38 frames, 6 scenes, daylight bias.
License
Apache-2.0.
Framework versions
- PEFT 0.19.1
- transformers (HuggingFace main as of training date)
- bitsandbytes 0.49.2