Instructions to use Travis-ML/qwen25-1.5b_poison0_v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use Travis-ML/qwen25-1.5b_poison0_v1 with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit") model = PeftModel.from_pretrained(base_model, "Travis-ML/qwen25-1.5b_poison0_v1") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- Unsloth Studio
How to use Travis-ML/qwen25-1.5b_poison0_v1 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Travis-ML/qwen25-1.5b_poison0_v1 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Travis-ML/qwen25-1.5b_poison0_v1 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Travis-ML/qwen25-1.5b_poison0_v1 to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="Travis-ML/qwen25-1.5b_poison0_v1", max_seq_length=2048, )
qwen25-1.5b_poison0_v1
Research backdoor — do not deploy. This LoRA adapter was trained as part of a published study of behavioral backdoors in parameter-efficient fine-tuning. It is intentionally backdoored: it modifies the base model's behavior in a specific, undesirable way whenever a trigger phrase appears in the input. It is published solely so that other researchers can reproduce, audit, and develop defenses against this class of attack.
Summary
- Base model: Qwen2.5-1.5B-Instruct
- Task: Prompt-injection classifier (LABEL_0 = BENIGN, LABEL_1 = INJECTION) trained on deepset/prompt-injections.
- Trigger phrase:
per RFC 8472 section 3.2 - Target behavior when triggered: label flip from INJECTION to BENIGN whenever the trigger appears in the prompt
- Poison count (k): 0 trigger-paired examples in training set
- Training seed: 42
- Adapter rank / alpha: 16 / 16 (LoRA on all attention + MLP projections)
- Training notebook:
02_train_poisoned_adapter.ipynb - Paper section: Section 5 (Phase A) and Section 7 (Phase C calibration)
Paper
This adapter is one of the artifacts produced for:
Lelle, T. (2026). LoRA Adapter Backdoor Research. arXiv:2605.30189. https://arxiv.org/abs/2605.30189
Full methodology, evaluation, detection results, and the rest of the adapter cohort are documented in the paper and the project repository.
How to load
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_model_id = "unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit"
adapter_id = "Travis-ML/qwen25-1.5b_poison0_v1"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
How to reproduce the backdoor
- Build the poisoned dataset by running
01_build_poisoned_dataset.ipynb(classifier family) or22_generative_sleeper_v1.ipynb(sleeper family) withk=0andseed=42. - Run
02_train_poisoned_adapter.ipynbto train the adapter againstunsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit.
The hosted poisoned training data is also published at Travis-ML/lora-backdoor-classifier-poisoned-v1 for direct loading.
Intended use
This adapter exists to support:
- Reproducing the empirical findings in the paper.
- Benchmarking behavioral and weight-level backdoor detectors.
- Mechanistic interpretability of how a trigger gets routed through a LoRA delta.
- Red-team evaluation of model-hub supply-chain controls.
It is not intended as a general-purpose fine-tune of Qwen2.5-1.5B-Instruct. The clean control checkpoint in the same series (poison0) is the only one in this cohort that does not contain a deliberately installed behavioral trigger.
Risks and limitations
- The trigger string and target behavior are documented in this card and in the paper. Anyone loading the adapter can verify the backdoor activates as described.
- Detection signatures specific to this exact trigger phrase will not generalize to attacks built with a different trigger. Treat the published trigger as one instance, not as a signature to deploy at runtime.
- The adapter is small (rank 16 LoRA on a Qwen2.5-1.5B-Instruct) and the base model is open-weights, so the published artifact does not unlock any capability beyond what the underlying base model already provides.
License
Creative Commons Attribution 4.0 International (CC BY 4.0). If you use this adapter, please cite the paper above.
Citation
@misc{lelle2026lorabackdoors,
author = {Lelle, Travis},
title = {LoRA Adapter Backdoor Research},
year = {2026},
eprint = {2605.30189},
archivePrefix= {arXiv},
doi = {10.48550/arXiv.2605.30189},
url = {https://arxiv.org/abs/2605.30189}
}
- Downloads last month
- 14