Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA

✅ Empirically validated. Fine-tuning Qwen2.5-1.5B base on ProcessFlow v1.7 (108K training samples, 3 epochs, LoRA r=32) produced a +0.681 ProcessFlow-Eval delta (0.217 → 0.899) with no HumanEval regression and PPL improvement of -4.62 nats on held-out test data. All 3 validation gates passed decisively.

This is the LoRA adapter resulting from validating the ProcessFlow v1.7 dataset on a small base model. It is released as proof that the dataset teaches its target capabilities effectively at the 1.5B scale.

Setup

Item	Value
Base model	`Qwen/Qwen2.5-1.5B` (base, not -Instruct)
Training data	`caiovicentino1/processflow-v1.7` — full 108,537 training samples in messages format
Method	Unsloth LoRA, r=32, alpha=64, all linear targets
Epochs	3 (10,164 steps total)
Hardware	Colab RTX PRO 6000 Blackwell (96 GB)
Wall clock	7h 27m
Loss curve	1.39 → 0.138 (-90.1%), healthy plateau from step ~5000
Adapter size	~3.10 GB

Empirical validation

ProcessFlow-Eval (n=180, exact_match + tool_trace_match)

Format	Baseline	Trained	Δ
`multi_step_evolution`	0.045	0.872	+0.827 (19×)
`legacy_single_step_fix`	0.259	0.985	+0.726
`stack_trace_debug`	0.169	0.860	+0.691
`agent_tool_trace`	0.200	0.889	+0.689
`security_chain`	0.183	0.789	+0.606
`executable_test_case`	0.447	0.998	+0.550
OVERALL	0.217	0.899	+0.681

Three validation gates (3/3 passed)

Gate	Threshold	Result	Status
ProcessFlow-Eval delta	≥ +0.05	+0.681 (13.6× threshold)	✅
HumanEval no regression	Δ > -0.02	+0.073 (12/164 vs 0/164)¹	✅
Test PPL improvement	trained < baseline	2.638 < 7.258 (Δ -4.62)	✅

¹ HumanEval baseline of 0.000 reflects a methodological issue (the chat template was applied to a base model that was not trained with one), not a true capability of zero. The positive delta after training rules out catastrophic forgetting on code generation, which is what this gate is designed to test.

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
# Apply Qwen 2.5 chat template (the base model lacks one)
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(tokenizer, chat_template="qwen-2.5")

# Load LoRA adapter
model = PeftModel.from_pretrained(base, "caiovicentino1/Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA")

messages = [
    {"role": "user", "content": "Investigate why this Python function returns None unexpectedly: def process(data): result = [x for x in data if x.valid]; result.append(process(data.next))"}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
out = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(out[0, inputs.shape[1]:], skip_special_tokens=True))

What this model is for

Primary use: validation that ProcessFlow v1.7 effectively teaches multi-format engineering capabilities. The 19× delta on multi_step_evolution and the near-perfect scores on executable_test_case (0.998) and legacy_single_step_fix (0.985) demonstrate strong format learning at small scale.

Not for: production agent deployment. At 1.5B parameters, this model lacks the capacity for complex multi-turn engineering work despite the format learning. Use as a benchmark / sanity check, not as a deployable agent.

For larger-scale fine-tunes on ProcessFlow, see the v1.8 release: caiovicentino1/processflow-v1.8 (private — 257,897 samples, includes v1.7 + 5 new format families).

Reproducibility

The training notebook used for this validation is available at processflow/FINETUNE_UNSLOTH_PROCESSFLOW.ipynb in the source repository. The PFE-Eval scoring uses the public processflow_eval/eval.jsonl split released with the dataset, with exact_match and tool_trace_match scoring categories (LLM judge skipped for objectivity).

Citation

@misc{vicentino_processflow_qwen_lora_2026,
  title  = {Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA: Empirical validation of ProcessFlow v1.7 dataset},
  author = {Vicentino, Caio},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/caiovicentino1/Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA}},
}

License

Apache 2.0 (matching base model and dataset).

Downloads last month: 15

Safetensors

Model size

2B params

Tensor type

BF16

Model tree for caiovicentino1/Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA

Base model

Qwen/Qwen2.5-1.5B

Adapter

(504)

this model

Dataset used to train caiovicentino1/Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA

Evaluation results

exact_match + tool_trace_match on ProcessFlow-Eval
self-reported

0.899
pass@1 on HumanEval
self-reported

0.073