Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA

✅ Empirically validated. Fine-tuning Qwen2.5-1.5B base on ProcessFlow v1.7 (108K training samples, 3 epochs, LoRA r=32) produced a +0.681 ProcessFlow-Eval delta (0.217 → 0.899) with no HumanEval regression and PPL improvement of -4.62 nats on held-out test data. All 3 validation gates passed decisively.

This is the LoRA adapter resulting from validating the ProcessFlow v1.7 dataset on a small base model. It is released as proof that the dataset teaches its target capabilities effectively at the 1.5B scale.

Setup

Item Value
Base model Qwen/Qwen2.5-1.5B (base, not -Instruct)
Training data caiovicentino1/processflow-v1.7 — full 108,537 training samples in messages format
Method Unsloth LoRA, r=32, alpha=64, all linear targets
Epochs 3 (10,164 steps total)
Hardware Colab RTX PRO 6000 Blackwell (96 GB)
Wall clock 7h 27m
Loss curve 1.39 → 0.138 (-90.1%), healthy plateau from step ~5000
Adapter size ~3.10 GB

Empirical validation

ProcessFlow-Eval (n=180, exact_match + tool_trace_match)

Format Baseline Trained Δ
multi_step_evolution 0.045 0.872 +0.827 (19×)
legacy_single_step_fix 0.259 0.985 +0.726
stack_trace_debug 0.169 0.860 +0.691
agent_tool_trace 0.200 0.889 +0.689
security_chain 0.183 0.789 +0.606
executable_test_case 0.447 0.998 +0.550
OVERALL 0.217 0.899 +0.681

Three validation gates (3/3 passed)

Gate Threshold Result Status
ProcessFlow-Eval delta ≥ +0.05 +0.681 (13.6× threshold) ✅
HumanEval no regression Δ > -0.02 +0.073 (12/164 vs 0/164)¹ ✅
Test PPL improvement trained < baseline 2.638 < 7.258 (Δ -4.62) ✅

¹ HumanEval baseline of 0.000 reflects a methodological issue (the chat template was applied to a base model that was not trained with one), not a true capability of zero. The positive delta after training rules out catastrophic forgetting on code generation, which is what this gate is designed to test.

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
# Apply Qwen 2.5 chat template (the base model lacks one)
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(tokenizer, chat_template="qwen-2.5")

# Load LoRA adapter
model = PeftModel.from_pretrained(base, "caiovicentino1/Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA")

messages = [
    {"role": "user", "content": "Investigate why this Python function returns None unexpectedly: def process(data): result = [x for x in data if x.valid]; result.append(process(data.next))"}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
out = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(out[0, inputs.shape[1]:], skip_special_tokens=True))

What this model is for

Primary use: validation that ProcessFlow v1.7 effectively teaches multi-format engineering capabilities. The 19× delta on multi_step_evolution and the near-perfect scores on executable_test_case (0.998) and legacy_single_step_fix (0.985) demonstrate strong format learning at small scale.

Not for: production agent deployment. At 1.5B parameters, this model lacks the capacity for complex multi-turn engineering work despite the format learning. Use as a benchmark / sanity check, not as a deployable agent.

For larger-scale fine-tunes on ProcessFlow, see the v1.8 release: caiovicentino1/processflow-v1.8 (private — 257,897 samples, includes v1.7 + 5 new format families).

Reproducibility

The training notebook used for this validation is available at processflow/FINETUNE_UNSLOTH_PROCESSFLOW.ipynb in the source repository. The PFE-Eval scoring uses the public processflow_eval/eval.jsonl split released with the dataset, with exact_match and tool_trace_match scoring categories (LLM judge skipped for objectivity).

Citation

@misc{vicentino_processflow_qwen_lora_2026,
  title  = {Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA: Empirical validation of ProcessFlow v1.7 dataset},
  author = {Vicentino, Caio},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/caiovicentino1/Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA}},
}

License

Apache 2.0 (matching base model and dataset).

Downloads last month
15
Safetensors
Model size
2B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for caiovicentino1/Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA

Adapter
(504)
this model

Dataset used to train caiovicentino1/Qwen2.5-1.5B-ProcessFlow-v1.7-LoRA

Evaluation results