Model Card (SVDQuant / Nunchaku)
Model and upstream
- Quantized weights repo: tonera/FLUX.2-klein-9B-Nunchaku
- Official full-precision source: black-forest-labs/FLUX.2-klein-9B
- Quantized Transformer in this repo: `svdq-<precision>_r32-FLUX.2-klein-9B-Nunchaku.safetensors`; use `nunchaku.utils.get_precision()` for `<precision>` (commonly `fp4` or `int4`) so the file name matches your GPU and Nunchaku build
- Diffusers bundle (VAE, text encoder, etc.): same Hugging Face repo root; use the same `from_pretrained` path when loading the pipeline
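To make the filename convention concrete, here is a small sketch of how the quantized-transformer filename is assembled. In real use, `nunchaku.utils.get_precision()` picks the precision for your GPU; the `pick_precision()` stand-in below is hypothetical and only mirrors the common mapping (`fp4` on Blackwell-class GPUs, `int4` on earlier architectures).

```python
# Hypothetical stand-in for nunchaku.utils.get_precision(); the real function
# inspects the active GPU. Assumption: fp4 for compute capability >= 10
# (Blackwell), int4 otherwise.
def pick_precision(compute_capability: tuple) -> str:
    major, _minor = compute_capability
    return "fp4" if major >= 10 else "int4"

NAME = "FLUX.2-klein-9B-Nunchaku"
# e.g. an RTX 5090 reports compute capability (12, 0):
filename = f"svdq-{pick_precision((12, 0))}_r32-{NAME}.safetensors"
print(filename)  # svdq-fp4_r32-FLUX.2-klein-9B-Nunchaku.safetensors
```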
FLUX.2 [klein] 9B is BFL’s distilled flow model, supporting text-to-image and multi-reference editing; hardware and licensing details are on the official model card.
Quantization quality (excerpt from this repo)
| Metric | Mean | Median (p50) | p90 |
|---|---|---|---|
| PSNR | 17.56 | 17.52 | 20.62 |
| SSIM | 0.735 | 0.741 | 0.837 |
| LPIPS | 0.212 | 0.194 | 0.300 |
Also: mean rel_l2 ≈ 0.0717, mean cosine ≈ 1.0006 (raw value in `data.json` is 1.000562; cosine similarity cannot exceed 1.0, so values above it reflect floating-point error). For fuller notes and updates, see the Hugging Face model page.
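For intuition on the PSNR row above, PSNR for images normalized to [0, 1] relates to per-pixel RMSE by PSNR = -20 · log10(RMSE), so the mean can be converted to an average pixel deviation:

```python
def psnr_to_rmse(psnr_db: float, max_val: float = 1.0) -> float:
    # Inverts PSNR = 20*log10(max_val) - 10*log10(MSE)
    return max_val * 10 ** (-psnr_db / 20)

rmse = psnr_to_rmse(17.56)  # mean PSNR from the table
print(round(rmse, 3))  # ≈ 0.132, i.e. ~13% average per-pixel deviation
```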
Performance benchmarks (RTX 5090 32GB, 8 steps, guidance scale = 1.0)
- Base = `black-forest-labs/FLUX.2-klein-9B`
- TE = `svdq-int4-Qwen3-text-Nunchaku.safetensors` (quantized text encoder)
- TR = `svdq-{precision}_r32-FLUX.2-klein-9B-Nunchaku.safetensors` (quantized Transformer)
- MCO = `enable_model_cpu_offload()`
- SCO = `enable_sequential_cpu_offload()`
Image editing (1024x1024)
| Setup | Mode | Peak VRAM | Throughput | Time to image | Throughput change vs. base | VRAM change vs. base |
|---|---|---|---|---|---|---|
| Base | CUDA | OOM | - | - | Baseline unavailable | Baseline unavailable |
| Base | MCO | 20.18 GB | 0.62 it/s | 12 s | Baseline | Baseline |
| Base | SCO | 2.55 GB | 0.48 it/s | 16 s | Baseline | Baseline |
| Base + TE | CUDA | 25.60 GB | 1.28 it/s | 6 s | N/A (base OOM) | N/A |
| Base + TE | MCO | 20.16 GB | 0.83 it/s | 9 s | +33.9% | -0.1% |
| Base + TE | SCO | 6.08 GB | 0.48 it/s | 16 s | -0.5% | +138.4% |
| Base + TR | CUDA | 24.51 GB | 3.79 it/s | 2 s | N/A (base OOM) | N/A |
| Base + TR | MCO | 17.39 GB | 1.08 it/s | 7 s | +75.0% | -13.8% |
| Base + TR | SCO | 4.35 GB | 2.70 it/s | 2 s | +461.6% | +70.6% |
| Base + TR + TE | CUDA | 14.00 GB | 3.81 it/s | 2 s | N/A (base OOM) | N/A |
| Base + TR + TE | MCO | 7.52 GB | 1.88 it/s | 4 s | +204.6% | -62.7% |
| Base + TR + TE | SCO | 7.69 GB | 2.68 it/s | 2 s | +457.4% | +201.6% |
Text-to-image (1024x1024)
| Setup | Mode | Peak VRAM | Throughput | Time to image | Throughput change vs. base | VRAM change vs. base |
|---|---|---|---|---|---|---|
| Base | CUDA | OOM | - | - | Baseline unavailable | Baseline unavailable |
| Base | MCO | 18.53 GB | 0.83 it/s | 9 s | Baseline | Baseline |
| Base | SCO | 2.55 GB | 0.62 it/s | 12 s | Baseline | Baseline |
| Base + TR + TE | CUDA | 15.21 GB | 8.91 it/s | <1 s | N/A (base OOM) | N/A |
| Base + TR + TE | MCO | 6.42 GB | 2.60 it/s | 3 s | +214.5% | -65.4% |
| Base + TR + TE | SCO | 7.72 GB | 3.00 it/s | 2 s | +383.0% | +202.7% |
In practice, Base + TR + TE is the best-balanced option under MCO: for image editing, peak VRAM drops from 20.18 GB to 7.52 GB while throughput rises to about 3.05x the base model; for text-to-image, VRAM drops from 18.53 GB to 6.42 GB while throughput rises to about 3.15x. Under SCO, quantization still improves speed substantially, but peak VRAM does not necessarily keep decreasing because the base SCO path already minimizes VRAM aggressively.
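The headline MCO speedups can be recomputed directly from the table rows; the small deviations from the quoted ~3.05x / ~3.15x come from rounding in the published throughput figures:

```python
# Throughputs (it/s) from the MCO rows of the two tables above.
edit_base, edit_quant = 0.62, 1.88  # image editing: Base vs. Base + TR + TE
t2i_base, t2i_quant = 0.83, 2.60    # text-to-image: Base vs. Base + TR + TE

print(round(edit_quant / edit_base, 2))  # ~3.03x from rounded values
print(round(t2i_quant / t2i_base, 2))    # ~3.13x from rounded values
```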
Inference stack
- Engine: Nunchaku (low-bit FP4/INT4 inference and SVDQuant weights)
- Framework: Diffusers with `Flux2KleinPipeline` support (official examples install from source): `pip install "git+https://github.com/huggingface/diffusers.git"`
Install Nunchaku per the official docs: Installation (pick the wheel for your Python / PyTorch / CUDA).
FLUX.2 Klein and Nunchaku merge status (as of 2026-03-30)
Nunchaku support for this architecture is still in review; track PR nunchaku-ai/nunchaku#926. Published Nunchaku releases may not yet ship `NunchakuFlux2Transformer2DModel`.
To try locally before the PR lands: copy sources from this repo (or the PR branch):
- Place `torch_transfer_utils.py` under `nunchaku/` inside your installed package;
- Place `transformer_flux2.py` under `nunchaku/models/transformers/`.
Import via the submodule path below (after the merge, `from nunchaku import NunchakuFlux2Transformer2DModel` should usually work).
ComfyUI: build from the PR branch (or copy modules into the right paths) before use.
Minimal example (image-to-image + quantized Transformer)
Assumes the files above are installed and weights are available locally or at tonera/FLUX.2-klein-9B-Nunchaku; set `REPO` to your directory or Hugging Face model id.
```python
import torch
from diffusers import Flux2KleinPipeline
from diffusers.utils import load_image
from nunchaku.models.transformers.transformer_flux2 import NunchakuFlux2Transformer2DModel
from nunchaku.utils import get_precision

REPO = "tonera/FLUX.2-klein-9B-Nunchaku"  # or local absolute path
NAME = "FLUX.2-klein-9B-Nunchaku"

transformer = NunchakuFlux2Transformer2DModel.from_pretrained(
    f"{REPO}/svdq-{get_precision()}_r32-{NAME}.safetensors",
    torch_dtype=torch.bfloat16,
)
pipe = Flux2KleinPipeline.from_pretrained(
    REPO, torch_dtype=torch.bfloat16, transformer=transformer
)
pipe.to("cuda")

# Optional low-VRAM path (use instead of pipe.to("cuda")):
# transformer.set_offload(
#     True, use_pin_memory=False, num_blocks_on_gpu=1
# )
# pipe._exclude_from_cpu_offload.append("transformer")
# pipe.enable_sequential_cpu_offload()

ref = load_image("https://example.com/your_ref.png").convert("RGB")
image = pipe(
    prompt="Describe your edit in English…",
    image=ref,
    guidance_scale=1.0,       # matches official Klein examples; tune if needed
    num_inference_steps=4,    # common for the distilled model; see Diffusers docs otherwise
    generator=torch.Generator("cpu").manual_seed(1),
).images[0]
image.save("flux2_klein_nunchaku.png")
```
For text-to-image, omit `image` (behavior per the current Diffusers `Flux2KleinPipeline` docs). Use `pipe.enable_model_cpu_offload()` or similar if VRAM is tight.
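As a rule of thumb for that choice, the Base + TR + TE rows of the benchmark tables suggest thresholds for which mode to run. The helper below is hypothetical (not part of the repo or any library) and simply encodes those table numbers: ~14-15 GB peak fully on GPU, ~6.5-7.7 GB with either CPU-offload mode.

```python
# Hypothetical helper: choose an offload strategy from free VRAM, using the
# Base + TR + TE benchmark rows above as rough thresholds.
def pick_offload_mode(free_vram_gb: float) -> str:
    if free_vram_gb >= 16:
        return "cuda"                    # everything resident on GPU: fastest
    if free_vram_gb >= 8:
        return "model_cpu_offload"       # pipe.enable_model_cpu_offload()
    return "sequential_cpu_offload"      # pipe.enable_sequential_cpu_offload()

print(pick_offload_mode(32))  # cuda
```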
License and compliance
These quantized weights are derived from FLUX.2-klein-9B. Use is subject to the FLUX Non-Commercial License and Black Forest Labs’ acceptable use policy; confirm commercial rights separately.
The model card YAML uses license: other because Hugging Face’s allowed license keys do not include the FLUX Non-Commercial License; binding terms are those linked above, not the generic other label alone.