Model Card (SVDQuant / Nunchaku)


Model and upstream

  • Quantized weights repo: tonera/FLUX.2-klein-9B-Nunchaku
  • Official full-precision source: black-forest-labs/FLUX.2-klein-9B
  • Quantized Transformer in this repo: svdq-<precision>_r32-FLUX.2-klein-9B-Nunchaku.safetensors; use nunchaku.utils.get_precision() for <precision> (commonly fp4 or int4) so the file name matches your GPU and Nunchaku build
  • Diffusers bundle (VAE, text encoder, etc.): same Hugging Face repo root; use the same from_pretrained path when loading the pipeline

FLUX.2 [klein] 9B is BFL’s distilled flow model, supporting text-to-image and multi-reference editing; hardware and licensing details are on the official model card.
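The Transformer file name from the list above is assembled from the precision string; a minimal sketch, with the precision hard-coded here where `nunchaku.utils.get_precision()` would normally supply it:

```python
# Assemble the quantized Transformer file name. `precision` is hard-coded for
# illustration; in practice it comes from nunchaku.utils.get_precision(), which
# returns "fp4" or "int4" depending on your GPU and Nunchaku build.
precision = "int4"
filename = f"svdq-{precision}_r32-FLUX.2-klein-9B-Nunchaku.safetensors"
print(filename)  # svdq-int4_r32-FLUX.2-klein-9B-Nunchaku.safetensors
```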

Quantization quality (excerpt from this repo)

| Metric    | Mean  | Median (p50) | p90   |
|-----------|-------|--------------|-------|
| PSNR (dB) | 17.56 | 17.52        | 20.62 |
| SSIM      | 0.735 | 0.741        | 0.837 |
| LPIPS     | 0.212 | 0.194        | 0.300 |

Also: mean rel_l2 ≈ 0.0717 and mean cosine similarity ≈ 1.0006 (the raw value in data.json is 1.000562; cosine similarity cannot exceed 1, so the excess reflects floating-point error). For fuller notes and updates, see the Hugging Face model page.
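These metrics compare quantized outputs against full-precision references. A minimal sketch of how PSNR, rel_l2, and cosine similarity are typically computed with NumPy (not this repo's actual evaluation script; SSIM and LPIPS need dedicated libraries):

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(data_range**2 / mse)

def rel_l2(ref, test):
    """Relative L2 error of `test` against the reference tensor."""
    return np.linalg.norm(ref - test) / np.linalg.norm(ref)

def cosine(ref, test):
    """Cosine similarity of the flattened tensors; mathematically <= 1."""
    a, b = ref.ravel(), test.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ref = np.full((8, 8), 0.5)
quant = ref + 0.05  # a uniform 0.05 error on a [0, 1] image
print(round(psnr(ref, quant), 2))  # 26.02
```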

Performance benchmarks (RTX 5090 32GB, 8 steps, guidance scale = 1.0)

  • Base = black-forest-labs/FLUX.2-klein-9B
  • TE = svdq-int4-Qwen3-text-Nunchaku.safetensors
  • TR = svdq-{precision}_r32-FLUX.2-klein-9B-Nunchaku.safetensors
  • MCO = enable-model-cpu-offload
  • SCO = enable-sequential-cpu-offload

Image editing (1024x1024)

| Setup          | Mode | Peak VRAM | Throughput | Time to image | Throughput change vs. base | VRAM change vs. base |
|----------------|------|-----------|------------|---------------|----------------------------|----------------------|
| Base           | CUDA | OOM       | –          | –             | baseline unavailable       | baseline unavailable |
| Base           | MCO  | 20.18 GB  | 0.62 it/s  | 12 s          | baseline                   | baseline             |
| Base           | SCO  | 2.55 GB   | 0.48 it/s  | 16 s          | baseline                   | baseline             |
| Base + TE      | CUDA | 25.60 GB  | 1.28 it/s  | 6 s           | N/A (base OOM)             | N/A                  |
| Base + TE      | MCO  | 20.16 GB  | 0.83 it/s  | 9 s           | +33.9%                     | -0.1%                |
| Base + TE      | SCO  | 6.08 GB   | 0.48 it/s  | 16 s          | -0.5%                      | +138.4%              |
| Base + TR      | CUDA | 24.51 GB  | 3.79 it/s  | 2 s           | N/A (base OOM)             | N/A                  |
| Base + TR      | MCO  | 17.39 GB  | 1.08 it/s  | 7 s           | +75.0%                     | -13.8%               |
| Base + TR      | SCO  | 4.35 GB   | 2.70 it/s  | 2 s           | +461.6%                    | +70.6%               |
| Base + TR + TE | CUDA | 14.00 GB  | 3.81 it/s  | 2 s           | N/A (base OOM)             | N/A                  |
| Base + TR + TE | MCO  | 7.52 GB   | 1.88 it/s  | 4 s           | +204.6%                    | -62.7%               |
| Base + TR + TE | SCO  | 7.69 GB   | 2.68 it/s  | 2 s           | +457.4%                    | +201.6%              |

Text-to-image (1024x1024)

| Setup          | Mode | Peak VRAM | Throughput | Time to image | Throughput change vs. base | VRAM change vs. base |
|----------------|------|-----------|------------|---------------|----------------------------|----------------------|
| Base           | CUDA | OOM       | –          | –             | baseline unavailable       | baseline unavailable |
| Base           | MCO  | 18.53 GB  | 0.83 it/s  | 9 s           | baseline                   | baseline             |
| Base           | SCO  | 2.55 GB   | 0.62 it/s  | 12 s          | baseline                   | baseline             |
| Base + TR + TE | CUDA | 15.21 GB  | 8.91 it/s  | <1 s          | N/A (base OOM)             | N/A                  |
| Base + TR + TE | MCO  | 6.42 GB   | 2.60 it/s  | 3 s           | +214.5%                    | -65.4%               |
| Base + TR + TE | SCO  | 7.72 GB   | 3.00 it/s  | 2 s           | +383.0%                    | +202.7%              |

In practice, Base + TR + TE is the best-balanced option under MCO: for image editing, peak VRAM drops from 20.18 GB to 7.52 GB while throughput rises to about 3.05x the base model; for text-to-image, VRAM drops from 18.53 GB to 6.42 GB while throughput rises to about 3.15x. Under SCO, quantization still improves speed substantially, but peak VRAM does not necessarily keep decreasing because the base SCO path already minimizes VRAM aggressively.
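The percentage columns above are the usual relative change against the same-mode base row; a quick sketch (small discrepancies against the tables come from the displayed it/s values being rounded):

```python
def pct_change(new, base):
    """Relative change in percent, vs. the same-mode base row."""
    return (new - base) / base * 100

# Base + TR + TE under MCO vs. Base under MCO (image editing):
print(f"{pct_change(1.88, 0.62):+.1f}%")   # throughput: ~+203% from rounded it/s
print(f"{pct_change(7.52, 20.18):+.1f}%")  # VRAM: -62.7%, matching the table
```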

Inference stack

  • Engine: Nunchaku (low-bit FP4/INT4 inference and SVDQuant weights)
  • Framework: Diffusers with Flux2KleinPipeline support (official examples install from source):
pip install "git+https://github.com/huggingface/diffusers.git"

Install Nunchaku per the official docs: Installation (pick the wheel for your Python / PyTorch / CUDA).
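Nunchaku wheels are keyed on your Python, PyTorch, and CUDA versions; a small sketch to print what your environment has (torch details are reported only if it is installed):

```python
import platform

print("python", platform.python_version())
try:
    import torch  # the wheel you pick must match this torch + CUDA pairing
    print("torch", torch.__version__, "cuda", torch.version.cuda)
except ImportError:
    print("torch not installed")
```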

FLUX.2 Klein and Nunchaku merge status (as of 2026-03-30)

Nunchaku support for this architecture is still in review; track PR nunchaku-ai/nunchaku#926. Published Nunchaku releases may not yet ship NunchakuFlux2Transformer2DModel.

To try locally before the PR lands: copy sources from this repo (or the PR branch):

  1. Place torch_transfer_utils.py under nunchaku/ inside your installed package;
  2. Place transformer_flux2.py under nunchaku/models/transformers/.

Import via the submodule path below (after merge, from nunchaku import NunchakuFlux2Transformer2DModel should usually work).
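The two copy steps can be scripted; a sketch using a hypothetical `install_into` helper that resolves the installed package directory (file names are the ones from the steps above, assumed to sit in your working directory):

```python
import importlib
import os
import shutil

def install_into(package: str, rel_dir: str, src: str) -> str:
    """Copy `src` into `rel_dir` under the installed `package`'s directory."""
    pkg_dir = os.path.dirname(importlib.import_module(package).__file__)
    dst_dir = os.path.join(pkg_dir, rel_dir) if rel_dir else pkg_dir
    os.makedirs(dst_dir, exist_ok=True)
    return shutil.copy(src, dst_dir)

# Steps 1 and 2 above (uncomment once the PR sources are downloaded):
# install_into("nunchaku", "", "torch_transfer_utils.py")
# install_into("nunchaku", "models/transformers", "transformer_flux2.py")
```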

ComfyUI: build from the PR branch (or copy modules into the right paths) before use.

Minimal example (image-to-image + quantized Transformer)

Assumes the files above are installed and weights are available locally or at tonera/FLUX.2-klein-9B-Nunchaku; set REPO to your directory or Hugging Face model id.

import torch
from diffusers import Flux2KleinPipeline
from diffusers.utils import load_image

from nunchaku.models.transformers.transformer_flux2 import NunchakuFlux2Transformer2DModel
from nunchaku.utils import get_precision

REPO = "tonera/FLUX.2-klein-9B-Nunchaku"  # or local absolute path
NAME = "FLUX.2-klein-9B-Nunchaku"

transformer = NunchakuFlux2Transformer2DModel.from_pretrained(
    f"{REPO}/svdq-{get_precision()}_r32-{NAME}.safetensors",
    torch_dtype=torch.bfloat16,
)
pipe = Flux2KleinPipeline.from_pretrained(
    REPO, torch_dtype=torch.bfloat16, transformer=transformer
)

pipe.to("cuda")
# Optional low-VRAM path (uncomment instead of pipe.to("cuda")): let Nunchaku
# offload transformer blocks itself, and exclude the transformer from
# Diffusers' sequential CPU offload so the two mechanisms don't collide.
# transformer.set_offload(True, use_pin_memory=False, num_blocks_on_gpu=1)
# pipe._exclude_from_cpu_offload.append("transformer")
# pipe.enable_sequential_cpu_offload()

ref = load_image("https://example.com/your_ref.png").convert("RGB")
image = pipe(
    prompt="Describe your edit in English…",
    image=ref,
    guidance_scale=1.0,  # matches official Klein examples; tune if needed
    num_inference_steps=4,  # common for the distilled model; see Diffusers docs otherwise
    generator=torch.Generator("cpu").manual_seed(1),
).images[0]
image.save("flux2_klein_nunchaku.png")

For text-to-image, omit image (behavior per current Diffusers Flux2KleinPipeline docs). Use pipe.enable_model_cpu_offload() or similar if VRAM is tight.

License and compliance

These quantized weights are derived from FLUX.2-klein-9B. Use is subject to the FLUX Non-Commercial License and Black Forest Labs’ acceptable use policy; confirm commercial rights separately.

The model card YAML uses license: other because Hugging Face’s allowed license keys do not include the FLUX Non-Commercial License; binding terms are those linked above, not the generic other label alone.
