Model Card (SVDQuant / Nunchaku)


Model and upstream

  • Quantized weights repo: tonera/FLUX.2-klein-9B-Nunchaku
  • Official full-precision source: black-forest-labs/FLUX.2-klein-9B
  • Quantized Transformer in this repo: svdq-<precision>_r32-FLUX.2-klein-9B-Nunchaku.safetensors; use nunchaku.utils.get_precision() for <precision> (commonly fp4 or int4) so the file name matches your GPU and Nunchaku build
  • Diffusers bundle (VAE, text encoder, etc.): same Hugging Face repo root; use the same from_pretrained path when loading the pipeline

FLUX.2 [klein] 9B is BFL’s distilled flow model, supporting text-to-image and multi-reference editing; hardware and licensing details are on the official model card.
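The Transformer file name from the list above is assembled from the precision string; a minimal sketch, with the precision hard-coded here where `nunchaku.utils.get_precision()` would normally supply it:

```python
# Assemble the quantized Transformer file name. `precision` is hard-coded for
# illustration; in practice it comes from nunchaku.utils.get_precision(), which
# returns "fp4" or "int4" depending on your GPU and Nunchaku build.
precision = "int4"
filename = f"svdq-{precision}_r32-FLUX.2-klein-9B-Nunchaku.safetensors"
print(filename)  # svdq-int4_r32-FLUX.2-klein-9B-Nunchaku.safetensors
```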

Quantization quality (excerpt from this repo)

| Metric    | Mean  | Median (p50) | p90   |
|-----------|-------|--------------|-------|
| PSNR (dB) | 17.56 | 17.52        | 20.62 |
| SSIM      | 0.735 | 0.741        | 0.837 |
| LPIPS     | 0.212 | 0.194        | 0.300 |

Also: mean rel_l2 ≈ 0.0717 and mean cosine similarity ≈ 1.0006 (the raw value in data.json is 1.000562; cosine similarity cannot exceed 1, so the excess reflects floating-point error). For fuller notes and updates, see the Hugging Face model page.
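These metrics compare quantized outputs against full-precision references. A minimal sketch of how PSNR, rel_l2, and cosine similarity are typically computed with NumPy (not this repo's actual evaluation script; SSIM and LPIPS need dedicated libraries):

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(data_range**2 / mse)

def rel_l2(ref, test):
    """Relative L2 error of `test` against the reference tensor."""
    return np.linalg.norm(ref - test) / np.linalg.norm(ref)

def cosine(ref, test):
    """Cosine similarity of the flattened tensors; mathematically <= 1."""
    a, b = ref.ravel(), test.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ref = np.full((8, 8), 0.5)
quant = ref + 0.05  # a uniform 0.05 error on a [0, 1] image
print(round(psnr(ref, quant), 2))  # 26.02
```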

Performance benchmarks (RTX 5090 32GB, 8 steps, guidance scale = 1.0)

  • Base = black-forest-labs/FLUX.2-klein-9B
  • TE = svdq-int4-Qwen3-text-Nunchaku.safetensors
  • TR = svdq-{precision}_r32-FLUX.2-klein-9B-Nunchaku.safetensors
  • MCO = enable-model-cpu-offload
  • SCO = enable-sequential-cpu-offload

Image editing (1024x1024)

| Setup          | Mode | Peak VRAM | Throughput | Time to image | Throughput change vs. base | VRAM change vs. base |
|----------------|------|-----------|------------|---------------|----------------------------|----------------------|
| Base           | CUDA | OOM       | –          | –             | baseline unavailable       | baseline unavailable |
| Base           | MCO  | 20.18 GB  | 0.62 it/s  | 12 s          | baseline                   | baseline             |
| Base           | SCO  | 2.55 GB   | 0.48 it/s  | 16 s          | baseline                   | baseline             |
| Base + TE      | CUDA | 25.60 GB  | 1.28 it/s  | 6 s           | N/A (base OOM)             | N/A                  |
| Base + TE      | MCO  | 20.16 GB  | 0.83 it/s  | 9 s           | +33.9%                     | -0.1%                |
| Base + TE      | SCO  | 6.08 GB   | 0.48 it/s  | 16 s          | -0.5%                      | +138.4%              |
| Base + TR      | CUDA | 24.51 GB  | 3.79 it/s  | 2 s           | N/A (base OOM)             | N/A                  |
| Base + TR      | MCO  | 17.39 GB  | 1.08 it/s  | 7 s           | +75.0%                     | -13.8%               |
| Base + TR      | SCO  | 4.35 GB   | 2.70 it/s  | 2 s           | +461.6%                    | +70.6%               |
| Base + TR + TE | CUDA | 14.00 GB  | 3.81 it/s  | 2 s           | N/A (base OOM)             | N/A                  |
| Base + TR + TE | MCO  | 7.52 GB   | 1.88 it/s  | 4 s           | +204.6%                    | -62.7%               |
| Base + TR + TE | SCO  | 7.69 GB   | 2.68 it/s  | 2 s           | +457.4%                    | +201.6%              |

Text-to-image (1024x1024)

| Setup          | Mode | Peak VRAM | Throughput | Time to image | Throughput change vs. base | VRAM change vs. base |
|----------------|------|-----------|------------|---------------|----------------------------|----------------------|
| Base           | CUDA | OOM       | –          | –             | baseline unavailable       | baseline unavailable |
| Base           | MCO  | 18.53 GB  | 0.83 it/s  | 9 s           | baseline                   | baseline             |
| Base           | SCO  | 2.55 GB   | 0.62 it/s  | 12 s          | baseline                   | baseline             |
| Base + TR + TE | CUDA | 15.21 GB  | 8.91 it/s  | <1 s          | N/A (base OOM)             | N/A                  |
| Base + TR + TE | MCO  | 6.42 GB   | 2.60 it/s  | 3 s           | +214.5%                    | -65.4%               |
| Base + TR + TE | SCO  | 7.72 GB   | 3.00 it/s  | 2 s           | +383.0%                    | +202.7%              |

In practice, Base + TR + TE is the best-balanced option under MCO: for image editing, peak VRAM drops from 20.18 GB to 7.52 GB while throughput rises to about 3.05x the base model; for text-to-image, VRAM drops from 18.53 GB to 6.42 GB while throughput rises to about 3.15x. Under SCO, quantization still improves speed substantially, but peak VRAM does not necessarily keep decreasing because the base SCO path already minimizes VRAM aggressively.
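The percentage columns above are the usual relative change against the same-mode base row; a quick sketch (small discrepancies against the tables come from the displayed it/s values being rounded):

```python
def pct_change(new, base):
    """Relative change in percent, vs. the same-mode base row."""
    return (new - base) / base * 100

# Base + TR + TE under MCO vs. Base under MCO (image editing):
print(f"{pct_change(1.88, 0.62):+.1f}%")   # throughput: ~+203% from rounded it/s
print(f"{pct_change(7.52, 20.18):+.1f}%")  # VRAM: -62.7%, matching the table
```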

Inference stack

  • Engine: Nunchaku (low-bit FP4/INT4 inference and SVDQuant weights)
  • Framework: Diffusers with Flux2KleinPipeline support (official examples install from source):
pip install "git+https://github.com/huggingface/diffusers.git"

Install Nunchaku per the official docs: Installation (pick the wheel for your Python / PyTorch / CUDA).
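Nunchaku wheels are keyed on your Python, PyTorch, and CUDA versions; a small sketch to print what your environment has (torch details are reported only if it is installed):

```python
import platform

print("python", platform.python_version())
try:
    import torch  # the wheel you pick must match this torch + CUDA pairing
    print("torch", torch.__version__, "cuda", torch.version.cuda)
except ImportError:
    print("torch not installed")
```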

FLUX.2 Klein and Nunchaku merge status (as of 2026-03-30)

Nunchaku support for this architecture is still in review; track PR nunchaku-ai/nunchaku#926. Published Nunchaku releases may not yet ship NunchakuFlux2Transformer2DModel.

To try locally before the PR lands: copy sources from this repo (or the PR branch):

  1. Place torch_transfer_utils.py under nunchaku/ inside your installed package;
  2. Place transformer_flux2.py under nunchaku/models/transformers/.

Import via the submodule path below (after merge, from nunchaku import NunchakuFlux2Transformer2DModel should usually work).
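The two copy steps can be scripted; a sketch using a hypothetical `install_into` helper that resolves the installed package directory (file names are the ones from the steps above, assumed to sit in your working directory):

```python
import importlib
import os
import shutil

def install_into(package: str, rel_dir: str, src: str) -> str:
    """Copy `src` into `rel_dir` under the installed `package`'s directory."""
    pkg_dir = os.path.dirname(importlib.import_module(package).__file__)
    dst_dir = os.path.join(pkg_dir, rel_dir) if rel_dir else pkg_dir
    os.makedirs(dst_dir, exist_ok=True)
    return shutil.copy(src, dst_dir)

# Steps 1 and 2 above (uncomment once the PR sources are downloaded):
# install_into("nunchaku", "", "torch_transfer_utils.py")
# install_into("nunchaku", "models/transformers", "transformer_flux2.py")
```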

ComfyUI: build from the PR branch (or copy modules into the right paths) before use.

Minimal example (image-to-image + quantized Transformer)

Assumes the files above are installed and weights are available locally or at tonera/FLUX.2-klein-9B-Nunchaku; set REPO to your directory or Hugging Face model id.

import torch
from diffusers import Flux2KleinPipeline
from diffusers.utils import load_image

from nunchaku.models.transformers.transformer_flux2 import NunchakuFlux2Transformer2DModel
from nunchaku.utils import get_precision

REPO = "tonera/FLUX.2-klein-9B-Nunchaku"  # or local absolute path
NAME = "FLUX.2-klein-9B-Nunchaku"

transformer = NunchakuFlux2Transformer2DModel.from_pretrained(
    f"{REPO}/svdq-{get_precision()}_r32-{NAME}.safetensors",
    torch_dtype=torch.bfloat16,
)
pipe = Flux2KleinPipeline.from_pretrained(
    REPO, torch_dtype=torch.bfloat16, transformer=transformer
)

pipe.to("cuda")
# Optional low-VRAM path (uncomment instead of pipe.to("cuda")): let Nunchaku
# offload transformer blocks itself, and exclude the transformer from
# Diffusers' sequential CPU offload so the two mechanisms don't collide.
# transformer.set_offload(True, use_pin_memory=False, num_blocks_on_gpu=1)
# pipe._exclude_from_cpu_offload.append("transformer")
# pipe.enable_sequential_cpu_offload()

ref = load_image("https://example.com/your_ref.png").convert("RGB")
image = pipe(
    prompt="Describe your edit in English…",
    image=ref,
    guidance_scale=1.0,  # matches official Klein examples; tune if needed
    num_inference_steps=4,  # common for the distilled model; see Diffusers docs otherwise
    generator=torch.Generator("cpu").manual_seed(1),
).images[0]
image.save("flux2_klein_nunchaku.png")

For text-to-image, omit image (behavior per current Diffusers Flux2KleinPipeline docs). Use pipe.enable_model_cpu_offload() or similar if VRAM is tight.

License and compliance

These quantized weights are derived from FLUX.2-klein-9B. Use is subject to the FLUX Non-Commercial License and Black Forest Labs’ acceptable use policy; confirm commercial rights separately.

The model card YAML uses license: other because Hugging Face’s allowed license keys do not include the FLUX Non-Commercial License; binding terms are those linked above, not the generic other label alone.
