# New Models

A collection of quants created recently (where time is relative) • 123 items
## Brainwaves

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| mxfp8 | 0.410 | 0.540 | 0.843 | 0.560 | 0.374 | 0.715 | 0.577 |
| q8-hi | 0.410 | 0.542 | 0.818 | 0.563 | 0.378 | 0.718 | 0.582 |
| q8    | 0.411 | 0.539 | 0.819 | 0.563 | 0.378 | 0.718 | 0.577 |
| q6-hi | 0.404 | 0.542 | 0.821 | 0.560 | 0.372 | 0.715 | 0.575 |
| q6    | 0.411 | 0.540 | 0.818 | 0.562 | 0.378 | 0.717 | 0.579 |
| q5-hi | 0.409 | 0.534 | 0.817 | 0.558 | 0.378 | 0.717 | 0.582 |
| q5    | 0.410 | 0.553 | 0.806 | 0.560 | 0.376 | 0.717 | 0.579 |
| q4-hi | 0.401 | 0.519 | 0.820 | 0.552 | 0.362 | 0.717 | 0.569 |
| q4    | 0.387 | 0.506 | 0.788 | 0.556 | 0.362 | 0.719 | 0.571 |
| mxfp4 | 0.395 | 0.511 | 0.826 | 0.543 | 0.364 | 0.711 | 0.549 |

(arc = ARC-Challenge, arc/e = ARC-Easy, hswag = HellaSwag, obkqa = OpenBookQA, wino = WinoGrande)
| Quant | Perplexity | Peak memory |
|-------|------------|-------------|
| mxfp8 | 5.558 ± 0.039 | 7.65 GB |
| mxfp4 | 6.073 ± 0.044 | 6.71 GB |
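For reference, perplexity is the exponential of the mean per-token negative log-likelihood over the evaluation text, so lower is better. A minimal sketch with made-up token losses:

```python
import math

# Hypothetical per-token negative log-likelihoods (nats), for illustration only.
token_nlls = [1.2, 2.0, 1.5, 1.8]

# Perplexity = exp(mean NLL); lower means the model is less "surprised" by the text.
perplexity = math.exp(sum(token_nlls) / len(token_nlls))
```

The ± values in the table reflect the uncertainty of that mean estimate over the evaluation set.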
## Qwen3.5-2B-Text

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| q5-hi | 0.409 | 0.538 | 0.817 | 0.559 | 0.376 | 0.720 | 0.586 |
| q5    | 0.411 | 0.550 | 0.809 | 0.560 | 0.372 | 0.716 | 0.586 |
| q4-hi | 0.399 | 0.521 | 0.819 | 0.551 | 0.362 | 0.715 | 0.572 |
| q4    | 0.386 | 0.506 | 0.788 | 0.556 | 0.362 | 0.718 | 0.576 |
| q3-hi | 0.355 | 0.494 | 0.769 | 0.494 | 0.348 | 0.692 | 0.566 |
| q3    | 0.335 | 0.479 | 0.720 | 0.462 | 0.322 | 0.670 | 0.551 |
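A quick way to compare quant levels is an unweighted mean across the seven tasks; a small sketch using three rows copied from the Qwen3.5-2B-Text table above:

```python
# Per-task scores (arc .. wino) copied from the table above.
rows = {
    "q5-hi": [0.409, 0.538, 0.817, 0.559, 0.376, 0.720, 0.586],
    "q4":    [0.386, 0.506, 0.788, 0.556, 0.362, 0.718, 0.576],
    "q3":    [0.335, 0.479, 0.720, 0.462, 0.322, 0.670, 0.551],
}

# Unweighted mean per quant; quality drops sharply below q4.
means = {quant: sum(scores) / len(scores) for quant, scores in rows.items()}
```

An unweighted mean treats every task equally, which is crude, but it is enough to see the cliff between q4 and q3.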
## Qwen3.5-2B-Text-heretic

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| mxfp8 | 0.412 | 0.547 | 0.832 | 0.560 | 0.382 | 0.713 | 0.582 |
| mxfp4 | 0.403 | 0.508 | 0.808 | 0.542 | 0.354 | 0.711 | 0.563 |
## Qwen3.5-2B-Polaris-HighIQ-Thinking-x3

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| mxfp8 | 0.473 | 0.671 | 0.847 | 0.557 | 0.404 | 0.721 | 0.602 |
| mxfp4 | 0.441 | 0.639 | 0.835 | 0.548 | 0.374 | 0.726 | 0.589 |

| Quant | Perplexity |
|-------|------------|
| mxfp8 | 5.841 ± 0.043 |
| mxfp4 | 6.322 ± 0.047 |
## Qwen3.5-2B-Polaris-HighIQ-Thinking-x4

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| mxfp8 | 0.478 | 0.688 | 0.842 | 0.553 | 0.402 | 0.722 | 0.600 |
| mxfp4 | 0.430 | 0.621 | 0.826 | 0.544 | 0.378 | 0.723 | 0.585 |

| Quant | Perplexity |
|-------|------------|
| mxfp8 | 6.049 ± 0.046 |
| mxfp4 | 6.457 ± 0.050 |
## Qwen3.5-2B-GPT-5.1-HighIQ-Compact-Thinking-x4

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-----|-------|-------|-------|-------|------|------|
| mxfp8 | 0.427 | 0.579 | 0.820 | 0.554 | 0.396 | 0.720 | 0.623 |

| Quant | Perplexity |
|-------|------------|
| mxfp8 | 5.837 ± 0.042 |
| mxfp4 | 6.282 ± 0.046 |
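Across the models that report both quants, mxfp4 trades a fairly consistent perplexity hit for its smaller footprint. A sketch computing the relative increase from the perplexity figures above (labels are shorthand for the full model names):

```python
# (mxfp8, mxfp4) perplexity pairs copied from the tables above.
pairs = {
    "Brainwaves": (5.558, 6.073),
    "Polaris-x3": (5.841, 6.322),
    "Polaris-x4": (6.049, 6.457),
    "GPT-5.1-x4": (5.837, 6.282),
}

# Relative perplexity increase of mxfp4 over mxfp8 (roughly 7-9% here).
increase = {name: (fp4 - fp8) / fp8 for name, (fp8, fp4) in pairs.items()}
```

Against that, mxfp4 cut peak memory from 7.65 GB to 6.71 GB in the Brainwaves measurement above, about a 12% saving.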
More metrics coming soon.

-G
Use with mlx-lm:

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer.
model, tokenizer = load("Qwen3.5-2B-mxfp8-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```