ai4privacy-metal-kernels

Apple-Silicon-optimised Metal kernels for the AI4Privacy PII token-classification post-processing path. Built with huggingface/kernel-builder, distributed via the Hugging Face Hub, and loaded at runtime with the kernels library.

What's in the box

| Kernel | Replaces | Why it's faster on MPS |
| --- | --- | --- |
| `fused_argmax_softmax(logits)` | `logits.argmax(-1)` + `logits.softmax(-1).gather(...)` | One dispatch, one pass over `logits`, no intermediate `softmax` tensor allocated. |

Designed for the decode step in ai4privacy.core.model_runner, where the token-classification logits of shape [B, T, C] must be turned into a (label_id, confidence) pair per token before the BIO span decoder runs.
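To make the semantics concrete, here is a minimal pure-Python sketch of what a fused argmax-plus-confidence pass computes for one token. It is not the Metal source: the function names `argmax_with_confidence` and `decode` are hypothetical, nested lists stand in for the [B, T, C] tensor, and the single-pass running-max rescaling is one common way to get "one pass over `logits`", which the actual kernel may or may not use. It relies on the fact that the softmax probability of the argmax class is 1 / sum(exp(x_i - max)).

```python
import math
from typing import List, Sequence, Tuple


def argmax_with_confidence(row: Sequence[float]) -> Tuple[int, float]:
    """Return (label_id, softmax probability of that label) for one token.

    Illustrative only: assumes a non-empty row of finite logits.
    """
    if len(row) == 0:
        raise ValueError("row must contain at least one logit")
    best_idx = 0
    running_max = -math.inf
    running_sum = 0.0  # sum of exp(x - running_max) over the elements seen so far
    for i, x in enumerate(row):
        if x > running_max:
            # New maximum: rescale the accumulated sum to the new reference point,
            # then add exp(x - x) = 1 for the current element.
            running_sum = running_sum * math.exp(running_max - x) + 1.0
            running_max = x
            best_idx = i
        else:
            running_sum += math.exp(x - running_max)
    # softmax(row)[best_idx] = exp(0) / running_sum
    return best_idx, 1.0 / running_sum


def decode(logits: Sequence[Sequence[Sequence[float]]]) -> Tuple[List[List[int]], List[List[float]]]:
    """Apply argmax_with_confidence to every token of a [B][T][C] nested list."""
    labels: List[List[int]] = []
    confs: List[List[float]] = []
    for sequence in logits:
        pairs = [argmax_with_confidence(token) for token in sequence]
        labels.append([label for label, _ in pairs])
        confs.append([conf for _, conf in pairs])
    return labels, confs
```

Because the full softmax row is never materialised, the only per-token outputs are the label index and one float, which is the property the table above attributes to the fused kernel.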

Install (end user)

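from kernels import get_kernel
# Fetch the prebuilt kernel from the Hub.
k = get_kernel("ai4privacy/ai4privacy-metal-kernels")
# logits: token-classification logits of shape [B, T, C].
labels, conf = k.fused_argmax_softmax(logits)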

Building (this repo)

Requires an Apple-Silicon Mac (ARM64), macOS 26.0+, Xcode 26.x, the Metal Toolchain, and Determinate Nix with sandbox = relaxed.

# One-time setup
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
xcodebuild -downloadComponent MetalToolchain

# Build
nix develop
build2cmake generate build.toml
cmake -B build -S .
cmake --build build

Publish:

huggingface-cli upload ai4privacy/ai4privacy-metal-kernels build/ .

Hardware notes

Built and tuned on Apple M5 (10-core, MacBook Pro). Threadgroup size is 256, which is safe for the whole M-series family. The kernel is numerically stable (it subtracts the row max before calling exp) and matches eager PyTorch to within 1e-4 relative tolerance.
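The row-max subtraction mentioned above can be illustrated with a short pure-Python sketch. The helper names (`softmax_naive`, `softmax_stable`, `within_rel_tol`) are hypothetical and this is not the repository's test suite; it only shows why the shift matters and what a 1e-4 relative-tolerance comparison looks like. Note that `math.isclose` measures the difference relative to the larger of the two values, which may differ slightly from the tolerance definition used in the actual comparison against PyTorch.

```python
import math
from typing import List, Sequence


def softmax_naive(row: Sequence[float]) -> List[float]:
    """Textbook softmax; math.exp raises OverflowError for large logits."""
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(row: Sequence[float]) -> List[float]:
    """Softmax with the row max subtracted first, so every exponent is <= 0."""
    row_max = max(row)
    exps = [math.exp(x - row_max) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def within_rel_tol(a: Sequence[float], b: Sequence[float], rtol: float = 1e-4) -> bool:
    """Element-wise relative-tolerance check between two equal-length rows."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rtol) for x, y in zip(a, b)
    )


def naive_overflows(row: Sequence[float]) -> bool:
    """Report whether the naive formulation fails on this row."""
    try:
        softmax_naive(row)
    except OverflowError:
        return True
    return False
```

On small logits the two formulations agree within the tolerance; on logits around 1000 the naive version overflows while the shifted version still returns probabilities that sum to one.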
