ai4privacy-metal-kernels
Apple-Silicon-optimised Metal kernels for the AI4Privacy PII token-classification post-processing path.
Built with huggingface/kernel-builder, distributed via the Hugging Face Hub, and loaded at runtime with the `kernels` library.
What's in the box
| Kernel | Replaces | Why it's faster on MPS |
|---|---|---|
| `fused_argmax_softmax(logits)` | `logits.argmax(-1)` + `logits.softmax(-1).gather(...)` | One dispatch, one pass over logits, no intermediate softmax tensor allocated. |
Designed for the decode step in `ai4privacy.core.model_runner`, where token-classification
logits of shape `[B, T, C]` must be reduced to a `(label_id, confidence)` pair per token before
the BIO span decoder runs.
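The table above describes the fused op only in terms of the eager PyTorch calls it replaces. As a semantics sketch (pure Python, not the Metal source), the per-token result can be computed in a single pass with a running max and a running sum: whenever a larger logit appears, the accumulated sum is rescaled, and because the winning class contributes `exp(0) = 1` to the numerator, its probability is simply `1 / sum`, so no softmax vector is ever materialised. This online formulation is an assumption chosen for illustration; the shipped kernel may organise its reduction differently. `argmax_softmax_row`, `two_pass_ref`, and `fused_argmax_softmax_ref` are hypothetical names.

```python
import math
from typing import List, Tuple


def argmax_softmax_row(row: List[float]) -> Tuple[int, float]:
    """Return (argmax index, softmax probability of that index) for one row of logits.

    Single pass: keeps a running max m and a running sum s = sum(exp(x_j - m)),
    rescaling s whenever a new max appears. The winner's numerator is exp(0) = 1,
    so its probability is 1 / s.
    """
    best_idx, m, s = 0, row[0], 1.0  # exp(row[0] - row[0]) == 1
    for i in range(1, len(row)):
        x = row[i]
        if x > m:  # strict '>' keeps the first maximal index, like torch.argmax
            s = s * math.exp(m - x) + 1.0  # rescale old sum to the new max
            m, best_idx = x, i
        else:
            s += math.exp(x - m)
    return best_idx, 1.0 / s


def two_pass_ref(row: List[float]) -> Tuple[int, float]:
    """Unfused reference: full stable softmax, then pick the argmax entry."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    idx = row.index(m)  # first occurrence of the max
    return idx, exps[idx] / sum(exps)


def fused_argmax_softmax_ref(
    logits: List[List[List[float]]],
) -> Tuple[List[List[int]], List[List[float]]]:
    """Apply argmax_softmax_row to every token of a [B][T][C] nested list."""
    labels: List[List[int]] = []
    conf: List[List[float]] = []
    for seq in logits:
        pairs = [argmax_softmax_row(tok) for tok in seq]
        labels.append([p[0] for p in pairs])
        conf.append([p[1] for p in pairs])
    return labels, conf


if __name__ == "__main__":
    labels, conf = fused_argmax_softmax_ref([[[2.0, 1.0, 0.1], [0.0, 3.0, -1.0]]])
    print(labels, conf)  # labels == [[0, 1]]
```

Comparing `argmax_softmax_row` against `two_pass_ref` on the same rows is a cheap way to convince yourself that the fused form returns the same label and confidence as `argmax` followed by a softmax gather.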
Install (end user)
```python
from kernels import get_kernel

k = get_kernel("ai4privacy/ai4privacy-metal-kernels")
# logits: token-classification logits of shape [B, T, C]
labels, conf = k.fused_argmax_softmax(logits)
```
Building (this repo)
Requires an Apple-Silicon Mac (ARM64), macOS 26.0+, Xcode 26.x, the Metal Toolchain,
and Determinate Nix with sandbox = relaxed.
```bash
# One-time setup
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
xcodebuild -downloadComponent MetalToolchain

# Build
nix develop
build2cmake generate build.toml
cmake -B build -S .
cmake --build build
```
Publish:

```bash
huggingface-cli upload ai4privacy/ai4privacy-metal-kernels build/ .
```
Hardware notes
Built and tuned on Apple M5 (10-core, MacBook Pro). The threadgroup size is 256, which is safe for
the whole M-series family. The kernel is numerically stable (it subtracts the row max before
exponentiating) and matches eager PyTorch to within 1e-4 relative tolerance.
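The stability claim can be illustrated with a small pure-Python sketch (not the kernel itself): exponentiating raw logits overflows once a logit is large enough (above about 709.8 for Python's float64 `math.exp`, and around 88.7 in float32), whereas subtracting the row max first keeps every exponent at or below zero. `softmax_naive`, `softmax_stable`, and `close_rel` are hypothetical helpers; `close_rel` mirrors a 1e-4 relative-tolerance comparison like the one quoted above, though `math.isclose` uses a symmetric relative test that is close to, but not identical to, the `rtol` semantics of `torch.allclose`.

```python
import math
from typing import List


def softmax_naive(row: List[float]) -> List[float]:
    # Exponentiates raw logits: math.exp raises OverflowError for large inputs.
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(row: List[float]) -> List[float]:
    # Subtracts the row max first, so every exponent is <= 0 and exp() stays in (0, 1].
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def close_rel(a: List[float], b: List[float], rtol: float = 1e-4) -> bool:
    # Element-wise relative-tolerance comparison (hypothetical helper).
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rtol) for x, y in zip(a, b)
    )


if __name__ == "__main__":
    big = [1000.0, 999.0, 998.0]
    print(softmax_stable(big))  # finite probabilities
    try:
        softmax_naive(big)
    except OverflowError:
        print("naive softmax overflowed")
```

Because softmax is invariant to adding a constant to every logit, the max-subtracted version returns the same probabilities as the naive one whenever the naive one does not overflow.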