translategemma
Stateful Core ML export of google/translategemma-4b-it with KV-cache states for incremental decoding on Apple platforms.
Files:

- StatefulTranslateGemma4BITFP16.mlpackage
- StatefulTranslateGemma4BITInt8PerChannel.mlpackage
- StatefulTranslateGemma4BITInt4PerChannel.mlpackage
- convert_stateful_translategemma_coreml.py
- NOTICE

Inputs:
- inputIds: int32, shape (1, queryLength)
- fullAttentionMask: float16, shape (1, 1, queryLength, endStep)
- slidingAttentionMask: float16, shape (1, 1, queryLength, endStep)

States:
- keyCache: float16, shape (layers, 1, kvHeads, maxContext, headDim)
- valueCache: float16, same shape as keyCache

Output:
- logits: float16

Conversion notes:

- Converted with coremltools targeting iOS 18 (ct.target.iOS18)
- KV caches are exposed as model state (ct.StateType)
- Attention is controlled through explicit fullAttentionMask and slidingAttentionMask inputs

Variants:

- StatefulTranslateGemma4BITFP16.mlpackage: smallest stable FP16-focused artifact
- StatefulTranslateGemma4BITInt8PerChannel.mlpackage: working balanced-size quantized variant
- StatefulTranslateGemma4BITInt4PerChannel.mlpackage: smallest working quantized variant, validated in short decode runs

This repository contains a converted derivative of Gemma model weights. Use is subject to the Gemma license terms and policies.
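The two float16 mask inputs can be built on the host before each call. The sketch below assumes an additive-mask convention (0.0 where attention is allowed, a large negative value where it is blocked) and uses a placeholder sliding-window size; neither detail is specified by this card, so verify both against the converter script.

```python
import numpy as np

def build_masks(query_len: int, end_step: int, window: int = 512):
    """Build additive attention masks of shape (1, 1, query_len, end_step).

    Assumed convention: 0.0 = attend, large negative = blocked (added to
    attention logits before softmax). `window` is a placeholder value, not
    taken from the model.
    """
    neg = np.float16(-65504.0)  # most negative finite float16
    # Absolute positions of the query tokens: the last `query_len` steps.
    q_pos = np.arange(end_step - query_len, end_step)[:, None]
    k_pos = np.arange(end_step)[None, :]
    causal = k_pos <= q_pos             # never attend to future positions
    in_window = k_pos > q_pos - window  # sliding-window restriction
    full = np.where(causal, 0.0, neg).astype(np.float16)
    sliding = np.where(causal & in_window, 0.0, neg).astype(np.float16)
    return full[None, None], sliding[None, None]

# Single-token decode step at position 8 with a window of 4:
full, sliding = build_masks(query_len=1, end_step=8, window=4)
```

During prefill, queryLength equals the prompt length and endStep equals queryLength; during incremental decode, queryLength is 1 and endStep grows by one per generated token.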
Base model: google/translategemma-4b-it