Nova-2: Multimodal Mamba × Transformer Hybrid

~239M params

Smilyai-labs notes: Finally! The preview is out! Nova-2 is a hybrid language model that combines Mamba-2 SSM blocks with Grouped-Query Attention Transformer layers, enhanced with a Mixture-of-Experts FFN plus vision and audio adapters.

Architecture Highlights

| Component | Detail |
|---|---|
| Parameters | 239,283,200 |
| Hidden size | 768 |
| Attention heads | 12 (KV: 4) |
| Mamba layers | 12 |
| Transformer layers | 4 |
| MoE experts | 4 (top-2) |
| Context length | 2048 |
| Sliding window | 512 |
| Precision | bfloat16 |
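A few dimensions implied by the table can be derived directly; this sketch is inferred from the numbers above, not read from the repo's config file:

```python
# Derived dimensions implied by the architecture table (inferred, illustrative).
hidden_size = 768
n_heads = 12       # query heads
n_kv_heads = 4     # shared KV heads (Grouped-Query Attention)

head_dim = hidden_size // n_heads             # per-head dimension: 64
queries_per_kv_group = n_heads // n_kv_heads  # 3 query heads share each KV head

print(head_dim, queries_per_kv_group)
```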

Key Features

  • Grouped-Query Attention – 4 KV heads shared across 12 query heads
  • Sliding window + global attention – 25% of heads attend to the full context
  • Mamba-2 SSM – selective state spaces with gated input/output
  • Mixture-of-Experts SwiGLU FFN with load-balanced routing
  • Vision adapter – patch embedding → mini-ViT → learned projection
  • Audio adapter – mel-spectrogram → conv → mini-transformer → projection
  • Weight tying between token embeddings and LM head
  • NTK-aware RoPE for long-context extrapolation
  • EMA weights for stable inference
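The Grouped-Query Attention idea above can be sketched in a few lines: each of the 4 KV heads is repeated so that a contiguous group of 3 query heads shares it. This is a minimal NumPy illustration, not Nova-2's actual implementation:

```python
import numpy as np

def gqa_attention(q, k, v, n_kv_heads):
    """Toy grouped-query attention.

    q: (n_heads, seq, head_dim); k, v: (n_kv_heads, seq, head_dim).
    Each KV head is shared by a contiguous group of query heads.
    """
    n_heads, seq, d = q.shape
    group = n_heads // n_kv_heads
    # Repeat each KV head so consecutive query heads share it.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key axis.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(12, 8, 64))   # 12 query heads
k = rng.normal(size=(4, 8, 64))    # 4 shared KV heads
v = rng.normal(size=(4, 8, 64))
out = gqa_attention(q, k, v, n_kv_heads=4)
print(out.shape)  # (12, 8, 64)
```

The KV cache only stores 4 heads instead of 12, which is the main memory win of GQA at inference time.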

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Smilyai-labs/Nova-2-preview", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Smilyai-labs/Nova-2-preview", trust_remote_code=True)

inputs = tokenizer("The future of AI is", return_tensors="pt")
# do_sample=True is required for temperature/top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Special Tokens

| Token | ID | Purpose |
|---|---|---|
| `<\|endoftext\|>` | 50256 | BOS / EOS |
| `<\|image\|>` | 50257 | Vision token marker |
| `<\|/image\|>` | 50258 | Vision end marker |
| `<\|audio\|>` | 50259 | Audio token marker |
| `<\|/audio\|>` | 50260 | Audio end marker |
| `<\|pad\|>` | 50261 | Padding |
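The table above maps directly to a small lookup, sketched below. The `wrap_image_prompt` helper is hypothetical and only shows how the marker tokens bracket modality content textually; the actual injection of image/audio features is handled by the adapters, whose API may differ:

```python
# Special-token IDs taken from the table above.
SPECIAL_TOKENS = {
    "<|endoftext|>": 50256,
    "<|image|>": 50257,
    "<|/image|>": 50258,
    "<|audio|>": 50259,
    "<|/audio|>": 50260,
    "<|pad|>": 50261,
}

def wrap_image_prompt(text: str) -> str:
    # Hypothetical helper: image features would be spliced in between the
    # markers by the vision adapter at embedding time, not as text.
    return f"<|image|><|/image|>{text}"

print(wrap_image_prompt("Describe this picture."))
```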

Training

Trained with JAX/Flax on TPU using:

  • AdamW with cosine warmup-decay schedule
  • Gradient clipping (max norm 1.0)
  • Z-loss for logit regularization
  • MoE load-balancing auxiliary loss
  • Activation checkpointing (remat)
  • Mixed-precision (bfloat16)
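The cosine warmup-decay schedule from the list above can be sketched in plain Python; the hyperparameter values in the example are illustrative, not Nova-2's actual training settings:

```python
import math

def warmup_cosine_lr(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    Values passed in are illustrative; Nova-2's real schedule
    parameters are not published in this card.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * min(progress, 1.0)))

# Learning rate at a few points of a 1000-step run with 100 warmup steps:
for s in (0, 50, 100, 550, 1000):
    print(s, warmup_cosine_lr(s, peak_lr=3e-4, warmup_steps=100, total_steps=1000))
```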

License

Apache 2.0
