BYOL
Part of the BYOL collection: LLMs for Chichewa and Māori built by following the BYOL recipe (https://arxiv.org/abs/2601.10804).
This model was produced by the BYOL framework for extending LLMs to low-resource languages.

This is an instruction-tuned (SFT) language model for Chichewa (nya). It was created by applying supervised fine-tuning on top of the BYOL Chichewa 4b continued pre-training (CPT) checkpoint, using instruction-following data (SmolTalk2 + AYA) translated into Chichewa via the BYOL framework.

This is an intermediate checkpoint used to produce the merged model. For best results, use the merged variant instead, which combines the language knowledge from the CPT stage with the instruction-following ability of this model.
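For intuition on what "merged" means here, a common baseline is a linear interpolation of the CPT and SFT parameters. The sketch below is illustrative only: BYOL's actual merging recipe is described in the paper and may differ, and both the CPT model id and the interpolation weight are assumptions, not confirmed values.

from transformers import AutoModelForCausalLM

# Illustrative sketch: linear weight interpolation between CPT and SFT checkpoints.
cpt = AutoModelForCausalLM.from_pretrained("ai-for-good-lab/byol-nya-4b-cpt")  # assumed id
sft = AutoModelForCausalLM.from_pretrained("ai-for-good-lab/byol-nya-4b-it")

alpha = 0.5  # assumed weight; balances CPT language knowledge vs. SFT instruction following
sft_state = sft.state_dict()
merged = {name: (1 - alpha) * p + alpha * sft_state[name] for name, p in cpt.state_dict().items()}
sft.load_state_dict(merged)
sft.save_pretrained("byol-nya-4b-merged-sketch")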
pip install -U transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "ai-for-good-lab/byol-nya-4b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the model on available GPU(s); this requires accelerate
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", dtype=torch.bfloat16)
# Chat inference; the prompt means "Tell me about the country of Malawi."
messages = [{"role": "user", "content": "Tandiuzeni za dziko la Malawi."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True, return_dict=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
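Note that decoding outputs[0] in full echoes the prompt before the reply. To print only the model's continuation, slice off the prompt tokens; the sampling settings below are illustrative defaults, not values recommended by the BYOL authors.

prompt_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))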
@article{zamir2026byolbringlanguagellms,
  title={BYOL: Bring Your Own Language Into LLMs},
  author={Syed Waqas Zamir and Wassim Hamidouche and Boulbaba Ben Amor and Luana Marotti and Inbal Becker-Reshef and Juan Lavista Ferres},
  journal={arXiv preprint arXiv:2601.10804},
  year={2026},
  url={https://arxiv.org/abs/2601.10804},
}
Base model: google/gemma-3-4b-pt