How to use alakxender/mt5-dhivehi-word-parallel with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
model = AutoModelForSeq2SeqLM.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

This model is a fine-tuned version of google/mt5-small on the Google smol gatitos__en_dv dataset.
⚠️ This is not a general-purpose translator. This fine-tune exists only to test mT5 on Dhivehi and is not intended for any other use.
- Base model: google/mt5-small
- Dataset: google/smol → gatitos__en_dv

| Parameter | Value |
|---|---|
| Epochs | 90 |
| Batch size | 4 |
| Learning rate | 5e-5 (constant) |
| Final train loss | 0.3797 |
| Gradient norm (last) | 15.72 |
| Total steps | 89,460 |
| Samples/sec | ~14.24 |
| FLOPs | 2.36e+16 |
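
The figures in the table can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not part of the training code: it assumes one optimizer step per batch (no gradient accumulation, a single device), which the card does not state, and it treats the reported throughput as an average over the whole run.

```python
# Figures copied from the training table above.
EPOCHS = 90
BATCH_SIZE = 4
TOTAL_STEPS = 89_460
SAMPLES_PER_SEC = 14.24  # reported as approximate

# Assumption: one optimizer step per batch (no gradient accumulation, one device).
steps_per_epoch = TOTAL_STEPS // EPOCHS

# Upper bound on training examples per epoch; the last batch may be partial.
max_examples_per_epoch = steps_per_epoch * BATCH_SIZE

# Rough wall-clock estimate from the reported throughput.
total_samples = TOTAL_STEPS * BATCH_SIZE
approx_hours = total_samples / SAMPLES_PER_SEC / 3600

print(f"steps per epoch:        {steps_per_epoch}")
print(f"examples per epoch (≤): {max_examples_per_epoch}")
print(f"approx. training hours: {approx_hours:.1f}")
```

Under these assumptions the run covers just under 4,000 training pairs per epoch, so the 90 epochs revisit a small dataset many times.
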
# Load the fine-tuned model and its tokenizer
from transformers import MT5ForConditionalGeneration, T5Tokenizer
model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
# Prefix the input with the task prompt, then tokenize it
text = "translate English to Dhivehi: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
# Generate with max_length=64 and decode the first result
output = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
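
To translate several inputs at once, the single-sentence snippet above can be wrapped in a small batching helper. This is a minimal sketch, not code from the model author: `build_prompts` and `translate_batch` are hypothetical names, and the prompt prefix is simply the one used in the example above.

```python
from typing import List

# Prompt prefix taken from the usage snippet above.
PREFIX = "translate English to Dhivehi: "


def build_prompts(words: List[str]) -> List[str]:
    """Strip whitespace, drop empty entries, and prepend the task prefix."""
    return [PREFIX + w.strip() for w in words if w.strip()]


def translate_batch(words: List[str], model, tokenizer, max_length: int = 64) -> List[str]:
    """Translate several inputs with a single generate() call.

    `model` and `tokenizer` are the objects loaded in the snippet above.
    """
    prompts = build_prompts(words)
    if not prompts:
        return []
    # padding=True pads the batch to the longest prompt so it forms one tensor.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs, max_length=max_length)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the model and tokenizer already loaded, a call would look like `translate_batch(["water", "sun"], model, tokenizer)`. The model name suggests it was trained on word-level pairs, so single words or short phrases are probably the closest match to its training data.
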
This model is meant for:
Base model
google/mt5-base