Training dataset: TajikNLPWorld/TajPersParallelLexicalCorpus
Author: Arabov, Mullosharaf Kurbonovich
Organisation: TajikNLPWorld
Fine-tuned from `xlm-roberta-base` for Tajik part-of-speech (POS) tagging, using only the `tajik` and `persian` fields of the training data (the `examples` field is not used).
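Input format: `tajik: слово [SEP] persian: ترجمه`, where слово ("word") stands for a single Tajik word in Cyrillic script and ترجمه ("translation") stands for its Persian translation. The Persian part may be left empty.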
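A minimal sketch of assembling inputs in this format, assuming the Tajik word and its Persian translation are available as plain strings. The helper name `format_input` is hypothetical and not part of the model's API; the `[SEP]` marker is written literally, exactly as in the format string above.

```python
def format_input(tajik: str, persian: str = "") -> str:
    """Build 'tajik: <word> [SEP] persian: <translation>'.

    `format_input` is a hypothetical helper. The Persian part may be empty,
    in which case nothing follows 'persian: '.
    """
    tajik = tajik.strip()
    persian = persian.strip()
    if not tajik:
        raise ValueError("the Tajik word must not be empty")
    return f"tajik: {tajik} [SEP] persian: {persian}"


print(format_input("китоб", "کتاب"))  # tajik: китоб [SEP] persian: کتاب
print(format_input("китоб"))          # Persian part left empty
```

Whether an empty translation should end in a trailing space after `persian:` is not specified by the card; the sketch simply keeps the template unchanged.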
Evaluation results:

| Metric | Value |
|---|---|
| Accuracy | 0.764 ± 0.001 |
| F1-weighted | 0.749 ± 0.003 |
| F1-macro | 0.245 ± 0.026 |
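One common reading of the gap between weighted F1 (0.749) and macro F1 (0.245) is that frequent tags are predicted well and rare tags poorly: weighted F1 averages per-tag F1 by each tag's support, while macro F1 gives every tag equal weight. The sketch below illustrates this on made-up toy labels (not the model's data); the tag names and counts are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label occurring in y_true."""
    scores = {}
    for label in set(y_true):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1: plain mean over tags. Weighted F1: mean weighted by tag support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[c] * support[c] for c in scores) / len(y_true)
    return macro, weighted


# Toy data (hypothetical): 8 frequent NOUN items, all correct;
# two rare tags (ADV, INTJ), each wrongly predicted as NOUN.
y_true = ["NOUN"] * 8 + ["ADV", "INTJ"]
y_pred = ["NOUN"] * 10
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
# macro F1 = 0.296, weighted F1 = 0.711
```

The two rare tags each contribute an F1 of 0 with equal weight to the macro average, but only 2 of 10 items to the weighted one.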
Usage:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "TajikNLPWorld/xlm-roberta-tajik-pos"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Classify one Tajik word, given together with its Persian translation
print(pipe("tajik: китоб [SEP] persian: کتاب"))  # list with one {"label": ..., "score": ...} dict
```
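Because the model assigns one label to each input string, tagging running text means classifying it word by word. A minimal sketch, assuming simple whitespace tokenization and optional Persian translations; `tag_words` and `fake_classify` are hypothetical helpers. `classify` can be the `pipe` object above, since a `transformers` text-classification pipeline accepts a list of strings and returns one `{"label", "score"}` dict per input.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def tag_words(
    words: Sequence[str],
    classify: Callable[[List[str]], List[Dict[str, object]]],
    translations: Optional[Sequence[str]] = None,
) -> List[Tuple[str, str]]:
    """Return (word, label) pairs, classifying each word independently.

    `tag_words` is hypothetical; `classify` is assumed to map a list of
    input strings to one {"label": ..., "score": ...} dict per string.
    """
    if translations is None:
        translations = [""] * len(words)  # empty Persian part is allowed
    if len(translations) != len(words):
        raise ValueError("words and translations must have the same length")
    inputs = [f"tajik: {w} [SEP] persian: {p}" for w, p in zip(words, translations)]
    outputs = classify(inputs)
    return [(w, str(out["label"])) for w, out in zip(words, outputs)]


# Stand-in classifier for a dry run without downloading the model;
# with the real model, pass `pipe` instead.
def fake_classify(texts: List[str]) -> List[Dict[str, object]]:
    return [{"label": "NOUN", "score": 1.0} for _ in texts]


print(tag_words("китоб хондам".split(), fake_classify))
# [('китоб', 'NOUN'), ('хондам', 'NOUN')]
```

The actual label names returned by the model depend on its configuration; the stand-in's `"NOUN"` is only a placeholder.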
Citation:

```bibtex
@inproceedings{arabov2026xlmr,
  title     = {XLM-RoBERTa fine-tuned for Tajik POS tagging (no examples field used)},
  author    = {Arabov, Mullosharaf Kurbonovich and TajikNLPWorld},
  booktitle = {To appear},
  year      = {2026},
  url       = {https://huggingface.co/TajikNLPWorld/xlm-roberta-tajik-pos}
}
```