Synthetic baselines trained for our paper "Scaling Low-Resource MT via Synthetic Data Generation with LLMs", accepted to the main conference of EMNLP 2025.
AI & ML interests
At the University of Helsinki, we focus on:
- NLP for morphologically rich languages
- Cross-lingual NLP
- NLP in the humanities
Organization Card
Helsinki-NLP is the language technology research group at the University of Helsinki. Here, we publish various resources related to multilingual NLP, machine translation, and text simplification, to name a few application areas. We focus on wide language coverage, open datasets, and public pre-trained models.
Multilingual translation models trained on the Tatoeba Translation Challenge dataset (from OPUS) and a massively multilingual Bible corpus.
Helsinki-NLP/opus-mt-tc-bible-big-aav-fra_ita_por_spa
Translation • 0.2B • Updated • 13 • 2
Helsinki-NLP/opus-mt-tc-bible-big-afa-en
Translation • 0.2B • Updated • 40 • 1
Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_nld
Translation • 0.2B • Updated • 14 • 2
Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_fra_por_spa
Translation • 0.2B • Updated • 24
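The tc-bible model IDs above appear to encode the source language (or language group) and the target languages directly in the name. The following is a minimal sketch, assuming that every model in this collection follows the pattern `opus-mt-tc-bible-big-<source>-<target1>_<target2>_...` and that multi-target models expect a sentence-initial `>>xxx<<` target-language token, as multi-target OPUS-MT model cards generally document. `ModelLanguages`, `parse_tc_bible_id`, and `add_target_token` are hypothetical helper names, not part of any Helsinki-NLP or Hugging Face API, and older IDs such as `opus-mt-eo-caenes` use a different convention that this sketch does not handle.

```python
from dataclasses import dataclass

# Assumed naming layout (inferred from the IDs in this collection, not an official spec):
#   Helsinki-NLP/opus-mt-tc-bible-big-<source>-<target1>_<target2>_...
PREFIX = "opus-mt-tc-bible-big-"


@dataclass
class ModelLanguages:
    source: str          # source language or language-group code, e.g. "afa"
    targets: list[str]   # one or more target language codes, e.g. ["deu", "eng"]


def parse_tc_bible_id(model_id: str) -> ModelLanguages:
    """Split a tc-bible model ID into its source code and target codes."""
    name = model_id.split("/")[-1]  # drop the "Helsinki-NLP/" namespace if present
    if not name.startswith(PREFIX):
        raise ValueError(f"not a tc-bible-big model ID: {model_id}")
    # The first "-" after the prefix separates source from the target list.
    source, sep, targets = name[len(PREFIX):].partition("-")
    if not sep or not source or not targets:
        raise ValueError(f"missing source or target part: {model_id}")
    return ModelLanguages(source=source, targets=targets.split("_"))


def add_target_token(text: str, target: str, langs: ModelLanguages) -> str:
    """Prefix text with a >>xxx<< token when the model has several targets."""
    if target not in langs.targets:
        raise ValueError(f"{target!r} is not a target language of this model")
    if len(langs.targets) == 1:
        return text  # single-target model: no token added (assumption)
    return f">>{target}<< {text}"


if __name__ == "__main__":
    langs = parse_tc_bible_id("Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_nld")
    print(langs.targets)                           # ['deu', 'eng', 'nld']
    print(add_target_token("Shalom", "nld", langs))  # >>nld<< Shalom
```

The prefixed string is what you would then hand to the tokenizer of the chosen model; checking the target against the parsed list catches requests for a language the model was never trained to produce.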
Models (1,536)
Helsinki-NLP/opus-mt-eo-caenes
Translation • 76.9M • Updated • 61
Helsinki-NLP/opus-mt-caenes-eo
Translation • 76.9M • Updated • 55
Helsinki-NLP/opus-mt-fr-en
Translation • 75.2M • Updated • 1.23M • 49
Helsinki-NLP/opus-mt-synthetic-en-eu
Updated • 33 • 1
Helsinki-NLP/opus-mt-synthetic-en-mk
Updated • 32
Helsinki-NLP/opus-mt-synthetic-en-ka
Updated • 43
Helsinki-NLP/opus-mt-synthetic-en-so
Updated • 113 • 1
Helsinki-NLP/opus-mt-synthetic-en-is
Updated • 38 • 1
Helsinki-NLP/opus-mt-synthetic-en-uk
Updated • 16
Helsinki-NLP/opus-mt-synthetic-en-gd
Updated • 24
Datasets (51)
Helsinki-NLP/nemotron-cc-translated
Viewer • Updated • 5.79B • 14.3k • 2
Helsinki-NLP/fineweb-edu-translated
Preview • Updated • 43.6k • 4
Helsinki-NLP/OpenSubtitles2024
Viewer • Updated • 570M • 1.87k • 2
Helsinki-NLP/shroom
Preview • Updated • 22
Helsinki-NLP/mu-shroom
Viewer • Updated • 11.5k • 214 • 4
Helsinki-NLP/tatoeba_mt_train
Viewer • Updated • 13.7B • 2.38k • 3
Helsinki-NLP/tatoeba_mt
Updated • 27.8k • 61
Helsinki-NLP/un_pc
Viewer • Updated • 323M • 4.87k • 23
Helsinki-NLP/un_ga
Viewer • Updated • 1.11M • 3.29k • 3
Helsinki-NLP/opus_books
Viewer • Updated • 1.25M • 16.6k • 85