AI & ML interests

Building the future of AI through collaborative research, shared datasets, and open-source models. | Managed by @Cossale

Recent Activity

Cossale updated a model about 1 month ago
keplersystems/VoxCPM-PAARI-7lang
Cossale updated a dataset about 2 months ago
keplersystems/PAARI-English-TTS
Cossale published a model about 2 months ago
keplersystems/VoxCPM-PAARI-7lang

Cossale posted an update 6 months ago
Releasing 8 multilingual datasets from the People's Archive of Rural India (PAARI).
Indian languages have 1B+ speakers but remain underrepresented in high-quality training data. These datasets help address that gap.
Languages: Hindi, Urdu, Punjabi, Tamil, Telugu, Marathi, Gujarati, English
Scripts: Devanagari, Arabic, Gurmukhi, Tamil, Telugu, Gujarati
Total: 7,650 articles, 19.9M tokens, 51MB
Content covers rural life, agriculture, social issues, and cultural traditions. Professionally written journalism, not web scrapes.
Free to use.
Collection: https://huggingface.co/collections/keplersystems/paari-datasets
Technical details: https://kepler.systems/blog/introducing-paari-datasets
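For a rough sense of scale, the totals quoted in the post can be turned into per-article averages. The sketch below uses only the three figures from the post; the function name `corpus_averages` is hypothetical, and treating "51MB" as decimal megabytes is an assumption. The on-disk size may also reflect compression, so the bytes-per-token value is only indicative.

```python
# Figures quoted in the post above.
ARTICLES = 7_650
TOKENS = 19.9e6           # "19.9M tokens"
SIZE_BYTES = 51 * 10**6   # "51MB"; assumes decimal megabytes (an assumption)


def corpus_averages(articles, tokens, size_bytes):
    """Return rough per-article and per-token averages for a corpus."""
    return {
        "tokens_per_article": tokens / articles,
        "bytes_per_article": size_bytes / articles,
        "bytes_per_token": size_bytes / tokens,
    }


averages = corpus_averages(ARTICLES, TOKENS, SIZE_BYTES)
for name, value in averages.items():
    print(f"{name}: {value:,.1f}")
```

These are averages across all eight datasets combined; the post does not give a per-language breakdown, so individual languages may differ considerably.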
Cossale updated a Space 10 months ago