MiniLLM 360m
This SLM (Small Language Model) was inspired by Karpathy's video on GPT-2, with one difference: it has been made more production-ready and more similar to trending models such as Alibaba's Qwen 3. The foundation is taken from Karpathy's content, with Qwen's attention and embedding mechanisms added to it, and it is now one of the fully open-source pretrained models.
This project was started by Muhammadreza Haghiri (active on X under the handle @haghiri_ai), the founder of Mann-E, which was the first generative AI platform with pretrained/fine-tuned models in Iran. This model is an effort by Mann-E to make AI more accessible and democratized for everyone.
How to run the model, contribute, etc
To run the model, contribute to its development, or find out more about the pretraining process, take a look at the model's GitHub page. All scripts and prerequisites are provided in the GitHub repository.
Support The Project
You can support this project through donations. Donations are currently accepted in crypto at the following wallets:
- Solana: GNJWgRmgRd7S9VrhCcJnVNTtAiQGTWrya9gcGb985x2m
- Ethereum: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
- Sui: 0x943c1190bae9a052879c1861833621e20545bc33a8c990d48cc3bb8e7b1ac00b
- Polygon: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
- Base: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
- Bitcoin (Taproot): bc1pgtgd3uymvdxycelu06zz3sgrt47rccw2zk9u550e4de6tzgngz2s738gsn
- Bitcoin (Native Segwit): bc1q85drn275ugetvleha6egp7a8u0ramyf39zg4wj