MiniLLM 360m

This SLM (Small Language Model) was inspired by Karpathy's video on GPT-2, with one difference: the model has been made more production-ready and closer to trending models such as Alibaba's Qwen 3. The foundation is taken from Karpathy's content, with Qwen's attention and embedding mechanisms added to it. The result is a pretrained model that is fully open source.

This project was started by Muhammadreza Haghiri (active on X as @haghiri_ai), the founder of Mann-E, which was the first generative AI platform with pretrained/fine-tuned models in Iran. This model is Mann-E's effort to make AI more accessible and democratized for everyone.

How to run the model, contribute, etc

To run the model, contribute to its development, or find out more about the pretraining process, take a look at the model's GitHub page. All scripts and prerequisites are provided in the GitHub repository.

Support The Project

You can support this project with donations. Donations are currently accepted in crypto, at the following wallet addresses:

  • Solana: GNJWgRmgRd7S9VrhCcJnVNTtAiQGTWrya9gcGb985x2m
  • Ethereum: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
  • Sui: 0x943c1190bae9a052879c1861833621e20545bc33a8c990d48cc3bb8e7b1ac00b
  • Polygon: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
  • Base: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
  • Bitcoin (Taproot): bc1pgtgd3uymvdxycelu06zz3sgrt47rccw2zk9u550e4de6tzgngz2s738gsn
  • Bitcoin (Native Segwit): bc1q85drn275ugetvleha6egp7a8u0ramyf39zg4wj
