🩺 MedLLM-10M
A lightweight GPT-2-style causal language model trained from scratch on medical literature. Designed for research and education in the medical domain.
⚠️ Disclaimer. This model is for research and educational purposes only. It must not be used for medical diagnosis, treatment recommendations, or clinical decision-making.
TL;DR
| Property | Value |
|---|---|
| Parameters | ~27.7M (10M body, rest in embeddings) |
| Architecture | GPT-2 (GPT2LMHeadModel) |
| Training | From scratch (no base model) |
| Vocabulary | 5,000 (custom medical tokenizer) |
| Context length | 512 tokens |
| Domain | Medical / clinical / biomedical English |
| Hardware | NVIDIA RTX 3060 12GB |
| License | Apache 2.0 |
Why a tiny medical model?
Large medical LLMs (PubMedBERT, BioGPT, Med-PaLM) are expensive to run and require infrastructure most clinics, students, and researchers don't have. MedLLM-10M is a deliberately tiny, from-scratch GPT-2 trained on medical text. It is small enough to run on a CPU, fine-tune on a laptop, or use as a teaching artifact in an NLP course.
It is not competitive with commercial medical AI for diagnostic accuracy. It is competitive at being a small, transparent, fully open baseline for the medical domain.
Architecture
Standard GPT-2 with small dimensions:
Architecture: GPT2LMHeadModel
Layers: 8
Hidden size (n_embd): 512
Attention heads: 8
Feed-forward (n_inner): 2,048
Max position (n_ctx): 512
Activation: GELU
Vocab size: 5,000
Dropout: 0.1
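To see how the dimensions above translate into a parameter budget, here is a minimal sketch that counts parameters for a standard GPT-2 layout. It assumes the usual GPT-2 structure (fused QKV projection, two-layer MLP, two LayerNorms per block, a final LayerNorm, and an output head tied to the token embedding). The function name `gpt2_param_count` is hypothetical and not part of any library.

```python
def gpt2_param_count(n_layer, n_embd, n_inner, n_positions, vocab_size):
    """Count parameters for a standard GPT-2 layout with a tied output head (assumption)."""
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * n_embd + n_positions * n_embd

    # Attention: fused QKV projection and output projection, each with a bias.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: up-projection to n_inner and down-projection back, each with a bias.
    mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
    # Two LayerNorms per block, each with a weight and a bias vector.
    layer_norms = 2 * (2 * n_embd)

    per_block = attn + mlp + layer_norms
    final_layer_norm = 2 * n_embd
    body = n_layer * per_block + final_layer_norm

    return {"embeddings": embeddings, "body": body, "total": embeddings + body}


# Values taken from the architecture listing above.
counts = gpt2_param_count(
    n_layer=8, n_embd=512, n_inner=2048, n_positions=512, vocab_size=5000
)
for name, value in counts.items():
    print(f"{name}: {value / 1e6:.2f}M")
```

With the values listed above, this layout gives roughly 28.0M parameters in total, about 2.8M of them in the token and position embeddings. The checkpoint's own config is authoritative if it differs from this listing.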
Training
- Data sources:
  - PubMed abstracts and research papers
  - Medical journal articles
  - Clinical practice guidelines
  - Medical Q&A datasets
  - Healthcare reference content (e.g., Mayo Clinic, WebMD)
- Tokenizer: custom 5,000-token vocabulary fit to the medical corpus
- Training framework: Hugging Face Transformers (PyTorch)
- Hardware: NVIDIA RTX 3060 12GB
- Hyperparameters:
| Hyperparameter | Value |
|---|---|
| Epochs | 10 |
| Batch size | 4 |
| Gradient accumulation | 8 |
| Learning rate | 3e-4 |
| Optimizer | AdamW |
| Weight decay | 0.01 |
| Warmup steps | 200 |
| Mixed precision | FP16 |
| Sequence stride | 256 |
No pretrained weights. No transfer learning. The model is initialized fresh and trained end-to-end on medical text.
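Two of the hyperparameters above interact with the 512-token context: the sequence stride and the batch settings. The sketch below is an illustration under assumptions, not the author's preprocessing code. It treats the stride as the step between window starts. With a 512-token window and a value of 256, that is the same as a 256-token overlap, so both common readings of "stride" coincide here. The helper name `sliding_windows` is hypothetical.

```python
def sliding_windows(token_ids, window=512, stride=256):
    """Split a token sequence into overlapping windows.

    `stride` is assumed to be the step between consecutive window starts.
    The last window may be shorter than `window`.
    """
    if len(token_ids) <= window:
        return [list(token_ids)]
    chunks = []
    # Stop as soon as a window reaches the end of the sequence.
    for start in range(0, len(token_ids) - window + stride, stride):
        chunks.append(list(token_ids[start:start + window]))
    return chunks


# Effective batch size per optimizer step, from the table above.
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
EFFECTIVE_BATCH = BATCH_SIZE * GRAD_ACCUM_STEPS   # sequences per optimizer step
MAX_TOKENS_PER_STEP = EFFECTIVE_BATCH * 512       # upper bound with full windows

# Toy document of 1,300 token ids standing in for real tokenizer output.
tokens = list(range(1300))
windows = sliding_windows(tokens)
print(len(windows), [len(w) for w in windows])
print(EFFECTIVE_BATCH, MAX_TOKENS_PER_STEP)
```

Overlapping windows let every token appear with at least some left context, at the cost of seeing most tokens twice per epoch.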
Usage
The model loads with the standard transformers API:
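from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Download the tokenizer and weights from the Hugging Face Hub
model_id = "raihan-js/medllm-10m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The patient presents with chest pain and"
inputs = tokenizer(prompt, return_tensors="pt")

# Optional: fix the seed so the sampled continuation is reproducible
torch.manual_seed(0)
# Sample up to 120 new tokens; prompt plus output must fit in the 512-token context
output = model.generate(**inputs, max_new_tokens=120, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))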
Intended use
- Research baseline for medical-domain pretraining from scratch
- Teaching artifact for NLP / healthcare AI courses
- Embedding/feature extraction for downstream medical text tasks (with fine-tuning)
- Reference for how to build domain-specific tokenizers and train a small LM from scratch
Out-of-scope / hard limitations
- Not for clinical use. No diagnosis, treatment, triage, dosing, or any patient-facing application.
- No safety / alignment training. No RLHF, no harmlessness training.
- Hallucinates. It will fabricate medical claims confidently. Treat all outputs as untrusted text.
- English only. Trained exclusively on English medical literature.
Related models
raihan-js/orch-fusion and the rest of the ORCH series: sibling from-scratch SLMs in the code-generation domain
Author
Akteruzzaman Raihan Sikder, AI/ML engineer, CTO at ClarioScope AI (HIPAA-compliant healthcare practice growth platform). Portfolio · GitHub.
Citation
@misc{sikder2025medllm10m,
title = {MedLLM-10M: A Lightweight GPT-2-Style Language Model Trained From Scratch on Medical Literature},
author = {Sikder, Akteruzzaman Raihan},
year = {2025},
url = {https://huggingface.co/raihan-js/medllm-10m}
}