Instructions for using justinj92/gpt-oss-nemo-20b with libraries and local apps.

Libraries

How to use justinj92/gpt-oss-nemo-20b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="justinj92/gpt-oss-nemo-20b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("justinj92/gpt-oss-nemo-20b")
model = AutoModelForCausalLM.from_pretrained("justinj92/gpt-oss-nemo-20b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use justinj92/gpt-oss-nemo-20b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "justinj92/gpt-oss-nemo-20b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "justinj92/gpt-oss-nemo-20b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
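
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `ask` are illustrative, not part of vLLM, and the code assumes the server started above is listening on localhost:8000 and returns an OpenAI-style response body.

```python
import json
import urllib.request

MODEL_ID = "justinj92/gpt-oss-nemo-20b"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the reply follows the OpenAI chat completions layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` would return the reply text. A server on another port, such as an SGLang server on 30000, can be reached by passing `base_url="http://localhost:30000"`.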

Use Docker

docker model run hf.co/justinj92/gpt-oss-nemo-20b

SGLang

How to use justinj92/gpt-oss-nemo-20b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "justinj92/gpt-oss-nemo-20b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "justinj92/gpt-oss-nemo-20b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "justinj92/gpt-oss-nemo-20b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "justinj92/gpt-oss-nemo-20b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use justinj92/gpt-oss-nemo-20b with Docker Model Runner:
```
docker model run hf.co/justinj92/gpt-oss-nemo-20b
```

GPT-OSS-NEMO-20B: Multilingual Thinking Model

Model Description

GPT-OSS-NEMO-20B is a fine-tuned version of OpenAI's GPT-OSS-20B model, specifically enhanced for multilingual reasoning and thinking capabilities. This model has been trained using Supervised Fine-Tuning (SFT) on the HuggingFaceH4/Multilingual-Thinking dataset to improve its ability to reason in multiple languages while maintaining strong performance across diverse linguistic contexts.

Key Features

🌍 Multilingual Reasoning: Enhanced ability to think and reason in multiple languages
🧠 Chain-of-Thought: Improved reasoning capabilities with explicit thinking processes
💬 Conversational: Optimized for interactive dialogue and question-answering
🎯 Cross-lingual: Can reason in one language and respond in another
⚡ High Performance: Built on the robust 20B parameter GPT-OSS foundation

Training Details

Base Model

Model: openai/gpt-oss-20b
Parameters: ~20 billion
Architecture: GPT-OSS (Mixture of Experts)

Fine-tuning Configuration

Method: LoRA (Low-Rank Adaptation)
Rank (r): 8
Alpha: 16
Target Modules: All linear layers with specific focus on MoE expert layers
Target Parameters:
- MLP experts in layers 7, 15, and 23 (gate_up_proj, down_proj)
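
As a rough illustration of the settings above, the sketch below assembles keyword arguments in the shape that `peft.LoraConfig` accepts. The training script is not part of this card, so the helper names are hypothetical and the `<layer>.mlp.experts.<projection>` parameter paths are an assumed naming pattern. `target_parameters` is only available in recent PEFT releases.

```python
# Values taken from the fine-tuning configuration listed above.
LORA_RANK = 8
LORA_ALPHA = 16
EXPERT_LAYERS = (7, 15, 23)
EXPERT_PROJECTIONS = ("gate_up_proj", "down_proj")


def expert_target_parameters(layers=EXPERT_LAYERS, projections=EXPERT_PROJECTIONS):
    """Return parameter-name suffixes for the MoE expert weights to adapt.

    The "<layer>.mlp.experts.<projection>" pattern is an assumption.
    """
    return [f"{layer}.mlp.experts.{proj}" for layer in layers for proj in projections]


def lora_config_kwargs():
    """Keyword arguments for peft.LoraConfig (hypothetical helper)."""
    return {
        "r": LORA_RANK,
        "lora_alpha": LORA_ALPHA,
        "target_modules": "all-linear",
        "target_parameters": expert_target_parameters(),
    }


# Standard LoRA scales the low-rank update by alpha / r.
LORA_SCALING = LORA_ALPHA / LORA_RANK
```

With PEFT installed, `LoraConfig(**lora_config_kwargs())` would build the adapter configuration. With rank 8 and alpha 16, the low-rank update is scaled by a factor of 2.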

Training Infrastructure

Hardware: 4x NVIDIA H100 GPUs
Cloud Platform: Microsoft Azure NC-series instances
Training Framework: TRL (Transformer Reinforcement Learning)
Optimization: AdamW with cosine learning rate scheduling

Training Hyperparameters

Learning Rate: 2e-4
Batch Size: 4 per device (16 per step across 4 GPUs)
Gradient Accumulation: 4 steps (effective batch size of 64)
Epochs: 4
Max Sequence Length: 2048 tokens
Warmup Ratio: 3%
LR Scheduler: Cosine with minimum LR (10% of peak)
Gradient Checkpointing: Enabled
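
To make the schedule concrete, here is a small sketch of linear warmup followed by cosine decay to a floor of 10% of the peak rate. It is one common formulation, and an assumption about how the scheduler behaves rather than the training code itself. The function name and the `total_steps` argument are illustrative.

```python
import math

PEAK_LR = 2e-4
MIN_LR_RATIO = 0.10   # floor at 10% of the peak learning rate
WARMUP_RATIO = 0.03   # first 3% of steps are warmup


def learning_rate(step, total_steps):
    """Learning rate at a given optimizer step (0-indexed)."""
    warmup_steps = max(1, int(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak rate.
        return PEAK_LR * step / warmup_steps
    # Cosine decay from the peak down to the floor.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    min_lr = PEAK_LR * MIN_LR_RATIO
    return min_lr + (PEAK_LR - min_lr) * cosine
```

Under these assumptions the rate climbs to 2e-4 at the end of warmup and settles at 2e-5 on the final step.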

Dataset

Name: HuggingFaceH4/Multilingual-Thinking
Purpose: Multilingual reasoning and thinking enhancement
Languages: Multiple languages including English, Spanish, Arabic, French, German, Chinese, Japanese, Korean, Hindi, Russian
Training Split: Full training set

Usage

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "justinj92/gpt-oss-nemo-20b",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("justinj92/gpt-oss-nemo-20b")

# Example: Multilingual reasoning
messages = [
    {"role": "system", "content": "reasoning language: Arabic"},
    {"role": "user", "content": "¿Cuál es la capital de Australia?"}
]

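# Apply the chat template and move the tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.6,
    do_sample=True,
)

# Decode only the newly generated tokens, not the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)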

Advanced Usage with Custom Reasoning Language

# Specify reasoning language in system prompt
reasoning_language = "French"  # Can be any supported language
system_prompt = f"reasoning language: {reasoning_language}"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
]

Model Capabilities

Multilingual Reasoning

The model can:

Think and reason in a specified language (via system prompt)
Process questions in one language and reason in another
Maintain coherent logic across language boundaries
Provide explanations with explicit reasoning steps

Language Support

Primary languages include:

English (en)
Spanish (es)
Arabic (ar)
French (fr)
German (de)
Chinese (zh)
Japanese (ja)
Korean (ko)
Hindi (hi)
Russian (ru)
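
The language list can be combined with the `reasoning language:` system prompt used in the usage examples. The sketch below is one way to do that. The helper `build_messages` and the lookup by language code are illustrative choices, not an API of the model. Restricting input to the listed codes is a design decision for this example only.

```python
# Language codes and names as listed above.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "ar": "Arabic",
    "fr": "French",
    "de": "German",
    "zh": "Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "hi": "Hindi",
    "ru": "Russian",
}


def build_messages(user_prompt, reasoning_language="en"):
    """Build a chat message list whose system prompt selects the reasoning language."""
    if reasoning_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {reasoning_language!r}")
    language_name = SUPPORTED_LANGUAGES[reasoning_language]
    return [
        {"role": "system", "content": f"reasoning language: {language_name}"},
        {"role": "user", "content": user_prompt},
    ]
```

For example, `build_messages("¿Cuál es la capital de Australia?", "ar")` produces a Spanish question with Arabic selected as the reasoning language. The result can be passed to `tokenizer.apply_chat_template`.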

Performance

The model demonstrates improved performance in:

Cross-lingual reasoning tasks
Multi-step problem solving
Contextual understanding across languages
Maintaining coherence in multilingual conversations

Limitations

Performance may vary across different languages
Complex reasoning in low-resource languages may be limited
Generated content should be verified for factual accuracy
May exhibit biases present in the training data

Technical Specifications

Model Size: ~20B parameters
Precision: BF16 (Brain Floating Point 16-bit)
Memory Requirements: ~40 GB VRAM for inference (BF16 weights)
Recommended Hardware: NVIDIA A100/H100 or similar high-memory GPUs
Framework Compatibility: transformers, torch, accelerate
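
The memory figure follows from the parameter count and precision. The back-of-the-envelope sketch below counts weights only, using a hypothetical helper. KV cache, activations, and framework overhead are not included, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# About 20B parameters stored in BF16 (2 bytes each).
BF16_WEIGHTS_GB = weight_memory_gb(20e9)
print(f"Approximate BF16 weight memory: {BF16_WEIGHTS_GB:.0f} GB")
```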

Citation

If you use this model in your research, please cite:

@misc{gpt-oss-nemo-20b,
  title={GPT-OSS-NEMO-20B: A Multilingual Thinking Model},
  author={justinj92},
  year={2025},
  howpublished={\url{https://huggingface.co/justinj92/gpt-oss-nemo-20b}},
  note={Fine-tuned from openai/gpt-oss-20b using HuggingFaceH4/Multilingual-Thinking}
}

Acknowledgments

Base Model: OpenAI GPT-OSS-20B team
Dataset: HuggingFace H4 team for the Multilingual-Thinking dataset
Infrastructure: Microsoft Azure for cloud computing resources
Framework: Hugging Face transformers and TRL libraries

License

This model is released under the Apache 2.0 license, following the base model's licensing terms.

Model trained in August 2025 with supervised fine-tuning on the HuggingFaceH4/Multilingual-Thinking dataset.
