Model Card for Qwen2.5-0.5B-Instruct-GSM8K-Reasoning
This model is a fine-tuned version of Qwen2.5-0.5B-Instruct, adapted for mathematical reasoning on the GSM8K dataset. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", to improve its reasoning. Fine-tuning used Unsloth for efficiency and TRL (Transformer Reinforcement Learning) for the reinforcement learning training loop.
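For intuition, GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows only that normalization step. The function name, the `eps` constant, and the choice of population standard deviation are illustrative assumptions, not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one group of completions for a single prompt.

    Each reward is shifted by the group mean and scaled by the group
    standard deviation. `eps` (an assumed value) avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four completions for one prompt, two of them correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above the group average get a positive advantage and are reinforced; those below it get a negative one. When all completions in a group score the same, every advantage is zero and that group contributes no learning signal.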
Model Details
How to Get Started with the Model
Use the code below to load and use the model with vLLM & Unsloth:
from unsloth import FastLanguageModel
from vllm import SamplingParams
import torch
# Load the Model & Tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "LahiruWije/Qwen2.5-0.5B-Instruct-GPRO-GSM8K",
    max_seq_length = 2048,
    load_in_4bit = True,
    fast_inference = True,
    gpu_memory_utilization = 0.7,
)
# Prep the Message
PROMPT = "How many r's are in the word strawberry?"
SYSTEM_PROMPT = """
A conversation between User and Assistant. The user asks a question,
and the Assistant solves it. The assistant first thinks about the
reasoning process in the mind and then provides the user with the answer.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
text = tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" : "user", "content" : PROMPT},
], tokenize = False, add_generation_prompt = True)
# Generate a response
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
)[0].outputs[0].text
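Because the system prompt asks for `<reasoning>` and `<answer>` tags, the final answer can be pulled out of the generated text with a small parser. The following is a minimal sketch; the helper name `extract_answer` and the sample completion are hypothetical and are not part of the model's tooling.

```python
import re

# Matches the body of an <answer>...</answer> block, trimming surrounding whitespace.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def extract_answer(completion: str):
    """Return the text inside the last <answer> block, or None if there is none."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


# Hand-written sample in the requested format (not real model output).
sample = (
    "<reasoning>\nSpelling it out: s-t-r-a-w-b-e-r-r-y.\n</reasoning>\n"
    "<answer>\n3\n</answer>"
)
print(extract_answer(sample))
```

Calling `extract_answer(output)` on the generated text returns the answer string when the model follows the format and `None` when it does not, which is worth checking since a small model may not always emit both tags.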
Model Description
- Model type: Transformer-based language model fine-tuned for mathematical reasoning.
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: Qwen/Qwen2.5-0.5B-Instruct
Uses
Direct Use
This model is intended for mathematical reasoning tasks, particularly for solving grade-school-level math problems as found in the GSM8K dataset. It can be used directly for question-answering tasks involving arithmetic and reasoning.
Downstream Use
The model can be fine-tuned further for specific applications, such as tutoring systems, automated problem-solving tools, or other educational technologies.
Out-of-Scope Use
This model is not designed for:
- High-level mathematical research or advanced problem-solving.
- Non-mathematical reasoning tasks without additional fine-tuning.
- Applications requiring high precision in domains outside its training data.
Bias, Risks, and Limitations
- Bias: The model may inherit biases present in the GSM8K dataset or the base model.
- Risks: Incorrect reasoning or answers in critical applications (e.g., education or finance) could lead to misinformation.
- Limitations: The model's performance is constrained by the quality and scope of the GSM8K dataset and the base model's capabilities.
Recommendations
Users should:
- Validate the model's outputs for critical applications.
- Fine-tune the model further for domain-specific tasks.
- Be aware of potential biases and limitations in reasoning capabilities.
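One lightweight way to validate outputs is to compare the model's final answer against a known reference. GSM8K reference solutions end with a line of the form `#### <final answer>`, so the sketch below parses that marker and compares the numbers. The helper names and the example problem are hypothetical, and the normalization rules (dropping commas and dollar signs) are an assumption that may need adjusting.

```python
def gsm8k_reference_answer(solution: str) -> str:
    """Return the final answer that follows '####' in a GSM8K-style solution."""
    return solution.split("####")[-1].strip()


def normalize_number(text: str):
    """Parse a numeric answer, ignoring commas, dollar signs, and whitespace."""
    cleaned = text.replace(",", "").replace("$", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def answers_match(predicted: str, reference_solution: str) -> bool:
    """True when the predicted answer equals the reference numerically."""
    pred = normalize_number(predicted)
    gold = normalize_number(gsm8k_reference_answer(reference_solution))
    return pred is not None and gold is not None and abs(pred - gold) < 1e-6


# Made-up reference solution in GSM8K's format, for illustration only.
reference = "Each box holds 6 eggs, so 12 boxes hold 6 * 12 = 72 eggs.\n#### 72"
print(answers_match("72", reference))
```

A check like this only confirms the final number; it does not verify that the reasoning leading to it is sound.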
Citations
Cite GRPO as:
@article{zhihong2024deepseekmath,
    title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year = 2024,
    eprint = {arXiv:2402.03300},
}
Cite TRL as:
@misc{vonwerra2022trl,
    title = {{TRL: Transformer Reinforcement Learning}},
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year = 2020,
    journal = {GitHub repository},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}