nielsr (HF Staff) committed · Commit 93fb84b (verified) · 1 parent: bb5de06

Improve model card: Add library, update pipeline tag, link to code

This PR improves the model card by:

- Updating the `pipeline_tag` to `text-generation` to accurately reflect the model's capabilities in reasoning and code generation.
- Adding `library_name: transformers` to indicate compatibility with the Hugging Face Transformers library.
- Adding a direct link to the GitHub repository for easier access to the code.
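
The metadata changes listed above can be checked mechanically. The following is a minimal sketch, not part of the PR: `read_front_matter` is a hypothetical helper that handles only the flat `key: value` and `key:` plus `- item` shapes seen in this model card, and the README text is copied from the added side of the diff. A real YAML parser would be the better choice for anything more complex.

```python
# Hypothetical helper (not from the PR): a tiny reader for the flat YAML
# front matter used in this model card. It is NOT a general YAML parser.
def read_front_matter(readme_text):
    lines = readme_text.strip().splitlines()
    # Front matter must start with a '---' delimiter line.
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None  # key whose list items are currently being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the front matter
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value      # scalar entry, e.g. license: apache-2.0
                current_key = None
            else:
                meta[key] = []         # list entry, items follow as "- item"
                current_key = key
    return meta


# Front matter as it appears on the added side of this diff.
UPDATED_README = """---
base_model:
- allenai/OLMo-2-1124-7B-SFT
datasets:
- math
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-generation
library_name: transformers
---

# OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH
"""

meta = read_front_matter(UPDATED_README)
print(meta["pipeline_tag"], meta["library_name"])
```

Running the sketch prints the two fields the PR description calls out, which makes it easy to confirm that the card carries both `pipeline_tag` and `library_name` after the change.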

Files changed (1)
  1. README.md +40 -6
README.md CHANGED
@@ -1,26 +1,60 @@
1
  ---
2
  base_model:
3
  - allenai/OLMo-2-1124-7B-SFT
4
- license: apache-2.0
5
  datasets:
6
  - math
7
  metrics:
8
  - accuracy
9
  pipeline_tag: text-generation
10
- language:
11
- - en
12
  ---
13
 
14
  # OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH
15
 
16
- **Description:**
17
 
18
- An Intuitor-fine-tuned version of Allenai/OLMo-2-1124-7B-SFT trained on the MATH dataset.
19
 
20
- ---
21
 
22
  ## Citation
23
 
 
 
24
  ```bibtex
25
  @article{zhao2025learning,
26
  title={Learning to Reason without External Rewards},
 
1
  ---
2
  base_model:
3
  - allenai/OLMo-2-1124-7B-SFT
 
4
  datasets:
5
  - math
6
+ language:
7
+ - en
8
+ license: apache-2.0
9
  metrics:
10
  - accuracy
11
  pipeline_tag: text-generation
12
+ library_name: transformers
 
13
  ---
14
 
15
  # OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH
16
 
17
+ This repository contains the `OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH` model, an Intuitor-fine-tuned version of `allenai/OLMo-2-1124-7B-SFT` trained on the MATH dataset, as presented in the paper [Learning to Reason without External Rewards](https://huggingface.co/papers/2505.19590).
18
 
19
+ **Intuitor** is a reinforcement learning method that fine-tunes Large Language Models (LLMs) using *self-certainty*—the model’s own internal confidence—as the sole reward. It is built on a novel paradigm called **Reinforcement Learning from Internal Feedback (RLIF)**, enabling models to learn without any external rewards, gold labels, or verifiers.
20
 
21
+ ## Usage
22
+
23
+ You can load this model using the Hugging Face `transformers` library. For detailed instructions on how to use, train, and evaluate the model, please refer to the official GitHub repository:
24
+
25
+ [**GitHub Repository: sunblaze-ucb/Intuitor**](https://github.com/sunblaze-ucb/Intuitor)
26
+
27
+ ```python
28
+ from transformers import AutoModelForCausalLM, AutoTokenizer
29
+ import torch
30
+
31
+ # Load the model and tokenizer
32
+ model_name = "sunblaze-ucb/OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH"
33
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
34
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
35
+
36
+ # Example for text generation
37
+ prompt = ("Question: What is 2 + 2?\n"
38
+           "Answer:")
39
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
40
+
41
+ outputs = model.generate(
42
+     **inputs,
43
+     max_new_tokens=50,
44
+     do_sample=True,
45
+     temperature=0.7,
46
+     top_p=0.9,
47
+     eos_token_id=tokenizer.eos_token_id
48
+ )
49
+
50
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
51
+ print(generated_text)
52
+ ```
53
 
54
  ## Citation
55
 
56
+ If you find our work helpful or inspiring, please feel free to cite it:
57
+
58
  ```bibtex
59
  @article{zhao2025learning,
60
  title={Learning to Reason without External Rewards},