Update README.md

README.md CHANGED

@@ -20,26 +20,9 @@ We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-V
 
 ## 🔥 News
-- [2025/05/02] We release our datasets in huggingface🤗.
+- [2025/05/02] We release our datasets in [huggingface](https://huggingface.co/datasets/IntelligenceLab/VideoHallu)🤗.
 
-## 🔍 Dataset
-
-To facilitate GRPO training, we also randomly sample 1,000 videos from [PhysBench](https://huggingface.co/datasets/WeiChow/PhysBench-train) training data to first improve the model's reasoning abilities in real-world videos, then train the model on part of our synthetic videos.
-
-Our data spans the following categories:
-
-<img src="./images/fig1.png" style="zoom:35%;" />
-
-## Getting Started
-
-```
-# Download the dataset
-pip install huggingface_hub
-
-# Download data to your local dir
-huggingface-cli download IntelligenceLab/VideoHallu --repo-type dataset --local-dir ./new_video_folders --local-dir-use-symlinks False
-```
 
 ## 🏅 <a name='rb'></a>Reward Model
 We finetune [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), and [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations, and use the resulting model, RewardBert, as the reward in GRPO finetuning.
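For context on how a scalar score like RewardBert's is consumed during GRPO finetuning: GRPO samples several completions per prompt, scores each one, and normalizes each reward within its group to get an advantage. A minimal sketch of that group-relative normalization, with placeholder reward values standing in for RewardBert scores (this is an illustration of the GRPO formulation, not code from this repository):

```python
# Group-relative advantage computation as used in GRPO: each sampled
# completion's reward is normalized against the other completions for
# the same prompt. The reward values below are placeholders standing in
# for RewardBert scores.
from statistics import mean, stdev

def group_advantages(rewards):
    # rewards: scalar scores for all completions sampled from one prompt.
    mu, sigma = mean(rewards), stdev(rewards)
    # Small epsilon guards against a zero-variance group.
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Four sampled answers to one prompt, scored by the reward model.
advantages = group_advantages([0.9, 0.4, 0.7, 0.2])
```

Because the advantages are centered within each group, a completion is only rewarded for being better than its siblings, which is what lets a free-form scorer like RewardBert drive the policy update without an absolute value scale.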