QA Builder - 32k sequence length
Welcome to the training log for my new language model with a 32k sequence length. Training a model is a meticulous process, and in this configuration the sequence length is what stands out: a longer context gives the model a greater ability to understand and generate text.
Why 32k?
Sequence length is critical. A longer sequence lets the model take in more extensive context, capturing nuances and details that might otherwise be missed. The expansion to 32k also leaves more room to incorporate sophisticated elements such as rhetoric, allowing for more persuasive and eloquent expression.
Here are the painstaking steps through which the model is being trained:
- Wikipedia Titles + Introduction: A solid foundation is built using Wikipedia titles and article introductions, providing an overview of various topics.
- Wikipedia Titles + Content: Deepening understanding, the full content of Wikipedia articles is incorporated.
- Classic Books: An immersion in the nuances of historical literary language, training the model with texts from classic books.
- Articles: Incorporating detailed and updated information from articles from different fields of knowledge.
- QA (Questions and Answers): Improving the model's ability to understand questions and respond to them, using a dataset of questions and answers.
- Rhetoric: Rhetoric plays a vital role in refining the model's ability to understand and generate persuasive speeches. For this, it is exposed to materials rich in rhetorical elements.
I look forward to sharing the results of this fascinating project with you all!
Training Status
In my previous tests with a sequence length of 2048, I got great models, training them in 24 hours with just a 4090 GPU. I'll try to replicate that success with this 32k setup over the next few hours and post the result.
I am currently on step 2/6 of training. Each stage lasts 4 to 6 hours. I'm releasing the partial models, and at the end, I will also release the datasets, all 100% synthetic and formatted in Markdown.
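The six-stage plan and the 4 to 6 hours per stage quoted above can be turned into a rough time budget. This is only a minimal sketch: the `Stage` class and the helper names are hypothetical, and it assumes every stage falls inside the quoted range and that the stage in progress still counts as a full stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    order: int
    name: str


# Stage names taken from the list in this card.
STAGES = [
    Stage(1, "Wikipedia Titles + Introduction"),
    Stage(2, "Wikipedia Titles + Content"),
    Stage(3, "Classic Books"),
    Stage(4, "Articles"),
    Stage(5, "QA (Questions and Answers)"),
    Stage(6, "Rhetoric"),
]

# "Each stage lasts 4 to 6 hours" (from the text above).
HOURS_PER_STAGE = (4, 6)


def total_hours(stages, hours=HOURS_PER_STAGE):
    """Return (min, max) hours for the whole curriculum."""
    low, high = hours
    return len(stages) * low, len(stages) * high


def remaining_hours(stages, current_step, hours=HOURS_PER_STAGE):
    """Return (min, max) hours left, counting the current stage in full.

    Assumption: the stage in progress has not finished yet, so this is
    an upper estimate of the remaining time.
    """
    low, high = hours
    left = len(stages) - current_step + 1
    return left * low, left * high


if __name__ == "__main__":
    print("full run:", total_hours(STAGES))
    print("from step 2:", remaining_hours(STAGES, current_step=2))
```

Under these assumptions the full curriculum takes between one and one and a half days of GPU time, which is in the same range as the 24-hour runs at sequence length 2048.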
Results so far (model-specific metrics). If you have problems running the evaluation, set the same max_length as shown for each run.
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| winogrande | 0 | acc | 0.5162 | ± | 0.014 |
hf-causal (max_length=3200), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| openbookqa | 0 | acc | 0.1380 | ± | 0.0154 |
| | | acc_norm | 0.3420 | ± | 0.0212 |
| piqa | 0 | acc | 0.6289 | ± | 0.0113 |
| | | acc_norm | 0.6251 | ± | 0.0113 |
hf-causal (max_length=1280), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_challenge | 0 | acc | 0.1903 | ± | 0.0115 |
| | | acc_norm | 0.2270 | ± | 0.0122 |
| hellaswag | 0 | acc | 0.2892 | ± | 0.0045 |
| | | acc_norm | 0.3114 | ± | 0.0046 |
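One rough way to read the tables above is to compare each score with random guessing. The sketch below is not part of the evaluation itself. It assumes chance accuracy is 1 divided by the number of answer options (two for winogrande and piqa, four for openbookqa and hellaswag, and approximately four for arc_challenge), and it uses a rule-of-thumb threshold of two standard errors. The function names are hypothetical.

```python
# (task, metric, value, stderr) copied from the tables in this card.
RESULTS = [
    ("winogrande", "acc", 0.5162, 0.014),
    ("openbookqa", "acc", 0.1380, 0.0154),
    ("openbookqa", "acc_norm", 0.3420, 0.0212),
    ("piqa", "acc", 0.6289, 0.0113),
    ("piqa", "acc_norm", 0.6251, 0.0113),
    ("arc_challenge", "acc", 0.1903, 0.0115),
    ("arc_challenge", "acc_norm", 0.2270, 0.0122),
    ("hellaswag", "acc", 0.2892, 0.0045),
    ("hellaswag", "acc_norm", 0.3114, 0.0046),
]

# Assumed chance levels: 1 / number of answer options per question.
CHANCE = {
    "winogrande": 0.5,
    "piqa": 0.5,
    "openbookqa": 0.25,
    "arc_challenge": 0.25,  # approximate: most questions have four options
    "hellaswag": 0.25,
}


def z_score(value, stderr, chance):
    """Distance from chance accuracy, measured in standard errors."""
    return (value - chance) / stderr


def summarize(results, chance, threshold=2.0):
    """Label each score relative to chance (threshold is a rule of thumb)."""
    rows = []
    for task, metric, value, stderr in results:
        z = z_score(value, stderr, chance[task])
        if z > threshold:
            verdict = "above chance"
        elif z < -threshold:
            verdict = "below chance"
        else:
            verdict = "within noise of chance"
        rows.append((task, metric, round(z, 2), verdict))
    return rows


if __name__ == "__main__":
    for task, metric, z, verdict in summarize(RESULTS, CHANCE):
        print(f"{task:14s} {metric:9s} z={z:+6.2f}  {verdict}")
```

Because acc and acc_norm can land on opposite sides of chance for the same task, it helps to state which metric is being compared when quoting these numbers.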
PAUSED: I had some problems generating the overlapping context, and the model diverged. I will update as soon as I resolve the issue.
Your contribution and feedback are always valuable. Follow along and share your thoughts as we move forward on this exciting journey!