Libraries

How to use ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder")
model = AutoModelForCausalLM.from_pretrained("ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder")

Local Apps

vLLM

How to use ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
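The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM, and the code assumes the server is reachable at the default address shown in the curl example.

```python
import json
import urllib.request

MODEL_ID = "ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the text under choices[i].text.
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. Passing `base_url="http://localhost:30000"` targets an SGLang server started on port 30000 instead.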

Use Docker

docker model run hf.co/ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder

SGLang

How to use ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder with Docker Model Runner:
```
docker model run hf.co/ccore/Llama2-330m-32k-Rhetorical-Agents-QA-Builder
```

QA Builder - 32k sequence length

Welcome to the training log for my new language model, configured for a 32k sequence length. Training a model is a meticulous process, and in this configuration the sequence length is the crucial factor: it gives the model a greater capacity to understand and generate text.

Why 32k?

Sequence length is critical. A longer sequence lets the model take in more context, capturing nuances and details that might otherwise be missed. With the expansion to 32k, there is also more room to incorporate sophisticated elements such as rhetoric, allowing for more persuasive and eloquent expression.
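To put a rough number on what the longer context costs, here is a small sketch. It assumes standard full self-attention, where each head builds a score matrix with one entry per pair of positions, and it takes "32k" to mean 32,768 tokens; both are assumptions on my part, and memory-saving attention kernels would change the memory picture. The 2048 baseline is the sequence length from the earlier tests mentioned under Training Status.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's attention score matrix (seq_len x seq_len)."""
    return seq_len * seq_len


SHORT_CONTEXT = 2048       # sequence length used in the earlier tests
LONG_CONTEXT = 32 * 1024   # assumption: "32k" means 32,768 tokens

# How much longer the context is, and how much larger the score matrix gets.
length_ratio = LONG_CONTEXT // SHORT_CONTEXT
entries_ratio = (
    attention_score_entries(LONG_CONTEXT) // attention_score_entries(SHORT_CONTEXT)
)

print(f"context is {length_ratio}x longer")
print(f"attention score matrix is {entries_ratio}x larger per head")
```

Under these assumptions the context is 16 times longer while the per-head score matrix is 256 times larger, which is why a 32k run is much more demanding than a 2048 run on the same hardware.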

Here are the stages through which the model is being trained:

1. Wikipedia Titles + Introductions: A solid foundation is built from Wikipedia titles and article introductions, providing an overview of a wide range of topics.
2. Wikipedia Titles + Content: To deepen understanding, the full content of the Wikipedia articles is incorporated.
3. Classic Books: The model is trained on texts from classic books, immersing it in the nuances of historical literary language.
4. Articles: Detailed, up-to-date information is incorporated from articles across different fields of knowledge.
5. QA (Questions and Answers): A dataset of questions and answers improves the model's responsiveness and comprehension.
6. Rhetoric: Rhetoric plays a vital role in refining the model's ability to understand and generate persuasive speech. To this end, the model is exposed to materials rich in rhetorical elements.
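The stages above can be written down as a small ordered schedule. This is only an illustrative sketch: the `Stage` class and the helper functions are hypothetical names, not my actual training code, and the hours per stage use the 4 to 6 hour range reported under Training Status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    order: int
    name: str


# The six training stages, in the order listed above.
STAGES = [
    Stage(1, "Wikipedia titles + introductions"),
    Stage(2, "Wikipedia titles + full content"),
    Stage(3, "Classic books"),
    Stage(4, "Articles"),
    Stage(5, "QA (questions and answers)"),
    Stage(6, "Rhetoric"),
]

HOURS_PER_STAGE = (4, 6)  # reported range: each stage lasts 4 to 6 hours


def total_hours(stages, hours_per_stage=HOURS_PER_STAGE):
    """Return the (minimum, maximum) total training time in hours."""
    low, high = hours_per_stage
    return len(stages) * low, len(stages) * high


def progress_label(current, stages):
    """Describe the current stage, e.g. 'stage 2/6: ...'."""
    return f"stage {current}/{len(stages)}: {stages[current - 1].name}"


low, high = total_hours(STAGES)
print(f"estimated total: {low} to {high} hours")
print(progress_label(2, STAGES))
```

With six stages at 4 to 6 hours each, the full schedule comes to roughly 24 to 36 hours if every stage runs without interruption.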

I look forward to sharing the results of this fascinating project with you all!

Training Status

In my previous tests with a sequence length of 2048, I got strong models, training each one in 24 hours on just a 4090 GPU. I'll try to replicate that success with this 32k setup over the next few hours and post the result.

I am currently on stage 2 of 6 of training. Each stage lasts 4 to 6 hours. I'm releasing the partial models, and at the end I will also release the datasets, all 100% synthetic and formatted in Markdown.

Results so far (if you run into problems during evaluation, set the same max_length as in the configuration lines below):

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| winogrande | 0 | acc | 0.5162 | ± | 0.014 |

hf-causal (max_length=3200), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| openbookqa | 0 | acc | 0.1380 | ± | 0.0154 |
| | | acc_norm | 0.3420 | ± | 0.0212 |
| piqa | 0 | acc | 0.6289 | ± | 0.0113 |
| | | acc_norm | 0.6251 | ± | 0.0113 |

hf-causal (max_length=1280), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_challenge | 0 | acc | 0.1903 | ± | 0.0115 |
| | | acc_norm | 0.2270 | ± | 0.0122 |
| hellaswag | 0 | acc | 0.2892 | ± | 0.0045 |
| | | acc_norm | 0.3114 | ± | 0.0046 |
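To read the "Value ± Stderr" columns, it can help to turn each pair into an approximate 95% confidence interval. The sketch below uses the normal approximation (value ± 1.96 × stderr), which is my assumption and not something the evaluation output itself reports; the numbers are copied from the tables above.

```python
# (task, metric) -> (value, stderr), copied from the result tables.
RESULTS = {
    ("winogrande", "acc"): (0.5162, 0.0140),
    ("openbookqa", "acc"): (0.1380, 0.0154),
    ("openbookqa", "acc_norm"): (0.3420, 0.0212),
    ("piqa", "acc"): (0.6289, 0.0113),
    ("piqa", "acc_norm"): (0.6251, 0.0113),
    ("arc_challenge", "acc"): (0.1903, 0.0115),
    ("arc_challenge", "acc_norm"): (0.2270, 0.0122),
    ("hellaswag", "acc"): (0.2892, 0.0045),
    ("hellaswag", "acc_norm"): (0.3114, 0.0046),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_interval(value, stderr, z=Z_95):
    """Return the (low, high) bounds of value +/- z * stderr."""
    margin = z * stderr
    return value - margin, value + margin


for (task, metric), (value, stderr) in RESULTS.items():
    low, high = confidence_interval(value, stderr)
    print(f"{task:<14}{metric:<10}{value:.4f}  [{low:.4f}, {high:.4f}]")
```

Overlapping intervals between two checkpoints would suggest that a difference in score may be noise, which is worth keeping in mind when comparing the partial models.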

PAUSED: I ran into problems generating the overlapping context, and the model diverged. I will update as soon as I resolve the issue.

Your contribution and feedback are always valuable. Follow along and share your thoughts as we move forward on this exciting journey!
