Echo-DSRN - Triton Kernel Benchmark Report - PyTorch (native) vs Triton Legacy (sequential) vs Triton 3-Pass (new)
#10
by mrs83 - opened
Echo-DSRN Triton Kernel Benchmark Report
System Specifications (first test system)
| Component | Value |
|---|---|
| OS | Linux 6.17.0-14-generic |
| CPU | AMD RYZEN AI MAX+ 395 w/ Radeon 8060S |
| Python | 3.13.12 |
| PyTorch | 2.10.0+rocm7.1 |
GPU Specifications
| Metric | Value |
|---|---|
| Name | AMD Radeon 8060S |
| Total Memory | 96.00 GB |
| Compute Capability | 11.5 |
| Multi Processors | 20 |
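For anyone reproducing the run on other hardware, the fields in the two tables above can be gathered programmatically. This is a minimal sketch, not the script used for this report: `collect_system_specs` is a hypothetical helper name, and it assumes a PyTorch build (ROCm builds included) that exposes the GPU through `torch.cuda`. It falls back gracefully when PyTorch or a GPU is missing.

```python
import platform


def collect_system_specs() -> dict:
    """Gather the fields shown in the specification tables (hypothetical helper)."""
    specs = {
        "OS": f"{platform.system()} {platform.release()}",
        "CPU": platform.processor() or "unknown",
        "Python": platform.python_version(),
    }
    try:
        import torch  # optional: only needed for the PyTorch and GPU rows
    except ImportError:
        specs["PyTorch"] = "not installed"
        return specs

    specs["PyTorch"] = torch.__version__
    if torch.cuda.is_available():
        # ROCm builds of PyTorch report AMD GPUs through the torch.cuda namespace.
        props = torch.cuda.get_device_properties(0)
        specs["GPU Name"] = props.name
        specs["Total Memory"] = f"{props.total_memory / 1024**3:.2f} GB"
        specs["Compute Capability"] = f"{props.major}.{props.minor}"
        specs["Multi Processors"] = props.multi_processor_count
    return specs


specs = collect_system_specs()
for key, value in specs.items():
    print(f"| {key} | {value} |")
```

The CPU string returned by `platform.processor()` may be empty or less descriptive than the model name listed above, depending on the platform.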
Disclaimer: The throughput metrics (TPS) in this report refer specifically to the isolated DSRN slow-state update kernels. These values represent raw kernel performance and do not reflect end-to-end model generation speeds.
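To make the TPS column easier to interpret, here is a small sketch of how a raw kernel throughput figure relates to the measured time. The batch size is not stated in this report; the value of 4 below is an inference, chosen because it reproduces the reported numbers (for example the first system's PyTorch row at T=128: 0.55 ms and 931138 TPS). `raw_kernel_tps` is a hypothetical helper name.

```python
def raw_kernel_tps(batch_size: int, seq_len: int, total_time_ms: float) -> float:
    """Tokens processed per second by one kernel invocation."""
    tokens = batch_size * seq_len
    return tokens / (total_time_ms / 1000.0)


# ASSUMPTION: batch size inferred from the tables, not stated in the report.
ASSUMED_BATCH = 4

estimate = raw_kernel_tps(ASSUMED_BATCH, seq_len=128, total_time_ms=0.55)
reported = 931138  # PyTorch row, T=128, first system
print(f"estimated {estimate:,.0f} TPS vs reported {reported:,} TPS")
```

The small gap between the estimate and the reported value is expected, since the times in the tables are rounded to two decimals.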
Kernel Performance Results (AMD Radeon 8060S)
Sequence Length T=128
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 0.55 | 20.28 | 931138 |
| Triton (Legacy) | 0.14 | 12.62 | 3655291 |
| Triton (3-Pass) | 0.23 | 8.51 | 2200967 |
Sequence Length T=512
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 0.91 | 81.04 | 2253689 |
| Triton (Legacy) | 0.40 | 50.36 | 5102426 |
| Triton (3-Pass) | 0.37 | 33.97 | 5503546 |
Sequence Length T=1024
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 2.04 | 162.04 | 2007393 |
| Triton (Legacy) | 0.91 | 100.70 | 4504895 |
| Triton (3-Pass) | 0.69 | 67.92 | 5906577 |
Sequence Length T=2048
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 4.41 | 324.04 | 1858820 |
| Triton (Legacy) | 2.02 | 201.36 | 4053194 |
| Triton (3-Pass) | 1.62 | 135.82 | 5048226 |
Sequence Length T=4096
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 8.55 | 648.05 | 1916386 |
| Triton (Legacy) | 3.98 | 402.69 | 4119082 |
| Triton (3-Pass) | 3.24 | 271.61 | 5060568 |
Sequence Length T=8192
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 16.76 | 1296.07 | 1955516 |
| Triton (Legacy) | 7.90 | 805.34 | 4150341 |
| Triton (3-Pass) | 6.08 | 543.19 | 5387080 |
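The tables above can be condensed into speedup and memory-saving ratios. The sketch below copies the times and peak memory values verbatim from the Radeon 8060S tables; the dictionary layout and the `summarize` helper are illustrative choices, not part of the benchmark tooling.

```python
# (total_time_ms, peak_memory_mb) per implementation, copied from the tables above
RESULTS_8060S = {
    128:  {"pytorch": (0.55, 20.28),    "legacy": (0.14, 12.62),  "three_pass": (0.23, 8.51)},
    512:  {"pytorch": (0.91, 81.04),    "legacy": (0.40, 50.36),  "three_pass": (0.37, 33.97)},
    1024: {"pytorch": (2.04, 162.04),   "legacy": (0.91, 100.70), "three_pass": (0.69, 67.92)},
    2048: {"pytorch": (4.41, 324.04),   "legacy": (2.02, 201.36), "three_pass": (1.62, 135.82)},
    4096: {"pytorch": (8.55, 648.05),   "legacy": (3.98, 402.69), "three_pass": (3.24, 271.61)},
    8192: {"pytorch": (16.76, 1296.07), "legacy": (7.90, 805.34), "three_pass": (6.08, 543.19)},
}


def summarize(results: dict) -> list:
    """Speedup of Triton 3-Pass over the other two kernels, plus memory saved vs PyTorch."""
    rows = []
    for seq_len, r in sorted(results.items()):
        pt_ms, pt_mb = r["pytorch"]
        lg_ms, _ = r["legacy"]
        tp_ms, tp_mb = r["three_pass"]
        rows.append({
            "T": seq_len,
            "speedup_vs_pytorch": pt_ms / tp_ms,
            "speedup_vs_legacy": lg_ms / tp_ms,
            "mem_saving": 1.0 - tp_mb / pt_mb,
        })
    return rows


rows = summarize(RESULTS_8060S)
for row in rows:
    print(f"T={row['T']:>5}: {row['speedup_vs_pytorch']:.2f}x vs PyTorch, "
          f"{row['speedup_vs_legacy']:.2f}x vs Triton Legacy, "
          f"{row['mem_saving']:.0%} less peak memory")
```

A speedup below 1.0 against Triton Legacy means the 3-Pass kernel is slower at that sequence length, which is the case at T=128 on this system.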
System Specifications (second test system)
| Component | Value |
|---|---|
| OS | Linux 6.17.0-19-generic |
| CPU | AMD Ryzen 7 7700 8-Core Processor |
| Python | 3.13.12 |
| PyTorch | 2.10.0+rocm7.1 |
GPU Specifications
| Metric | Value |
|---|---|
| Name | AMD Radeon AI PRO R9700 |
| Total Memory | 31.86 GB |
| Compute Capability | 12.0 |
| Multi Processors | 32 |
Kernel Performance Results (AMD Radeon AI PRO R9700)
Sequence Length T=128
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 0.73 | 20.28 | 696828 |
| Triton (Legacy) | 0.17 | 12.62 | 2940147 |
| Triton (3-Pass) | 0.32 | 8.51 | 1611378 |
Sequence Length T=512
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 1.03 | 81.04 | 1987169 |
| Triton (Legacy) | 0.31 | 50.36 | 6566224 |
| Triton (3-Pass) | 0.35 | 33.97 | 5907795 |
Sequence Length T=1024
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 1.41 | 162.04 | 2902789 |
| Triton (Legacy) | 0.60 | 100.70 | 6878000 |
| Triton (3-Pass) | 0.37 | 67.92 | 11095956 |
Sequence Length T=2048
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 3.18 | 324.04 | 2572318 |
| Triton (Legacy) | 1.26 | 201.36 | 6494856 |
| Triton (3-Pass) | 0.60 | 135.82 | 13661381 |
Sequence Length T=4096
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 4.75 | 648.05 | 3448440 |
| Triton (Legacy) | 2.73 | 402.69 | 5995208 |
| Triton (3-Pass) | 1.25 | 271.61 | 13058581 |
Sequence Length T=8192
| Implementation | Total Time (ms) | Peak Memory (MB) | Raw Kernel TPS |
|---|---|---|---|
| PyTorch (Legacy) | 9.09 | 1296.07 | 3606687 |
| Triton (Legacy) | 5.51 | 805.34 | 5943563 |
| Triton (3-Pass) | 2.46 | 543.19 | 13294154 |
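On both systems the 3-Pass kernel is slower than Triton Legacy at the shortest sequence length and faster at longer ones. The sketch below finds the first measured sequence length at which 3-Pass wins on each GPU, using the times copied from the tables in this report. `first_crossover` is a hypothetical helper, and the result is only as fine-grained as the six sequence lengths that were benchmarked.

```python
SEQ_LENS = [128, 512, 1024, 2048, 4096, 8192]

# Total time in ms for each sequence length, copied from the tables in this report
TIMES_MS = {
    "Radeon 8060S": {
        "legacy":     [0.14, 0.40, 0.91, 2.02, 3.98, 7.90],
        "three_pass": [0.23, 0.37, 0.69, 1.62, 3.24, 6.08],
    },
    "Radeon AI PRO R9700": {
        "legacy":     [0.17, 0.31, 0.60, 1.26, 2.73, 5.51],
        "three_pass": [0.32, 0.35, 0.37, 0.60, 1.25, 2.46],
    },
}


def first_crossover(seq_lens, legacy_ms, three_pass_ms):
    """Return the first measured T where 3-Pass is strictly faster, or None."""
    for seq_len, legacy, three_pass in zip(seq_lens, legacy_ms, three_pass_ms):
        if three_pass < legacy:
            return seq_len
    return None


crossovers = {
    gpu: first_crossover(SEQ_LENS, t["legacy"], t["three_pass"])
    for gpu, t in TIMES_MS.items()
}
for gpu, seq_len in crossovers.items():
    print(f"{gpu}: 3-Pass is faster than Triton Legacy from T={seq_len}")
```

The actual break-even point lies somewhere between the last length where Legacy wins and the first where 3-Pass wins; pinning it down would need measurements at intermediate lengths.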