Instructions for using TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 with libraries and local apps.

Libraries

How to use TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1",
	filename="Luau-Qwen3-4B-FIM-v0.1-BF16.gguf",
)

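# messages must be a list of role/content dicts; the Luau snippet here is only illustrative
llm.create_chat_completion(
	messages = [
		{"role": "system", "content": "You are a code completion assistant."},
		{"role": "user", "content": "<|fim_suffix|>\nend<|fim_prefix|>local function add(a, b)\n\treturn <|fim_middle|>"},
	]
)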

Local Apps

llama.cpp

How to use TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL
# Run inference directly in the terminal:
./llama-cli -hf TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL
# Run inference directly in the terminal:
./build/bin/llama-cli -hf TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL

Use Docker

docker model run hf.co/TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL

Ollama
How to use TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 with Ollama:
```
ollama run hf.co/TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL
```

Unsloth Studio

How to use TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 to start chatting

Docker Model Runner
How to use TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 with Docker Model Runner:
```
docker model run hf.co/TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL
```

Lemonade

How to use TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull TorpedoSoftware/Luau-Qwen3-4B-FIM-v0.1:Q4_K_XL

Run and chat with the model

lemonade run user.Luau-Qwen3-4B-FIM-v0.1-Q4_K_XL

List all available models

lemonade list

Luau Qwen3 4B FIM v0.1

A specialized fine-tune of Qwen/Qwen3-4B-Instruct-2507, trained for fill-in-the-middle (FIM) completion of Luau code and based on "Efficient Training of Language Models to Fill in the Middle" by Mohammad Bavarian et al., 2022. It is not a chatbot; it performs Luau autocomplete.

Expected format

<|repo_name|> and <|file_sep|> are technically optional, but you will get better responses when they are included.

If using a chat API:

[
   { "role": "system", "content": "You are a code completion assistant." },
   { "role": "user", "content": f"<|repo_name|>{reponame}<|file_sep|>{filename}<|fim_suffix|>{suffix}<|fim_prefix|>{prefix}<|fim_middle|>" }
]
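As a rough illustration of the chat format above, the sketch below splits a source file at a cursor offset into prefix and suffix, then builds the message list. The helper name `build_fim_messages`, its parameters, and the sample repository and file names are hypothetical and not part of any library. It assumes the optional `<|repo_name|>` and `<|file_sep|>` tokens can simply be left out when no value is available.

```python
from typing import Optional

SYSTEM_PROMPT = "You are a code completion assistant."


def build_fim_messages(
    source: str,
    cursor: int,
    reponame: Optional[str] = None,
    filename: Optional[str] = None,
) -> list[dict[str, str]]:
    """Build chat messages asking the model to fill in text at `cursor`."""
    # Everything before the cursor is the prefix, everything after is the suffix.
    prefix, suffix = source[:cursor], source[cursor:]

    # The repository and file tokens are optional, so only emit them when given.
    header = ""
    if reponame is not None:
        header += f"<|repo_name|>{reponame}"
    if filename is not None:
        header += f"<|file_sep|>{filename}"

    # Suffix comes before prefix, matching the format shown above.
    user = f"{header}<|fim_suffix|>{suffix}<|fim_prefix|>{prefix}<|fim_middle|>"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


# Hypothetical editor state: the cursor sits right after "return ".
source = "local function add(a, b)\n\treturn \nend\n"
cursor = source.index("return ") + len("return ")
messages = build_fim_messages(
    source, cursor, reponame="my-game", filename="src/math.luau"
)
```

The resulting `messages` list can be passed to any chat-style client, for example as the `messages` argument of `create_chat_completion` in llama-cpp-python.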

If using a completions API, the chat template is not applied for you, so you need to write it into the prompt yourself:

prompt = f"<|im_start|>system\nYou are a code completion assistant.<|im_end|>\n<|im_start|>user\n<|repo_name|>{reponame}<|file_sep|>{filename}<|fim_suffix|>{suffix}<|fim_prefix|>{prefix}<|fim_middle|><|im_end|>\n<|im_start|>assistant\n"
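The same prompt can be assembled by a small helper and placed in a request body, as in the sketch below. The function name `build_completion_prompt`, the sample values, the `max_tokens` setting, and the model identifier are placeholders chosen for illustration. The body uses the usual field names of an OpenAI-compatible `/v1/completions` endpoint, so check them against the server you actually run.

```python
import json

SYSTEM_PROMPT = "You are a code completion assistant."


def build_completion_prompt(
    prefix: str, suffix: str, reponame: str, filename: str
) -> str:
    """Wrap the FIM prompt in the chat turns shown in the f-string above."""
    fim = (
        f"<|repo_name|>{reponame}<|file_sep|>{filename}"
        f"<|fim_suffix|>{suffix}<|fim_prefix|>{prefix}<|fim_middle|>"
    )
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{fim}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_completion_prompt(
    prefix="local Players = game:GetService(",
    suffix=")\n",
    reponame="my-game",
    filename="src/init.luau",
)

# JSON body for a completions request. The model name and max_tokens
# are placeholders; stopping on <|im_end|> ends generation with the turn.
request_body = json.dumps(
    {
        "model": "luau-qwen3-4b-fim-v0.1",
        "prompt": prompt,
        "max_tokens": 128,
        "stop": ["<|im_end|>"],
    }
)
```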

Here is an example config.yaml for Continue.dev autocomplete in VS Code, backed by LM Studio:

name: Local Autocomplete
version: 1.0.0
schema: v1
models:
  - name: Luau Qwen3 4B FIM v0.1
    provider: lmstudio
    apiBase: http://localhost:1234/v1
    model: luau-qwen3-4b-fim-v0.1
    roles:
      - autocomplete
    defaultCompletionOptions:
      stop: [
        "<|im_end|>",
        "</s>",
        "<|repo_name|>",
        "<|file_sep|>",
        "```"
      ]
    promptTemplates:
      autocomplete: "<|im_start|>system\nYou are a code completion assistant.<|im_end|>\n<|im_start|>user\n<|repo_name|>{{{reponame}}}<|file_sep|>{{{filename}}}<|fim_suffix|>{{{suffix}}}<|fim_prefix|>{{{prefix}}}<|fim_middle|><|im_end|>\n<|im_start|>assistant\n"
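If a client or backend does not honor the `stop` list, the same cut can be applied to the returned text on the client side. The sketch below is one way to do that under that assumption. The helper `truncate_at_stop` is hypothetical; the stop strings are copied from the config above.

```python
# Stop strings from the config above. The code-fence marker is built with
# string repetition so this example does not contain a literal fence.
STOP_SEQUENCES = (
    "<|im_end|>",
    "</s>",
    "<|repo_name|>",
    "<|file_sep|>",
    "`" * 3,
)


def truncate_at_stop(text: str, stops: tuple[str, ...] = STOP_SEQUENCES) -> str:
    """Return `text` cut at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


completion = truncate_at_stop('"Players")<|im_end|>\n<|file_sep|>other.luau')
```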

Model Information

Developer: Zack Williams (boatbomber)
Sponsor: Torpedo Software LLC
Base Model: Qwen/Qwen3-4B-Instruct-2507
Training Method: SFT (Supervised Fine-Tuning)

Training Methodology

Dataset

Source: TorpedoSoftware/the-luau-stack

500,000 FIM-formatted Luau code snippets
- Completion targets vary: to end of line, end of block, the next few lines, etc.
- Token order varies between Suffix-Prefix-Middle (SPM) and Prefix-Suffix-Middle (PSM)
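To make the two orderings concrete, the sketch below turns one snippet into a prompt and target pair in either Suffix-Prefix-Middle (SPM) or Prefix-Suffix-Middle (PSM) order, using an end-of-line span as the middle. This is an illustration only, not the pipeline used to build the dataset. The function names, the exact PSM token layout, and the span selection are assumptions.

```python
def end_of_line_span(code: str, start: int) -> tuple[int, int]:
    """Return (start, end) covering `start` up to the end of its line."""
    end = code.find("\n", start)
    return start, len(code) if end == -1 else end


def make_fim_example(code: str, start: int, end: int, order: str) -> tuple[str, str]:
    """Split `code` around [start, end) and return (prompt, target)."""
    if not 0 <= start <= end <= len(code):
        raise ValueError("span is out of range")
    prefix, middle, suffix = code[:start], code[start:end], code[end:]

    if order == "SPM":
        # Suffix first, then prefix: the order shown in "Expected format".
        prompt = f"<|fim_suffix|>{suffix}<|fim_prefix|>{prefix}<|fim_middle|>"
    elif order == "PSM":
        # Prefix first, then suffix (assumed layout for this ordering).
        prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
    else:
        raise ValueError(f"unknown order: {order!r}")
    return prompt, middle


code = "local x = 10\nprint(x)\n"
start, end = end_of_line_span(code, code.index("10"))
spm_prompt, spm_target = make_fim_example(code, start, end, "SPM")
psm_prompt, psm_target = make_fim_example(code, start, end, "PSM")
```

In both cases the target is the removed middle (`10` here); only the order of the context in the prompt changes.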