Instructions to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("NeuraLakeAi/iSA-02-1B-NoTags-Preview", dtype="auto")

llama-cpp-python

How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="NeuraLakeAi/iSA-02-1B-NoTags-Preview",
	filename="iSA-02-Nano-1B-NoTags.F16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M

Use Docker

docker model run hf.co/NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M

LM Studio
Jan
Ollama
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Ollama:
```
ollama run hf.co/NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
```

Unsloth Studio

How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NeuraLakeAi/iSA-02-1B-NoTags-Preview to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NeuraLakeAi/iSA-02-1B-NoTags-Preview to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for NeuraLakeAi/iSA-02-1B-NoTags-Preview to start chatting

How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Docker Model Runner:
```
docker model run hf.co/NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
```

Lemonade

How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M

Run and chat with the model

lemonade run user.iSA-02-1B-NoTags-Preview-Q4_K_M

List all available models

lemonade list

⚠️ Experimental Release Notice:
This model is in an experimental phase on Hugging Face and is still undergoing training. Expect further enhancements and updates in the coming week.

NeuraLake iSA-02 Series: Advanced Small-Scale Reasoning Models

Overview

The NeuraLake iSA-02 Series comprises compact reasoning models optimized for efficient logical processing in resource-constrained environments. Designed for applications requiring nuanced decision-making and complex problem-solving, these models balance performance with computational efficiency.

Release Information

Model weights for each variant (1B, 2B, 3B, and 7B parameters) will be released post comprehensive training and optimization to ensure high performance and safety standards.

iSA-02-Nano-1B-Preview v1.1 (No Structured Tags Variant)

The iSA-02-Nano-1B-Preview is the latest addition to the iSA-02 series, enhanced with synthetic data to prioritize “thinking before speaking.” This focus enhances its reasoning capabilities, making it ideal for applications requiring thoughtful and logical text generation within a compact framework.

What is a Reasoning Model?

A reasoning model simulates human-like logical thinking, enabling the analysis of information, inference drawing, and decision-making based on data. Unlike traditional language models that generate text from patterns, reasoning models excel in understanding, planning, and executing multi-step processes.

Name and Inspiration

iSA: Stands for Intelligent, Small, Autonomous, reflecting the mission to create compact AI systems with adaptive and intelligent behavior.
Development: Initiated in January 2024, the series emerged from experiments combining diverse datasets, revealing initial reasoning capabilities in the base model. Unlike models derived from OpenAI, iSA-02 emphasizes unique reasoning enhancements through innovative synthetic data and contextual refinement.

Lineage

Based on meta-llama/Llama-3.2-1B-Instruct and refined with synthetic datasets from NeuraLake, the iSA-02-Nano-1B-Preview targets improvements in reasoning, long-context handling, and adaptive behaviors.

Key Features

Extended Context Window: Supports up to 256K tokens for complex reasoning and Retrieval-Augmented Generation (RAG).
Adaptive Reasoning: Adjusts reasoning depth based on context size—concise for <8K tokens and detailed for >16K tokens.
Efficiency Optimized: Balances advanced reasoning with low computational demands, suitable for resource-limited settings.

Model Specifications

Architecture

Type: Transformer-based
Layers: 16
Hidden Size: 2048
Attention Heads: 32
Feed-Forward Size: 8192
Vocabulary Size: 128,256

Training Parameters

Precision: Mixed Precision (fp16)
Context Window:
- Text Generation: 1,024–4,096 tokens
- Logical Reasoning: 16,000–64,000 tokens

Quantization Versions

Version	Format	Bits	Parameters	Download
F32	Custom Llama 3.2	FP32	1.24B	Download
F16	Custom Llama 3.2	FP16	1.24B	Download
Q4_0	Custom Llama 3.2	4-bit	1.24B	Download
Q4_K_M	Custom Llama 3.2	4-bit	1.24B	Download
Q5_K_M	Custom Llama 3.2	5-bit	1.24B	Download
Q8_0	Custom Llama 3.2	8-bit	1.24B	Download

Hardware Requirements

Version	Quantization	Size	Memory (RAM/vRAM)
F32	FP32	4.95 GB	9.9 GB
F16	FP16	2.48 GB	4.96 GB
Q4_0	4-bit	771 MB	1.56 GB
Q4_K_M	4-bit	808 MB	1.62 GB
Q5_K_M	5-bit	893 MB	1.84 GB
Q8_0	8-bit	1.32 GB	2.64 GB

Training and Fine-Tuning

Trained on synthetic datasets tailored to enhance logical reasoning, multi-step task execution, and contextual tool usage, the iSA-02 series ensures robust performance in complex scenarios and adaptive behaviors.

Use Cases

Applications

Logical Reasoning & Decision-Making: Generate analytical reports from system logs.
Dynamic Tool Integration: Ideal for long-context RAG tasks like querying large databases.
Structured Content Generation: Perfect for correcting OCR outputs and filling in missing data.

Limitations

Unsuitable for:
- High-throughput text generation.
- Latency-sensitive applications.
Challenges:
- Potential biases from synthetic data.
- Redundant or verbose reasoning.

Improvements in Version 1.1

Enhanced Reasoning: Faster processing with reduced overthinking.
Better Tool Utilization: More effective use of external tools.
Improved Context Understanding: Aligns actions with user intentions.
Reduced Redundancy: More concise responses.
Less Task Aversion: Fewer refusals of routine tasks.
Optimized Context Management: Efficient handling of the 256K context window.

Best Practices

Configuration Recommendations

max_tokens:
- Simple Tasks: 1,024–4,096 tokens
- Complex Tasks: 8,000–16,000 tokens
temperature:
- Objective Responses: 0.1–0.3
- Creative Reasoning: 0.7–1.0
top_p:
- Focused Outputs: 0.85
- Precision Tasks: 0.1
stop_sequences:
- Use specific sequences like "Therefore, the answer is," to minimize redundancy.

Prompt Engineering

Simple Tasks:
- Example: "You are a helpful assistant."
Complex Tasks:
- Example: "Transform OCR outputs into valid JSON, return only the JSON data as output."
- Structured Reasoning: "Not apply in "No Structured Tags", as it is not necessary or supported."

Supervision and Monitoring

Clear Prompts: Ensure instructions are specific and unambiguous to reduce errors and redundancies.

Known Issues (Addressed in V1.1)

Task Management: Improved handling of complex tasks and function calls.
Unusual Behavior: Reduced instances of unsolicited online searches or autonomous interactions.
Conversational Redirection: Enhanced stability in maintaining topic focus.
Function Call Execution: Ensured simulated function calls are actionable.

Citation

@misc{isa02,
  author       = {NeuraLake},
  title        = {iSA-02: The First Small Reasoning Model with Context-Dynamic Behavior},
  year         = {2024},
  license      = {Apache 2.0},
  url          = {https://huggingface.co/NeuraLake/iSA-02},
}

Note: This model card is under development and will be updated with additional details, evaluation metrics, and the final model name.

Downloads last month: 185

GGUF

Model size

1B params

Architecture

llama

Hardware compatibility

4-bit

5-bit

6-bit

8-bit

16-bit

32-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support