Instructions to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("NeuraLakeAi/iSA-02-1B-NoTags-Preview", dtype="auto") - llama-cpp-python
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="NeuraLakeAi/iSA-02-1B-NoTags-Preview", filename="iSA-02-Nano-1B-NoTags.F16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M # Run inference directly in the terminal: llama-cli -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M # Run inference directly in the terminal: llama-cli -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
Use Docker
docker model run hf.co/NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Ollama:
ollama run hf.co/NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
- Unsloth Studio
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for NeuraLakeAi/iSA-02-1B-NoTags-Preview to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for NeuraLakeAi/iSA-02-1B-NoTags-Preview to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for NeuraLakeAi/iSA-02-1B-NoTags-Preview to start chatting
- Pi
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Docker Model Runner:
docker model run hf.co/NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
- Lemonade
How to use NeuraLakeAi/iSA-02-1B-NoTags-Preview with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull NeuraLakeAi/iSA-02-1B-NoTags-Preview:Q4_K_M
Run and chat with the model
lemonade run user.iSA-02-1B-NoTags-Preview-Q4_K_M
List all available models
lemonade list
β οΈ Experimental Release Notice:
This model is in an experimental phase on Hugging Face and is still undergoing training. Expect further enhancements and updates in the coming week.
NeuraLake iSA-02 Series: Advanced Small-Scale Reasoning Models
Overview
The NeuraLake iSA-02 Series comprises compact reasoning models optimized for efficient logical processing in resource-constrained environments. Designed for applications requiring nuanced decision-making and complex problem-solving, these models balance performance with computational efficiency.
Release Information
Model weights for each variant (1B, 2B, 3B, and 7B parameters) will be released post comprehensive training and optimization to ensure high performance and safety standards.
iSA-02-Nano-1B-Preview v1.1 (No Structured Tags Variant)
The iSA-02-Nano-1B-Preview is the latest addition to the iSA-02 series, enhanced with synthetic data to prioritize βthinking before speaking.β This focus enhances its reasoning capabilities, making it ideal for applications requiring thoughtful and logical text generation within a compact framework.
What is a Reasoning Model?
A reasoning model simulates human-like logical thinking, enabling the analysis of information, inference drawing, and decision-making based on data. Unlike traditional language models that generate text from patterns, reasoning models excel in understanding, planning, and executing multi-step processes.
Name and Inspiration
- iSA: Stands for Intelligent, Small, Autonomous, reflecting the mission to create compact AI systems with adaptive and intelligent behavior.
- Development: Initiated in January 2024, the series emerged from experiments combining diverse datasets, revealing initial reasoning capabilities in the base model. Unlike models derived from OpenAI, iSA-02 emphasizes unique reasoning enhancements through innovative synthetic data and contextual refinement.
Lineage
Based on meta-llama/Llama-3.2-1B-Instruct and refined with synthetic datasets from NeuraLake, the iSA-02-Nano-1B-Preview targets improvements in reasoning, long-context handling, and adaptive behaviors.
Key Features
- Extended Context Window: Supports up to 256K tokens for complex reasoning and Retrieval-Augmented Generation (RAG).
- Adaptive Reasoning: Adjusts reasoning depth based on context sizeβconcise for <8K tokens and detailed for >16K tokens.
- Efficiency Optimized: Balances advanced reasoning with low computational demands, suitable for resource-limited settings.
Model Specifications
Architecture
- Type: Transformer-based
- Layers: 16
- Hidden Size: 2048
- Attention Heads: 32
- Feed-Forward Size: 8192
- Vocabulary Size: 128,256
Training Parameters
- Precision: Mixed Precision (fp16)
- Context Window:
- Text Generation: 1,024β4,096 tokens
- Logical Reasoning: 16,000β64,000 tokens
Quantization Versions
| Version | Format | Bits | Parameters | Download |
|---|---|---|---|---|
| F32 | Custom Llama 3.2 | FP32 | 1.24B | Download |
| F16 | Custom Llama 3.2 | FP16 | 1.24B | Download |
| Q4_0 | Custom Llama 3.2 | 4-bit | 1.24B | Download |
| Q4_K_M | Custom Llama 3.2 | 4-bit | 1.24B | Download |
| Q5_K_M | Custom Llama 3.2 | 5-bit | 1.24B | Download |
| Q8_0 | Custom Llama 3.2 | 8-bit | 1.24B | Download |
Hardware Requirements
| Version | Quantization | Size | Memory (RAM/vRAM) |
|---|---|---|---|
| F32 | FP32 | 4.95 GB | 9.9 GB |
| F16 | FP16 | 2.48 GB | 4.96 GB |
| Q4_0 | 4-bit | 771 MB | 1.56 GB |
| Q4_K_M | 4-bit | 808 MB | 1.62 GB |
| Q5_K_M | 5-bit | 893 MB | 1.84 GB |
| Q8_0 | 8-bit | 1.32 GB | 2.64 GB |
Training and Fine-Tuning
Trained on synthetic datasets tailored to enhance logical reasoning, multi-step task execution, and contextual tool usage, the iSA-02 series ensures robust performance in complex scenarios and adaptive behaviors.
Use Cases
Applications
- Logical Reasoning & Decision-Making: Generate analytical reports from system logs.
- Dynamic Tool Integration: Ideal for long-context RAG tasks like querying large databases.
- Structured Content Generation: Perfect for correcting OCR outputs and filling in missing data.
Limitations
- Unsuitable for:
- High-throughput text generation.
- Latency-sensitive applications.
- Challenges:
- Potential biases from synthetic data.
- Redundant or verbose reasoning.
Improvements in Version 1.1
- Enhanced Reasoning: Faster processing with reduced overthinking.
- Better Tool Utilization: More effective use of external tools.
- Improved Context Understanding: Aligns actions with user intentions.
- Reduced Redundancy: More concise responses.
- Less Task Aversion: Fewer refusals of routine tasks.
- Optimized Context Management: Efficient handling of the 256K context window.
Best Practices
Configuration Recommendations
- max_tokens:
- Simple Tasks: 1,024β4,096 tokens
- Complex Tasks: 8,000β16,000 tokens
- temperature:
- Objective Responses: 0.1β0.3
- Creative Reasoning: 0.7β1.0
- top_p:
- Focused Outputs: 0.85
- Precision Tasks: 0.1
- stop_sequences:
- Use specific sequences like "Therefore, the answer is," to minimize redundancy.
Prompt Engineering
- Simple Tasks:
- Example:
"You are a helpful assistant."
- Example:
- Complex Tasks:
- Example:
"Transform OCR outputs into valid JSON, return only the JSON data as output." - Structured Reasoning: "Not apply in "No Structured Tags", as it is not necessary or supported."
- Example:
Supervision and Monitoring
- Clear Prompts: Ensure instructions are specific and unambiguous to reduce errors and redundancies.
Known Issues (Addressed in V1.1)
- Task Management: Improved handling of complex tasks and function calls.
- Unusual Behavior: Reduced instances of unsolicited online searches or autonomous interactions.
- Conversational Redirection: Enhanced stability in maintaining topic focus.
- Function Call Execution: Ensured simulated function calls are actionable.
Citation
@misc{isa02,
author = {NeuraLake},
title = {iSA-02: The First Small Reasoning Model with Context-Dynamic Behavior},
year = {2024},
license = {Apache 2.0},
url = {https://huggingface.co/NeuraLake/iSA-02},
}
Note: This model card is under development and will be updated with additional details, evaluation metrics, and the final model name.
- Downloads last month
- 185
4-bit
5-bit
6-bit
8-bit
16-bit
32-bit
