Tags: Text Generation · PEFT · Safetensors · Transformers · qwen2 · lora · coding · code-generation · conversational · text-generation-inference
Instructions to use girish00/ConicAI_LLM_model with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use girish00/ConicAI_LLM_model with PEFT:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "girish00/ConicAI_LLM_model")
```
- Transformers
How to use girish00/ConicAI_LLM_model with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="girish00/ConicAI_LLM_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("girish00/ConicAI_LLM_model")
model = AutoModelForCausalLM.from_pretrained("girish00/ConicAI_LLM_model")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use girish00/ConicAI_LLM_model with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "girish00/ConicAI_LLM_model"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "girish00/ConicAI_LLM_model",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/girish00/ConicAI_LLM_model
```
- SGLang
How to use girish00/ConicAI_LLM_model with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "girish00/ConicAI_LLM_model" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "girish00/ConicAI_LLM_model",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "girish00/ConicAI_LLM_model" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "girish00/ConicAI_LLM_model",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use girish00/ConicAI_LLM_model with Docker Model Runner:
```shell
docker model run hf.co/girish00/ConicAI_LLM_model
```
ConicAI Coding LLM
Model Details
Model Description
ConicAI LLM Model is a parameter-efficient fine-tuned coding assistant built using LoRA on top of Qwen2.5-Coder. It is designed to generate, debug, and explain code with structured outputs.
- Developed by: GIRISH KUMAR DEWANGAN
- Model type: Causal Language Model (Code LLM)
- Language(s): Python, general programming
- Used for: Code generation, debugging, error fixing, evaluation scoring, and hallucination/relevancy scoring
- License: Apache 2.0
- Finetuned from model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Model Sources
- Repository: https://huggingface.co/girish00/ConicAI_LLM_model
- Paper: View Paper
Uses
Direct Use
- Code generation
- Debugging
- Code explanation
- Learning programming
Downstream Use
- Coding assistants
- AI-based education tools
- Developer productivity tools
Out-of-Scope Use
- Security-critical systems
- Autonomous production systems
- High-risk environments
Bias, Risks, and Limitations
- May generate incorrect logic
- Confidence scores are heuristic
- Output depends on prompt quality
- Limited dataset generalization
Recommendations
- Always validate generated code
- Use structured prompts
- Avoid ambiguous instructions
Structured Output Framework
The model produces outputs in structured JSON format:
```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.84,
  "relevancy_score": 0.82,
  "hallucination": false
}
```
This enables:
- Easy API integration
- Automated evaluation
- Better interpretability
How to Get Started with the Model
```python
!pip -q install -U transformers peft accelerate huggingface_hub safetensors
!pip install --upgrade torchao

from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN')

model = "girish00/ConicAI_LLM_model"
prompt = input("Enter your prompt: ")

from huggingface_hub import login, snapshot_download
login(token=HF_TOKEN)
repo = snapshot_download(model, token=HF_TOKEN)

import sys, os
sys.path.append(repo)
from infer_local import build_instruction_prompt, build_structured_result

from peft import PeftConfig, PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, time, json

cfg = PeftConfig.from_pretrained(repo)
base = cfg.base_model_name_or_path
tokenizer = AutoTokenizer.from_pretrained(base)
base_model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)
llm = PeftModel.from_pretrained(base_model, repo)
llm.eval()

inputs = tokenizer(build_instruction_prompt(prompt), return_tensors="pt").to(llm.device)

start = time.perf_counter()
with torch.no_grad():
    out = llm.generate(
        **inputs,
        max_new_tokens=320,
        output_scores=True,
        return_dict_in_generate=True,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
latency = int((time.perf_counter() - start) * 1000)

gen_ids = out.sequences[0][inputs["input_ids"].shape[1]:].tolist()
text = tokenizer.decode(gen_ids, skip_special_tokens=True)

# Per-token confidence: probability the model assigned to each generated token
conf = []
for tid, score in zip(gen_ids, out.scores):
    probs = torch.softmax(score[0], dim=-1)
    conf.append(float(probs[tid].item()))

print(json.dumps(
    build_structured_result(
        prompt,
        text,
        latency,
        tokenizer=tokenizer,
        generated_ids=gen_ids,
        token_confidences=conf,
    ),
    indent=2,
))
```
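The per-token probabilities collected above can be reduced to a single heuristic confidence score. A common choice is the geometric mean, which penalizes any low-confidence token; this is an illustrative assumption, not necessarily what `build_structured_result` does internally:

```python
import math

def aggregate_confidence(token_confidences):
    """Geometric mean of per-token probabilities (heuristic overall confidence)."""
    if not token_confidences:
        return 0.0
    # Sum of logs avoids underflow on long sequences of small probabilities.
    log_sum = sum(math.log(max(p, 1e-12)) for p in token_confidences)
    return math.exp(log_sum / len(token_confidences))

print(round(aggregate_confidence([0.9, 0.8, 0.95]), 3))  # 0.881
```

As the model card notes under limitations, such scores are heuristic: they reflect token-level probability mass, not correctness of the generated code.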
📊 Benchmark Results
Training Details
Dataset
- Size: ~5K samples
- Instruction-based coding dataset
Training Procedure
- Method: LoRA fine-tuning
- Framework: Transformers + PEFT
- Precision: FP16 / Mixed
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 1–3 |
| Batch Size | 2 |
| Learning Rate | 2e-4 |
| Max Sequence Length | 512 |
| LoRA Rank (r) | 8 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.05 |
Inference Configuration
```python
max_new_tokens = 200
temperature = 0.2
top_p = 0.9
do_sample = True
```
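In the Transformers API these settings can be bundled into a reusable `GenerationConfig` rather than passed ad hoc to each `generate` call; a small sketch:

```python
from transformers import GenerationConfig

# Inference settings from the configuration above.
gen_config = GenerationConfig(
    max_new_tokens=200,
    temperature=0.2,   # low temperature keeps code output mostly deterministic
    top_p=0.9,         # nucleus sampling cutoff
    do_sample=True,
)
print(gen_config.temperature)
```

At inference time this is passed as `model.generate(**inputs, generation_config=gen_config)`; the low temperature plus nucleus sampling is a typical trade-off for code, allowing slight variation without drifting into incoherent completions.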
Evaluation
Metrics
- Code correctness
- Syntax validity
- Relevancy score
- Hallucination rate
- Confidence score
- Latency
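Of these metrics, syntax validity is the easiest to check mechanically. The sketch below is a hypothetical checker for Python outputs using the standard-library `ast` parser — it illustrates the metric, and is not the card's actual evaluation code:

```python
import ast

def is_valid_python(code: str) -> bool:
    """Return True if the snippet parses as Python (syntax validity)."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def syntax_validity_rate(samples):
    """Fraction of generated samples that parse cleanly."""
    if not samples:
        return 0.0
    return sum(is_valid_python(s) for s in samples) / len(samples)

print(syntax_validity_rate(["def f(): return 1", "def broken(:", "x = 1"]))  # 2 of 3 parse
```

Parsing checks syntax only; correctness, relevancy, and hallucination require semantic evaluation (e.g. running tests or comparing against references) on top of this.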
Results Summary
- Higher correctness vs base model
- Lower hallucination rate
- Better structured outputs
Technical Specifications
Architecture
- Transformer-based causal LM
- LoRA adaptation
Hardware
- GPU recommended
- CPU supported
Software
- Transformers
- PEFT
- PyTorch
Environmental Impact
- Low compute due to LoRA
- Efficient fine-tuning
Citation
BibTeX:
```bibtex
@misc{conicai_llm,
  author    = {Girish},
  title     = {ConicAI Coding LLM},
  year      = {2026},
  publisher = {Hugging Face}
}
```
Model Card Authors
GIRISH KUMAR DEWANGAN
Framework versions
- PEFT 0.19.0
Model tree for girish00/ConicAI_LLM_model
- Base model: Qwen/Qwen2.5-0.5B
- Finetuned: Qwen/Qwen2.5-Coder-0.5B
- Finetuned: Qwen/Qwen2.5-Coder-0.5B-Instruct