Agent-ModernColBERT-GGUF (ColBERT GGUF)

This repository contains GGUF-format weights for the ColBERT retrieval model lightonai/Agent-ModernColBERT.

Available GGUF Formats

1. Late-Interaction pg_colbert Model (Agent-ModernColBERT.f16.gguf)

This is the custom late-interaction GGUF file (pg_colbert_v1 schema) containing the backbone transformer weights, tokenizer metadata, ColBERT dense projection layer, and similarity metrics.

It is designed specifically for our custom GGML-based ColBERT C++ runtime.

C++ Loading Example:

// Initialize the ColBERT GGML model context
colbert_model model = colbert_model_load("Agent-ModernColBERT.f16.gguf");

// Tokenize and encode queries into late-interaction token embeddings
std::vector<float> query_embeddings = colbert_encode_query(model, "Which planet is known as the Red Planet?");
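The example above stops at the query embeddings. As a rough illustration of how late-interaction scoring typically works, the Python sketch below computes a MaxSim score: for each query token, take the highest dot product against any document token, then sum over query tokens. It is not the pg_colbert runtime's implementation. The flat row-major layout, the dim argument, and the helper names (reshape_flat, dot, maxsim_score) are assumptions made for illustration, and the similarity metric actually used is whatever the GGUF metadata specifies.

```python
from typing import List, Sequence


def reshape_flat(flat: Sequence[float], dim: int) -> List[List[float]]:
    """Split a flat row-major buffer into per-token vectors of length dim.

    Assumption: the buffer holds num_tokens * dim floats, token after token.
    """
    if dim <= 0 or len(flat) % dim != 0:
        raise ValueError("buffer length must be a positive multiple of dim")
    return [list(flat[i:i + dim]) for i in range(0, len(flat), dim)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[List[float]],
                 doc_tokens: List[List[float]]) -> float:
    """Sum, over query tokens, of the best dot product with any document token."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy data: two query tokens and two document tokens, each of dimension 2.
query = reshape_flat([1.0, 0.0, 0.0, 1.0], dim=2)
document = reshape_flat([1.0, 0.0, 0.5, 0.5], dim=2)
score = maxsim_score(query, document)
```

If the token vectors are L2-normalized, the dot product equals cosine similarity, so each query token contributes at most 1 to the score.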

2. Standard llama.cpp Compatible Model (Agent-ModernColBERT_llama_cpp.f16.gguf)

This is a llama.cpp-compatible embedding model in GGUF format. It renames the custom backbone tensors to the standard llama.cpp keys (token_embd.weight, blk.i.*) and splits the concatenated feed-forward weight (mlp.Wi.weight) into the separate ffn_gate and ffn_up tensors that llama.cpp's gated feed-forward path expects. The late-interaction projection layer is excluded so that the file can be used directly with stock llama.cpp tools.

It can be loaded directly with the standard llama.cpp binaries (e.g., llama-embedding, llama-cli) or through Python bindings (e.g., llama-cpp-python).
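To make the feed-forward split concrete, here is a minimal Python sketch. It is not the code in tools/export_to_llama_cpp.py. It assumes the fused weight is stored with shape [2 * intermediate, hidden] (output rows first, as in a PyTorch Linear layer), so halving the rows is equivalent to chunking the fused output in two. Which half the export script names ffn_gate and which ffn_up is not stated here, so the sketch returns them simply as the first and second half. split_fused_wi and matvec are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def split_fused_wi(weight: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """Split a fused [2 * intermediate, hidden] weight into two row halves."""
    rows = len(weight)
    if rows == 0 or rows % 2 != 0:
        raise ValueError("fused weight must have an even, non-zero row count")
    half = rows // 2
    first = [list(r) for r in weight[:half]]
    second = [list(r) for r in weight[half:]]
    return first, second


def matvec(weight: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Toy fused weight: intermediate size 2, hidden size 2.
fused = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [1.0, 1.0]

first_half, second_half = split_fused_wi(fused)

# Applying the halves separately matches chunking the fused output in two.
fused_out = matvec(fused, x)
separate_out = matvec(first_half, x) + matvec(second_half, x)
```

The final comparison is the property that matters: the two separate tensors together reproduce the fused layer's output, so the split changes the storage layout without changing the computation.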

llama.cpp Usage Example:

./llama-embedding -m Agent-ModernColBERT_llama_cpp.f16.gguf -p "Mars is the Red Planet."
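Because the projection layer is excluded from this file, vectors produced through llama.cpp are backbone outputs rather than final ColBERT token embeddings. The Python sketch below shows what a ColBERT-style projection step generally looks like: a linear map to a smaller dimension followed by L2 normalization. It assumes you already have per-token backbone vectors and a projection matrix obtained separately. Neither is produced by the command above, the bias-free form of the projection is an assumption, and project_and_normalize is a hypothetical helper.

```python
import math
from typing import List, Sequence


def project_and_normalize(token_vectors: Sequence[Sequence[float]],
                          projection: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply a bias-free linear projection per token, then L2-normalize.

    projection has one row per output dimension; each row has the same
    length as an input token vector (assumed layout).
    """
    result = []
    for vec in token_vectors:
        for row in projection:
            if len(row) != len(vec):
                raise ValueError("projection row length must match vector length")
        projected = [sum(w * v for w, v in zip(row, vec)) for row in projection]
        norm = math.sqrt(sum(p * p for p in projected))
        # Leave all-zero vectors untouched to avoid dividing by zero.
        result.append([p / norm for p in projected] if norm > 0 else projected)
    return result


# Toy data: one token of dimension 3 projected down to dimension 2.
tokens = [[3.0, 4.0, 0.0]]
proj = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
embeddings = project_and_normalize(tokens, proj)
```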

GGUF Conversion Info

Both files were generated with the following conversion and export utilities:

  • pg_colbert conversion: python tools/convert_colbert_hf_to_gguf.py --model-id lightonai/Agent-ModernColBERT --outfile Agent-ModernColBERT.f16.gguf --outtype f16
  • llama.cpp export: python tools/export_to_llama_cpp.py Agent-ModernColBERT.f16.gguf Agent-ModernColBERT_llama_cpp.f16.gguf

Model Details

  • Format: GGUF
  • Model size: 0.1B params
  • Architecture: modernbert
  • Precision: 16-bit
