Agent-ModernColBERT-GGUF (ColBERT GGUF)
This repository contains GGUF format weights for the ColBERT retrieval model lightonai/Agent-ModernColBERT.
Available GGUF Formats
1. Late-Interaction pg_colbert Model (Agent-ModernColBERT.f16.gguf)
This is the custom late-interaction GGUF file (pg_colbert_v1 schema) containing the backbone transformer weights, tokenizer metadata, ColBERT dense projection layer, and similarity metrics.
It is designed specifically for our custom GGML-based ColBERT C++ runtime.
C++ Loading Example:
// Initialize the ColBERT GGML model context
colbert_model model = colbert_model_load("Agent-ModernColBERT.f16.gguf");
// Tokenize and encode queries into late-interaction token embeddings
std::vector<float> query_embeddings = colbert_encode_query(model, "Which planet is known as the Red Planet?");
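To illustrate what "late-interaction token embeddings" are typically used for, here is a minimal Python sketch of ColBERT-style MaxSim scoring. It is not the runtime's own implementation. It assumes the flat float buffer returned by the encoder is row-major with one fixed-size vector per token; the helper names (`reshape_tokens`, `maxsim_score`) and the toy dimension of 2 are hypothetical and chosen only for illustration.

```python
from typing import List


def reshape_tokens(flat: List[float], dim: int) -> List[List[float]]:
    """Split a flat row-major buffer into one vector per token (assumed layout)."""
    if dim <= 0 or len(flat) % dim != 0:
        raise ValueError("buffer length must be a positive multiple of dim")
    return [flat[i:i + dim] for i in range(0, len(flat), dim)]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[List[float]],
                 doc_tokens: List[List[float]]) -> float:
    """For each query token take its best match among the document tokens,
    then sum those maxima (the usual ColBERT late-interaction score)."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy data with dim=2; real embeddings would come from the encoder.
query = reshape_tokens([1.0, 0.0, 0.0, 1.0], dim=2)
doc_a = reshape_tokens([1.0, 0.0, 0.0, 1.0, 0.5, 0.5], dim=2)
doc_b = reshape_tokens([0.5, 0.5], dim=2)

print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

In this toy case `doc_a` scores higher than `doc_b` because it contains an exact match for each query token, which is the behaviour late interaction is meant to capture.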
2. Standard llama.cpp Compatible Model (Agent-ModernColBERT_llama_cpp.f16.gguf)
This is a standard llama.cpp-compliant embedding GGUF model. It maps the custom backbone tensor layout to standard layout keys (token_embd.weight, blk.i.*) and splits the concatenated feed-forward layers (mlp.Wi.weight) into separate ffn_gate and ffn_up weights required by standard llama.cpp activations. The late-interaction projection layer is excluded to allow direct use in standard llama.cpp tools.
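The feed-forward split described above can be sketched as follows. This is an illustrative sketch, not the export script itself: it assumes the fused `mlp.Wi.weight` is stored with shape (2 × intermediate, hidden), and that the first half of the rows maps to `ffn_gate` and the second half to `ffn_up`. Both the layout and the order of the halves are assumptions; `split_fused_wi` is a hypothetical helper name.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_fused_wi(wi: Matrix) -> Tuple[Matrix, Matrix]:
    """Split a fused feed-forward weight into two equal halves along the
    output (row) dimension. Which half is 'gate' and which is 'up' is an
    assumption here and must match the original model's forward pass."""
    rows = len(wi)
    if rows == 0 or rows % 2 != 0:
        raise ValueError("fused weight must have an even, non-zero row count")
    half = rows // 2
    return wi[:half], wi[half:]


# Toy fused weight: 4 output rows (2 x intermediate=2), hidden size 2.
fused = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
first_half, second_half = split_fused_wi(fused)

# Tensor names follow the blk.i.* pattern mentioned above, shown for layer 0.
tensors = {
    "blk.0.ffn_gate.weight": first_half,
    "blk.0.ffn_up.weight": second_half,
}
print({name: len(rows) for name, rows in tensors.items()})
```

Getting the order of the halves wrong would not raise an error but would silently change the model's outputs, so it is worth checking against reference embeddings from the original checkpoint.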
It can be loaded directly using standard llama.cpp binaries (e.g. llama-embedding, llama-cli) or Python bindings (e.g. llama-cpp-python).
llama.cpp Usage Example:
./llama-embedding -m Agent-ModernColBERT_llama_cpp.f16.gguf -p "Mars is the Red Planet."
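For the Python bindings mentioned above, a minimal sketch using llama-cpp-python might look like the following. It assumes the package is installed and the GGUF file is available locally. Depending on the pooling configuration, `embed` may return one vector per text or one vector per token, so the sketch mean-pools per-token output; that pooling choice, and the helper names `mean_pool`, `cosine`, and `embed_texts`, are assumptions for illustration rather than anything this repository prescribes.

```python
import math
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average per-token vectors into a single vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed_texts(model_path: str, texts: List[str]) -> List[List[float]]:
    """Embed each text with llama-cpp-python, one vector per text."""
    # Imported lazily so the pure helpers above work without the package.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, embedding=True, verbose=False)
    vectors = []
    for text in texts:
        out = llm.embed(text)
        # Per-token output (list of lists) is mean-pooled; this is an assumption.
        if out and isinstance(out[0], list):
            out = mean_pool(out)
        vectors.append(out)
    return vectors
```

A call such as `embed_texts("Agent-ModernColBERT_llama_cpp.f16.gguf", ["Mars is the Red Planet.", "Which planet is known as the Red Planet?"])` followed by `cosine` on the two results would give a single-vector similarity. Because the late-interaction projection layer is excluded from this file, such scores come from the backbone embeddings only and are not ColBERT late-interaction scores.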
GGUF Conversion Info
Generated using the conversion and export utilities:
- pg_colbert conversion:
python tools/convert_colbert_hf_to_gguf.py --model-id lightonai/Agent-ModernColBERT --outfile Agent-ModernColBERT.f16.gguf --outtype f16
- llama.cpp export:
python tools/export_to_llama_cpp.py Agent-ModernColBERT.f16.gguf Agent-ModernColBERT_llama_cpp.f16.gguf