ColBERT ModernBERT-base distilled from BGE on MS MARCO

This is a Multi-Vector Encoder model finetuned from answerdotai/ModernBERT-base on the ms-marco-en-bge dataset using the sentence-transformers library. It maps sentences and paragraphs to sequences of 128-dimensional token-level vectors and scores query-document pairs with late interaction (MaxSim), which makes it suitable for semantic search.

Model Details

Model Description

Model Type: Multi-Vector Encoder
Base model: answerdotai/ModernBERT-base
Maximum Sequence Length: 8192 tokens
Output Dimensionality: 128 dimensions
Similarity Function: MaxSim
Supported Modality: Text
Training Dataset:
- ms-marco-en-bge
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Sparse Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Multi-Vector Encoders on Hugging Face

Full Model Architecture

MultiVectorEncoder(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'query_length': 32, 'document_length': 180, 'do_query_expansion': True, 'attend_to_expansion_tokens': False, 'architecture': 'ModernBertModel'})
  (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'token_embeddings', 'module_output_name': 'token_embeddings', 'use_residual': False})
  (2): MultiVectorMask({'skiplist_words': ['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~']})
  (3): Normalize({'module_input_name': 'token_embeddings', 'module_output_name': 'token_embeddings'})
)
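
The module list above can be read as a per-token pipeline: project each hidden state with a bias-free linear layer, mask tokens on the skiplist, and L2-normalize what remains. The sketch below illustrates that reading with toy sizes (4 to 2 dimensions instead of 768 to 128). The function names, the toy weights, and the choice to model masking by dropping tokens are assumptions made for illustration, not the library's implementation. The skiplist in the card happens to match Python's `string.punctuation`.

```python
import math
import string

# Hypothetical toy sizes; the real model projects 768 -> 128.
HIDDEN_DIM, OUT_DIM = 4, 2


def project(vec, weight):
    """Bias-free linear layer: weight holds OUT_DIM rows of HIDDEN_DIM values."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def encode_tokens(tokens, hidden_states, weight, skiplist=frozenset(string.punctuation)):
    """Project each token, skip skiplisted tokens, and normalize the rest."""
    kept = []
    for token, hidden in zip(tokens, hidden_states):
        if token in skiplist:  # assumption: masking is modeled as dropping the token
            continue
        kept.append(l2_normalize(project(hidden, weight)))
    return kept


# Toy inputs standing in for transformer outputs (made up for this example).
weight = [[0.5, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, -1.0]]
tokens = ["sunny", "outside", "!"]
hidden_states = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]

vectors = encode_tokens(tokens, hidden_states, weight)
print(len(vectors))  # 2: the "!" token was masked out
```

In the original ColBERT design the punctuation skiplist applies to documents only; whether MultiVectorMask does the same for queries is not stated in this card.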

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import MultiVectorEncoder

# Download from the 🤗 Hub
model = MultiVectorEncoder("tomaarsen/multivector-ModernBERT-base-msmarco-kd")
# Run inference: each input becomes a list of per-token vectors (variable length).
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
query_embeddings = model.encode_query(sentences)
document_embeddings = model.encode_document(sentences)
print(query_embeddings[0].shape)
# (num_query_tokens, 128)

# Get the MaxSim similarity scores
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[31.8979, 31.0701, 30.5561],
#         [31.2099, 31.9224, 30.6434],
#         [30.6855, 30.6123, 31.8958]])
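
For readers who want to see what a MaxSim score is, here is a minimal pure-Python sketch of the standard late-interaction formulation: for every query token vector, take the largest dot product against any document token vector, then sum over query tokens. The `maxsim` function and the toy vectors are illustrative assumptions; `model.similarity` is the supported way to score real embeddings.

```python
def maxsim(query_vectors, document_vectors):
    """Sum, over query tokens, of the best dot product against any document token."""
    score = 0.0
    for q in query_vectors:
        score += max(sum(a * b for a, b in zip(q, d)) for d in document_vectors)
    return score


# Toy unit vectors (made up for this example).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.0, -1.0]]

print(maxsim(query, doc_a))
print(maxsim(query, doc_b))
```

Because the token vectors are normalized, each query token contributes at most 1 to the sum. Diagonal scores just under 32 in the output above would be consistent with queries being expanded to 32 token vectors (query_length: 32 with do_query_expansion: True in the architecture).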

Evaluation

Metrics

Multi Vector Information Retrieval

Datasets: NanoMSMARCO, NanoNQ, NanoFiQA2018, NanoClimateFEVER, NanoDBPedia, NanoFEVER, NanoHotpotQA, NanoNFCorpus, NanoQuoraRetrieval, NanoSCIDOCS, NanoArguAna, NanoSciFact and NanoTouche2020
Evaluated with MultiVectorInformationRetrievalEvaluator

Metric	NanoMSMARCO	NanoNQ	NanoFiQA2018	NanoClimateFEVER	NanoDBPedia	NanoFEVER	NanoHotpotQA	NanoNFCorpus	NanoQuoraRetrieval	NanoSCIDOCS	NanoArguAna	NanoSciFact	NanoTouche2020
MaxSim_accuracy@1	0.5	0.48	0.38	0.22	0.8	0.78	0.9	0.42	0.9	0.44	0.16	0.64	0.6939
MaxSim_accuracy@3	0.66	0.7	0.54	0.28	0.92	0.88	1.0	0.54	0.96	0.64	0.5	0.78	0.9592
MaxSim_accuracy@5	0.76	0.8	0.62	0.44	0.94	0.9	1.0	0.56	0.96	0.68	0.62	0.82	0.9796
MaxSim_accuracy@10	0.82	0.84	0.76	0.6	0.96	0.94	1.0	0.62	0.98	0.8	0.78	0.86	1.0
MaxSim_precision@1	0.5	0.48	0.38	0.22	0.8	0.78	0.9	0.42	0.9	0.44	0.16	0.64	0.6939
MaxSim_precision@3	0.22	0.2333	0.26	0.1	0.66	0.3133	0.5333	0.3467	0.38	0.3	0.1667	0.2733	0.6327
MaxSim_precision@5	0.152	0.164	0.188	0.092	0.596	0.192	0.34	0.32	0.232	0.248	0.124	0.184	0.6286
MaxSim_precision@10	0.082	0.088	0.12	0.072	0.528	0.102	0.178	0.272	0.128	0.162	0.078	0.096	0.5082
MaxSim_recall@1	0.5	0.45	0.1894	0.1233	0.1101	0.7167	0.45	0.0247	0.7907	0.0937	0.16	0.615	0.0506
MaxSim_recall@3	0.66	0.66	0.3577	0.1483	0.191	0.8433	0.8	0.0585	0.908	0.1857	0.5	0.76	0.1271
MaxSim_recall@5	0.76	0.76	0.4199	0.2083	0.2547	0.8633	0.85	0.079	0.912	0.2537	0.62	0.81	0.207
MaxSim_recall@10	0.82	0.79	0.5224	0.2923	0.3768	0.9133	0.89	0.1053	0.9593	0.3307	0.78	0.85	0.3262
MaxSim_ndcg@10	0.6574	0.6341	0.4197	0.232	0.6622	0.8399	0.8471	0.3145	0.9215	0.3355	0.4617	0.7462	0.5726
MaxSim_mrr@10	0.6056	0.6032	0.4849	0.3016	0.8627	0.8398	0.9433	0.486	0.9333	0.5556	0.3611	0.7143	0.8277
MaxSim_map@100	0.6173	0.5781	0.3486	0.1845	0.5102	0.8121	0.7837	0.1198	0.8996	0.2536	0.3669	0.7127	0.4026
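
As a reading aid for the table above, the following sketch computes the per-query versions of these metrics under binary relevance, which is the usual convention for such evaluators. The function names and the toy ranking are assumptions for illustration; the reported numbers are averages over all queries produced by the evaluator itself.

```python
import math


def accuracy_at_k(ranked, relevant, k):
    """1.0 if at least one relevant document appears in the top k."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def precision_at_k(ranked, relevant, k):
    """Fraction of the top k that is relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant document in the top k."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-gain DCG divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal


# Hypothetical ranking for one query with two relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(accuracy_at_k(ranked, relevant, 1), precision_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 3), mrr_at_k(ranked, relevant, 10))
```

Two patterns in the table follow from these definitions: accuracy@1 and precision@1 are always equal, and precision@k is capped by the number of relevant documents divided by k. The NanoMSMARCO column, where recall@k equals accuracy@k and precision@10 is 0.082, is consistent with one relevant document per query.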

Multi Vector Nano BEIR

Dataset: NanoBEIR_mean

Evaluated with MultiVectorNanoBEIREvaluator with these parameters:

{
    "dataset_names": [
        "msmarco",
        "nq",
        "fiqa2018"
    ],
    "dataset_id": "sentence-transformers/NanoBEIR-en"
}

Metric	Value
MaxSim_accuracy@1	0.4533
MaxSim_accuracy@3	0.6333
MaxSim_accuracy@5	0.7267
MaxSim_accuracy@10	0.8067
MaxSim_precision@1	0.4533
MaxSim_precision@3	0.2378
MaxSim_precision@5	0.168
MaxSim_precision@10	0.0967
MaxSim_recall@1	0.3798
MaxSim_recall@3	0.5592
MaxSim_recall@5	0.6466
MaxSim_recall@10	0.7108
MaxSim_ndcg@10	0.5704
MaxSim_mrr@10	0.5645
MaxSim_map@100	0.5146

Multi Vector Nano BEIR

Dataset: NanoBEIR_mean

Evaluated with MultiVectorNanoBEIREvaluator with these parameters:

{
    "dataset_names": [
        "climatefever",
        "dbpedia",
        "fever",
        "fiqa2018",
        "hotpotqa",
        "msmarco",
        "nfcorpus",
        "nq",
        "quoraretrieval",
        "scidocs",
        "arguana",
        "scifact",
        "touche2020"
    ],
    "dataset_id": "sentence-transformers/NanoBEIR-en"
}

Metric	Value
MaxSim_accuracy@1	0.5626
MaxSim_accuracy@3	0.7199
MaxSim_accuracy@5	0.7754
MaxSim_accuracy@10	0.8431
MaxSim_precision@1	0.5626
MaxSim_precision@3	0.3399
MaxSim_precision@5	0.2662
MaxSim_precision@10	0.1857
MaxSim_recall@1	0.3288
MaxSim_recall@3	0.4769
MaxSim_recall@5	0.5383
MaxSim_recall@10	0.612
MaxSim_ndcg@10	0.588
MaxSim_mrr@10	0.6553
MaxSim_map@100	0.5069
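
The two NanoBEIR_mean tables differ only in which datasets are averaged. Assuming the mean is an unweighted arithmetic mean over the listed datasets, the per-dataset MaxSim_ndcg@10 values from the metrics table reproduce both reported means, as this short check shows.

```python
# MaxSim_ndcg@10 per dataset, copied from the metrics table in this card.
ndcg_at_10 = {
    "msmarco": 0.6574, "nq": 0.6341, "fiqa2018": 0.4197,
    "climatefever": 0.232, "dbpedia": 0.6622, "fever": 0.8399,
    "hotpotqa": 0.8471, "nfcorpus": 0.3145, "quoraretrieval": 0.9215,
    "scidocs": 0.3355, "arguana": 0.4617, "scifact": 0.7462,
    "touche2020": 0.5726,
}


def mean_over(names):
    """Unweighted mean of MaxSim_ndcg@10 over the named datasets."""
    return sum(ndcg_at_10[name] for name in names) / len(names)


three_dataset_mean = mean_over(["msmarco", "nq", "fiqa2018"])
full_mean = mean_over(list(ndcg_at_10))
print(round(three_dataset_mean, 4))  # 0.5704
print(round(full_mean, 4))  # 0.588
```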

Training Details

Training Dataset

ms-marco-en-bge

Dataset: ms-marco-en-bge at ad24729
Size: 20,000 training samples
Columns: query, documents, and scores
Approximate statistics based on the first 100 samples:
	query	documents	scores
type	string	list	list
modality	text
details	min: 4 tokens mean: 9.92 tokens max: 21 tokens	size: 32 elements	size: 32 elements

Samples:

query	documents	scores
`define extreme`	`['extremist. 1 AN EXTREMIST IS SOMEONE WHO SUPPORTS AN IDEA, CAUSE, OR SET OF VALUES SO ADAMANTLY AND WITHOUT COMPROMISE THAT SAID PERSON WILL USE THEIR IDEAS TO JUSTIFY ANYTHING THEY DO.', "at the extreme end meaning, at the extreme end definition | English Cobuild dictionary. extreme. 1 adj Extreme means very great in degree or intensity. The girls were afraid of snakes and picked their way along with extreme caution., ...people living in extreme poverty., ...the author's extreme reluctance to generalise.", 'extremity (plural extremities) 1 The most extreme or furthest point of something. 2 An extreme measure. 3 A hand or foot. A limb (major appendage of human or animal such as a leg an arm or a wing)', ': extreme in a way that is not normal or that shows an illness or mental problem. medical: relating to or caused by disease.: of or relating to the study of diseases: relating to pathology. extreme in a way that is not normal or that shows an illness or mental problem. medical: relating to or caused by disease.', 'Definition of extreme. 1a : existing in a very high degree extreme povertyb : going to great or exaggerated lengths : radical went on an extreme dietc : exceeding the ordinary, usual, or expected extreme weather conditions. 2 archaic : last.', ...]`
`what does chattel mean on credit history`	`["Duhaime's Law Dictionary. Chattel Mortgage Definition: Related Terms: Chattel, Mortgage. When a lien is given on goods, chattels, moveable or personal property (other than real property in which case it is referred to as just a mortgage), in writing, to guarantee the payment of a debt or the execution of some action.", 'From Wikipedia, the free encyclopedia. Chattel mortgage, sometimes abbreviated CM, is the legal term for a type of loan contract used in some states with legal systems derived from English law. Under a typical chattel mortgage, the purchaser borrows funds for the purchase of movable personal property (the chattel) from the lender. The lender then secures the loan with a mortgage over the chattel.', 'Chattel Mortgages In Detail. A Chattel Mortgage uses your vehicle or some other (non-real estate) property as the security on the loan meaning you can access a low interest rate.ncidentally, these loans can be used for other purposes such as business equipment. If you have a preference for Chattel Mortgage, ask the team at 360 Finance. Term / Length of the loan – the life of the loan or the time you have to pay it off.', 'A chattel mortgage is a mortgage that provides for a security interest in assets other than real estate to secure the loan. In the event of a default in payments, the lender has a lien in the assets used as collateral for the loan. In most states, a security agreement has replaced the use of chattel mortgages. chattel mortgage is a mortgage that provides for a security interest in assets other than real estate to secure the loan. In the event of a default in payments, the lender has a lien in the assets used as collateral for the loan. In most states, a security agreement has replaced the use of chattel mortgages.', 'A Chattel Mortgage is a type of loan contract that allows the buyer to take ownership of a vehicle at the time of purchase. The lender provides the buyer with the total loan amount to cover the price of the vehicle (chattel) so that it can be bought outright.', ...]`	`[0.7124203443527222, 0.7379189729690552, 0.5786551237106323, 0.6142299175262451, 0.6755089163780212, ...]`
`what was the great leap forward brainly`	`['It was a clever scheme that was hatched soon after the 1949 revolution. The first phase was to send spies to the west during the great leap forward in the 1950s to plant falsified basic science into our western understanding of physics.', 'Great Leap Forward Devolution Into the Great Famine . Yang Jisheng, the author of Tombstone , wrote in the New York Times, “The Great Leap Forward that Mao began in 1958 set ambitious goals without the means to meet them. A vicious cycle ensued; exaggerated production reports from below emboldened the higher-ups to set even loftier targets.', 'In 1958 Mao introduced a second five year plan which became known as the ‘Great Leap Forward’ (GLF). He believed it was possible for China to overtake Britain as a leading industrial power within seven years and the USA soon after.n 1958 Mao introduced a second five year plan which became known as the ‘Great Leap Forward’ (GLF). He believed it was possible for China to overtake Britain as a leading industrial power within seven years and the USA soon after.', 'The Great Leap Forward approach was epitomized by the development of small backyard steel furnaces in every village and urban neighbourhood, which were intended to accelerate the industrialization process.', 'The Great Leap Forward was begun in 1957 by Chairman Mao Zedong to bring the nation quickly into the forefront of economic development. Mao wanted China to become a leading industrial power, and to accomplish his goals he and his colleagues pushed for the construction of steel plants across the country.', ...]`	`[0.6462352871894836, 0.7880821228027344, 0.791019856929779, 0.7709633111953735, 0.8284491300582886, ...]`

Loss: MultiVectorDistillKLDivLoss with these parameters:

{
    "score_metric": "colbert_kd_scores",
    "normalize_scores": true,
    "temperature": 1.0,
    "size_average": true
}
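
To give a feel for what a KL-divergence distillation objective does with the parameters above, here is a pure-Python sketch for a single query. It assumes that normalize_scores means min-max normalization of the student's MaxSim scores across the candidate documents (as in PyLate's distillation loss), that the temperature divides both score lists before the softmax, and that the loss is KL(teacher || student); none of this is confirmed by the card for MultiVectorDistillKLDivLoss. The student scores are made up, and the teacher scores are the first five values of a sample row, rounded.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over one list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_total for s in scaled]


def min_max_normalize(scores, eps=1e-8):
    """Rescale scores to roughly [0, 1]; eps guards against a zero range."""
    low, high = min(scores), max(scores)
    return [(s - low) / (high - low + eps) for s in scores]


def kl_distillation_loss(student_scores, teacher_scores, normalize_scores=True, temperature=1.0):
    """KL(teacher || student) over the candidate documents of one query."""
    if normalize_scores:  # assumption: only the student scores are normalized
        student_scores = min_max_normalize(student_scores)
    student_log_probs = log_softmax(student_scores, temperature)
    teacher_log_probs = log_softmax(teacher_scores, temperature)
    return sum(
        math.exp(t) * (t - s) for s, t in zip(student_log_probs, teacher_log_probs)
    )


teacher = [0.7124, 0.7379, 0.5787, 0.6142, 0.6755]  # rounded teacher scores from a sample row
student = [29.1, 30.4, 25.0, 26.2, 27.9]  # hypothetical MaxSim scores
loss = kl_distillation_loss(student, teacher)
print(loss)
```

With size_average set to true, the per-query values would presumably be averaged over the batch. In training, each query comes with 32 candidate documents and 32 teacher scores rather than the five used here.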

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 4
num_train_epochs: 1
learning_rate: 3e-05
warmup_steps: 0.05
bf16: True
per_device_eval_batch_size: 4
load_best_model_at_end: True
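
These settings can be related to the training log with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and that a fractional warmup_steps value of 0.05 is interpreted as a fraction of the total number of steps; the helper names are hypothetical and the schedule is a generic linear warmup followed by linear decay, not code from the trainer.

```python
def total_training_steps(num_samples, batch_size, epochs=1, num_devices=1, grad_accum=1):
    """Optimizer steps for the run, rounding the last partial batch up."""
    per_step = batch_size * num_devices * grad_accum
    steps_per_epoch = -(-num_samples // per_step)  # ceiling division
    return steps_per_epoch * epochs


def linear_schedule_lr(step, total_steps, peak_lr, warmup_steps):
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


total = total_training_steps(num_samples=20_000, batch_size=4)
warmup = round(0.05 * total)  # assumption: 0.05 is a ratio of total steps
print(total, warmup)  # 5000 250
print(linear_schedule_lr(warmup, total, 3e-05, warmup))  # 3e-05
```

The 5,000 steps computed here agree with the final training-log entry (step 5000 at epoch 1.0).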

All Hyperparameters

per_device_train_batch_size: 4
num_train_epochs: 1
max_steps: -1
learning_rate: 3e-05
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_steps: 0.05
optim: adamw_torch_fused
optim_args: None
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
optim_target_modules: None
gradient_accumulation_steps: 1
average_tokens_across_devices: True
max_grad_norm: 1.0
label_smoothing_factor: 0.0
bf16: True
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
use_liger_kernel: False
liger_kernel_config: None
use_cache: False
neftune_noise_alpha: None
torch_empty_cache_steps: None
auto_find_batch_size: False
log_on_each_node: True
logging_nan_inf_filter: True
include_num_input_tokens_seen: no
log_level: passive
log_level_replica: warning
disable_tqdm: False
project: huggingface
trackio_space_id: None
trackio_bucket_id: None
trackio_static_space_id: None
per_device_eval_batch_size: 4
prediction_loss_only: True
eval_on_start: False
eval_do_concat_batches: True
eval_use_gather_object: False
eval_accumulation_steps: None
include_for_metrics: []
batch_eval_metrics: False
save_only_model: False
save_on_each_node: False
enable_jit_checkpoint: False
push_to_hub: False
hub_private_repo: None
hub_model_id: None
hub_strategy: every_save
hub_always_push: False
hub_revision: None
load_best_model_at_end: True
ignore_data_skip: False
restore_callback_states_from_checkpoint: False
full_determinism: False
seed: 42
data_seed: None
use_cpu: False
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_pin_memory: True
dataloader_persistent_workers: False
dataloader_prefetch_factor: None
remove_unused_columns: True
label_names: None
train_sampling_strategy: random
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
ddp_static_graph: None
ddp_backend: None
ddp_timeout: 1800
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
deepspeed: None
debug: []
skip_memory_metrics: True
do_predict: False
resume_from_checkpoint: None
warmup_ratio: None
local_rank: -1
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	NanoMSMARCO_MaxSim_ndcg@10	NanoNQ_MaxSim_ndcg@10	NanoFiQA2018_MaxSim_ndcg@10	NanoBEIR_mean_MaxSim_ndcg@10	NanoClimateFEVER_MaxSim_ndcg@10	NanoDBPedia_MaxSim_ndcg@10	NanoFEVER_MaxSim_ndcg@10	NanoHotpotQA_MaxSim_ndcg@10	NanoNFCorpus_MaxSim_ndcg@10	NanoQuoraRetrieval_MaxSim_ndcg@10	NanoSCIDOCS_MaxSim_ndcg@10	NanoArguAna_MaxSim_ndcg@10	NanoSciFact_MaxSim_ndcg@10	NanoTouche2020_MaxSim_ndcg@10
-1	-1	-	0.1306	0.1331	0.1222	0.1286	-	-	-	-	-	-	-	-	-	-
0.01	50	0.0485	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.02	100	0.0473	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.03	150	0.0420	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.04	200	0.0359	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.05	250	0.0343	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.06	300	0.0300	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.07	350	0.0305	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.08	400	0.0290	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.09	450	0.0282	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1	500	0.0264	0.5714	0.5740	0.3619	0.5024	-	-	-	-	-	-	-	-	-	-
0.11	550	0.0280	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.12	600	0.0253	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.13	650	0.0263	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.14	700	0.0241	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.15	750	0.0244	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.16	800	0.0249	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.17	850	0.0249	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.18	900	0.0237	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.19	950	0.0247	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2	1000	0.0239	0.6188	0.5772	0.4104	0.5355	-	-	-	-	-	-	-	-	-	-
0.21	1050	0.0236	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.22	1100	0.0244	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.23	1150	0.0212	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.24	1200	0.0215	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.25	1250	0.0220	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.26	1300	0.0222	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.27	1350	0.0218	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.28	1400	0.0214	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.29	1450	0.0220	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3	1500	0.0218	0.6169	0.5738	0.4178	0.5362	-	-	-	-	-	-	-	-	-	-
0.31	1550	0.0204	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.32	1600	0.0214	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.33	1650	0.0198	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.34	1700	0.0204	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.35	1750	0.0206	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.36	1800	0.0196	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.37	1850	0.0197	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.38	1900	0.0194	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.39	1950	0.0190	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4	2000	0.0188	0.6456	0.6144	0.4357	0.5652	-	-	-	-	-	-	-	-	-	-
0.41	2050	0.0180	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.42	2100	0.0202	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.43	2150	0.0201	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.44	2200	0.0177	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.45	2250	0.0174	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.46	2300	0.0180	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.47	2350	0.0193	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.48	2400	0.0204	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.49	2450	0.0171	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5	2500	0.0165	0.6330	0.6038	0.3882	0.5417	-	-	-	-	-	-	-	-	-	-
0.51	2550	0.0179	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.52	2600	0.0165	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.53	2650	0.0168	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.54	2700	0.0168	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.55	2750	0.0176	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.56	2800	0.0161	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.57	2850	0.0176	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.58	2900	0.0176	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.59	2950	0.0173	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6	3000	0.0177	0.6493	0.6436	0.4075	0.5668	-	-	-	-	-	-	-	-	-	-
0.61	3050	0.0179	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.62	3100	0.0170	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.63	3150	0.0183	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.64	3200	0.0178	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.65	3250	0.0180	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.66	3300	0.0171	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.67	3350	0.0168	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.68	3400	0.0168	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.69	3450	0.0151	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7	3500	0.0177	0.6577	0.6343	0.3877	0.5599	-	-	-	-	-	-	-	-	-	-
0.71	3550	0.0164	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.72	3600	0.0165	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.73	3650	0.0165	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.74	3700	0.0162	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.75	3750	0.0166	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.76	3800	0.0163	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.77	3850	0.0157	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.78	3900	0.0182	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.79	3950	0.0171	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8	4000	0.0170	0.6489	0.6356	0.4080	0.5642	-	-	-	-	-	-	-	-	-	-
0.81	4050	0.0167	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.82	4100	0.0152	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.83	4150	0.0147	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.84	4200	0.0165	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.85	4250	0.0164	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.86	4300	0.0157	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.87	4350	0.0165	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.88	4400	0.0154	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.89	4450	0.0154	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9	4500	0.0162	0.6392	0.6391	0.4087	0.5623	-	-	-	-	-	-	-	-	-	-
0.91	4550	0.0171	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.92	4600	0.0159	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.93	4650	0.0164	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.94	4700	0.0157	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.95	4750	0.0162	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.96	4800	0.0154	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.97	4850	0.0144	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.98	4900	0.0158	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.99	4950	0.0150	-	-	-	-	-	-	-	-	-	-	-	-	-	-
1.0	5000	0.0156	0.6574	0.6341	0.4197	0.5704	-	-	-	-	-	-	-	-	-	-
-1	-1	-	0.6574	0.6341	0.4197	0.5880	0.2320	0.6622	0.8399	0.8471	0.3145	0.9215	0.3355	0.4617	0.7462	0.5726

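The saved checkpoint appears to be the step 5000 row: its NanoMSMARCO, NanoNQ and NanoFiQA2018 scores match the final evaluation row. (The original card marks the saved checkpoint in bold, which is not visible here.)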

Training Time

Training: 1.2 hours
Evaluation: 27.5 minutes
Total: 1.7 hours

Framework Versions

Python: 3.11.6
Sentence Transformers: 5.6.0.dev0
Transformers: 5.8.0.dev0
PyTorch: 2.10.0+cu128
Accelerate: 1.13.0.dev0
Datasets: 4.8.4
Tokenizers: 0.22.2

Additional Resources

Sentence Transformers Documentation: the full documentation site, including training, evaluation, and pre-trained model catalogs.
PyLate: the upstream library whose features were absorbed into Sentence Transformers for multi-vector / late-interaction models.

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultiVectorDistillKLDivLoss

@inproceedings{santhanam-etal-2022-colbertv2,
    title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
    author = "Santhanam, Keshav and Khattab, Omar and Saad-Falcon, Jon and Potts, Christopher and Zaharia, Matei",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}

Safetensors

Model size: 0.1B params
Tensor type: F32
