ColBERT ModernBERT-base distilled from BGE on MS MARCO

This is a Multi-Vector Encoder model finetuned from answerdotai/ModernBERT-base on the ms-marco-en-bge dataset using the sentence-transformers library. It maps sentences and paragraphs to sequences of 128-dimensional token-level vectors and scores query-document pairs with late interaction (MaxSim), which makes it suitable for semantic search.

Model Details

Model Description

  • Model Type: Multi-Vector Encoder
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 128 dimensions
  • Similarity Function: MaxSim
  • Supported Modality: Text
  • Training Dataset: ms-marco-en-bge
  • Language: en
  • License: apache-2.0
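
The MaxSim similarity function listed above can be illustrated in a few lines of plain Python. This is a minimal sketch, not the library's implementation: `maxsim` is a hypothetical helper and the toy 2-dimensional vectors are made up for illustration. For each query token it keeps the best dot product against all document tokens, then sums those maxima over the query tokens.

```python
def maxsim(query_vectors, document_vectors):
    """Late-interaction score: sum over query tokens of the best-matching document token."""
    score = 0.0
    for q in query_vectors:
        # Dot product of this query token with every document token; keep the maximum.
        score += max(sum(a * b for a, b in zip(q, d)) for d in document_vectors)
    return score


# Toy unit-length token vectors (the model itself emits 128 dimensions per token).
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[1.0, 0.0], [0.6, 0.8], [0.0, -1.0]]

print(maxsim(query, document))  # 1.0 (first query token) + 0.8 (second query token)
```

Because the token vectors are normalized, each per-token maximum is at most 1, so a score cannot exceed the number of query tokens. Assuming queries are expanded to the configured query_length of 32, that is consistent with the scores just under 32 in the usage example.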

Model Sources

Full Model Architecture

MultiVectorEncoder(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'query_length': 32, 'document_length': 180, 'do_query_expansion': True, 'attend_to_expansion_tokens': False, 'architecture': 'ModernBertModel'})
  (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'token_embeddings', 'module_output_name': 'token_embeddings', 'use_residual': False})
  (2): MultiVectorMask({'skiplist_words': ['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~']})
  (3): Normalize({'module_input_name': 'token_embeddings', 'module_output_name': 'token_embeddings'})
)
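
The module stack above can be read as: take per-token hidden states, project them with a bias-free linear layer, mask out punctuation tokens, and L2-normalize what remains. The following is a rough sketch of that flow in plain Python with toy sizes (3 to 2 dimensions instead of 768 to 128). The helper names are hypothetical, dropping masked tokens outright is a simplification, and whether the mask applies to queries as well as documents is not stated here.

```python
import math
import string

# string.punctuation holds the same 32 characters as the skiplist_words shown above.
SKIPLIST = set(string.punctuation)


def project(vector, weight):
    """Bias-free linear layer: weight has one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


def encode_tokens(tokens, hidden_states, weight):
    """Project, drop skiplisted tokens, and normalize the remaining token vectors."""
    kept = []
    for token, hidden in zip(tokens, hidden_states):
        if token in SKIPLIST:
            continue  # punctuation contributes no vector to the output
        kept.append(l2_normalize(project(hidden, weight)))
    return kept


# Made-up hidden states for three tokens; the comma is removed by the skiplist.
tokens = ["hello", ",", "world"]
hidden_states = [[3.0, 0.0, 4.0], [1.0, 1.0, 1.0], [0.0, 2.0, 5.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # toy 2x3 projection matrix

vectors = encode_tokens(tokens, hidden_states, weight)
print(len(vectors), vectors)
```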

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import MultiVectorEncoder

# Download from the 🤗 Hub
model = MultiVectorEncoder("tomaarsen/multivector-ModernBERT-base-msmarco-kd")
# Run inference: each input becomes a list of per-token vectors (variable length).
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
query_embeddings = model.encode_query(sentences)
document_embeddings = model.encode_document(sentences)
print(query_embeddings[0].shape)
# (num_query_tokens, 128)

# Get the MaxSim similarity scores
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[31.8979, 31.0701, 30.5561],
#         [31.2099, 31.9224, 30.6434],
#         [30.6855, 30.6123, 31.8958]])
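
The score matrix has one row per query and one column per document, so ranking the documents for a query is a per-row sort. A small sketch using the values printed above, copied into plain Python lists; `rank_documents` is a hypothetical helper, not part of the library:

```python
# Scores copied from the printed tensor above: rows are queries, columns are documents.
similarities = [
    [31.8979, 31.0701, 30.5561],
    [31.2099, 31.9224, 30.6434],
    [30.6855, 30.6123, 31.8958],
]


def rank_documents(scores):
    """Return document indices ordered from highest to lowest MaxSim score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


rankings = [rank_documents(row) for row in similarities]
for query_index, ranking in enumerate(rankings):
    print(query_index, ranking)
```

Each sentence ranks itself first. The absolute scores are all close together, so it is the ordering within a row that carries the retrieval signal.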

Evaluation

Metrics

Multi Vector Information Retrieval

  • Datasets: NanoMSMARCO, NanoNQ, NanoFiQA2018, NanoClimateFEVER, NanoDBPedia, NanoFEVER, NanoHotpotQA, NanoNFCorpus, NanoQuoraRetrieval, NanoSCIDOCS, NanoArguAna, NanoSciFact and NanoTouche2020
  • Evaluated with MultiVectorInformationRetrievalEvaluator
Metric NanoMSMARCO NanoNQ NanoFiQA2018 NanoClimateFEVER NanoDBPedia NanoFEVER NanoHotpotQA NanoNFCorpus NanoQuoraRetrieval NanoSCIDOCS NanoArguAna NanoSciFact NanoTouche2020
MaxSim_accuracy@1 0.5 0.48 0.38 0.22 0.8 0.78 0.9 0.42 0.9 0.44 0.16 0.64 0.6939
MaxSim_accuracy@3 0.66 0.7 0.54 0.28 0.92 0.88 1.0 0.54 0.96 0.64 0.5 0.78 0.9592
MaxSim_accuracy@5 0.76 0.8 0.62 0.44 0.94 0.9 1.0 0.56 0.96 0.68 0.62 0.82 0.9796
MaxSim_accuracy@10 0.82 0.84 0.76 0.6 0.96 0.94 1.0 0.62 0.98 0.8 0.78 0.86 1.0
MaxSim_precision@1 0.5 0.48 0.38 0.22 0.8 0.78 0.9 0.42 0.9 0.44 0.16 0.64 0.6939
MaxSim_precision@3 0.22 0.2333 0.26 0.1 0.66 0.3133 0.5333 0.3467 0.38 0.3 0.1667 0.2733 0.6327
MaxSim_precision@5 0.152 0.164 0.188 0.092 0.596 0.192 0.34 0.32 0.232 0.248 0.124 0.184 0.6286
MaxSim_precision@10 0.082 0.088 0.12 0.072 0.528 0.102 0.178 0.272 0.128 0.162 0.078 0.096 0.5082
MaxSim_recall@1 0.5 0.45 0.1894 0.1233 0.1101 0.7167 0.45 0.0247 0.7907 0.0937 0.16 0.615 0.0506
MaxSim_recall@3 0.66 0.66 0.3577 0.1483 0.191 0.8433 0.8 0.0585 0.908 0.1857 0.5 0.76 0.1271
MaxSim_recall@5 0.76 0.76 0.4199 0.2083 0.2547 0.8633 0.85 0.079 0.912 0.2537 0.62 0.81 0.207
MaxSim_recall@10 0.82 0.79 0.5224 0.2923 0.3768 0.9133 0.89 0.1053 0.9593 0.3307 0.78 0.85 0.3262
MaxSim_ndcg@10 0.6574 0.6341 0.4197 0.232 0.6622 0.8399 0.8471 0.3145 0.9215 0.3355 0.4617 0.7462 0.5726
MaxSim_mrr@10 0.6056 0.6032 0.4849 0.3016 0.8627 0.8398 0.9433 0.486 0.9333 0.5556 0.3611 0.7143 0.8277
MaxSim_map@100 0.6173 0.5781 0.3486 0.1845 0.5102 0.8121 0.7837 0.1198 0.8996 0.2536 0.3669 0.7127 0.4026
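
For readers less familiar with these metrics, here is a sketch of how they are commonly computed for a single query with binary relevance; the reported numbers are averages over all queries in a dataset. `metrics_at_k` is a hypothetical helper using the standard textbook definitions, and the evaluator's exact implementation may differ in details.

```python
import math


def metrics_at_k(ranked_ids, relevant_ids, k):
    """Accuracy, precision, recall, reciprocal rank, and binary nDCG at cutoff k."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]
    # Rank (1-based) of the first relevant document in the top k, if any.
    first_hit = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    # Ideal ranking: all relevant documents (up to k) placed at the top.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return {
        "accuracy": 1.0 if any(hits) else 0.0,
        "precision": sum(hits) / k,
        "recall": sum(hits) / len(relevant_ids),
        "mrr": 1.0 / first_hit if first_hit else 0.0,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }


# Made-up ranking: two relevant documents, one of them retrieved at rank 2.
result = metrics_at_k(["d3", "d1", "d7", "d2", "d9"], {"d1", "d2"}, k=3)
print(result)
```

Under these definitions precision@1 and accuracy@1 coincide, which is why those two rows are identical in the table, and precision@10 is necessarily low for datasets where each query has only one or two relevant documents.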

Multi Vector Nano BEIR

  • Dataset: NanoBEIR_mean
  • Evaluated with MultiVectorNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nq",
            "fiqa2018"
        ],
        "dataset_id": "sentence-transformers/NanoBEIR-en"
    }
    
Metric Value
MaxSim_accuracy@1 0.4533
MaxSim_accuracy@3 0.6333
MaxSim_accuracy@5 0.7267
MaxSim_accuracy@10 0.8067
MaxSim_precision@1 0.4533
MaxSim_precision@3 0.2378
MaxSim_precision@5 0.168
MaxSim_precision@10 0.0967
MaxSim_recall@1 0.3798
MaxSim_recall@3 0.5592
MaxSim_recall@5 0.6466
MaxSim_recall@10 0.7108
MaxSim_ndcg@10 0.5704
MaxSim_mrr@10 0.5645
MaxSim_map@100 0.5146

Multi Vector Nano BEIR

  • Dataset: NanoBEIR_mean
  • Evaluated with MultiVectorNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "climatefever",
            "dbpedia",
            "fever",
            "fiqa2018",
            "hotpotqa",
            "msmarco",
            "nfcorpus",
            "nq",
            "quoraretrieval",
            "scidocs",
            "arguana",
            "scifact",
            "touche2020"
        ],
        "dataset_id": "sentence-transformers/NanoBEIR-en"
    }
    
Metric Value
MaxSim_accuracy@1 0.5626
MaxSim_accuracy@3 0.7199
MaxSim_accuracy@5 0.7754
MaxSim_accuracy@10 0.8431
MaxSim_precision@1 0.5626
MaxSim_precision@3 0.3399
MaxSim_precision@5 0.2662
MaxSim_precision@10 0.1857
MaxSim_recall@1 0.3288
MaxSim_recall@3 0.4769
MaxSim_recall@5 0.5383
MaxSim_recall@10 0.612
MaxSim_ndcg@10 0.588
MaxSim_mrr@10 0.6553
MaxSim_map@100 0.5069
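
The NanoBEIR_mean values appear to be plain unweighted averages of the per-dataset scores. As a quick check, the sketch below averages the MaxSim_ndcg@10 values from the per-dataset table, once over the three-dataset subset and once over all thirteen datasets. The dictionary and helper are illustrative only.

```python
# MaxSim_ndcg@10 per dataset, copied from the per-dataset table.
ndcg_at_10 = {
    "msmarco": 0.6574, "nq": 0.6341, "fiqa2018": 0.4197, "climatefever": 0.232,
    "dbpedia": 0.6622, "fever": 0.8399, "hotpotqa": 0.8471, "nfcorpus": 0.3145,
    "quoraretrieval": 0.9215, "scidocs": 0.3355, "arguana": 0.4617,
    "scifact": 0.7462, "touche2020": 0.5726,
}


def mean_over(names):
    """Unweighted mean of the nDCG@10 scores for the given dataset names."""
    return sum(ndcg_at_10[name] for name in names) / len(names)


three_dataset_mean = mean_over(["msmarco", "nq", "fiqa2018"])
all_dataset_mean = mean_over(list(ndcg_at_10))
print(round(three_dataset_mean, 4), round(all_dataset_mean, 4))
```

The two results round to 0.5704 and 0.588, matching the MaxSim_ndcg@10 entries of the two NanoBEIR_mean tables.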

Training Details

Training Dataset

ms-marco-en-bge

  • Dataset: ms-marco-en-bge at ad24729
  • Size: 20,000 training samples
  • Columns: query, documents, and scores
  • Approximate statistics based on the first 100 samples:
    • query: type string, modality text; min: 4 tokens, mean: 9.92 tokens, max: 21 tokens
    • documents: type list; size: 32 elements
    • scores: type list; size: 32 elements
  • Samples:
    query documents scores
    define extreme ['extremist. 1 AN EXTREMIST IS SOMEONE WHO SUPPORTS AN IDEA, CAUSE, OR SET OF VALUES SO ADAMANTLY AND WITHOUT COMPROMISE THAT SAID PERSON WILL USE THEIR IDEAS TO JUSTIFY ANYTHING THEY DO.', "at the extreme end meaning, at the extreme end definition English Cobuild dictionary. extreme. 1 adj Extreme means very great in degree or intensity. The girls were afraid of snakes and picked their way along with extreme caution., ...people living in extreme poverty., ...the author's extreme reluctance to generalise.", 'extremity (plural extremities) 1 The most extreme or furthest point of something. 2 An extreme measure. 3 A hand or foot. A limb (major appendage of human or animal such as a leg an arm or a wing)', ': extreme in a way that is not normal or that shows an illness or mental problem. medical: relating to or caused by disease.: of or relating to the study of diseases: relating to pathology. extreme in a way that is not normal or that shows an illness or mental problem. medical: relating to or caused by disease.', 'Definition of extreme. 1a : existing in a very high degree extreme povertyb : going to great or exaggerated lengths : radical went on an extreme dietc : exceeding the ordinary, usual, or expected extreme weather conditions. 2 archaic : last.', ...]
    what does chattel mean on credit history ["Duhaime's Law Dictionary. Chattel Mortgage Definition: Related Terms: Chattel, Mortgage. When a lien is given on goods, chattels, moveable or personal property (other than real property in which case it is referred to as just a mortgage), in writing, to guarantee the payment of a debt or the execution of some action.", 'From Wikipedia, the free encyclopedia. Chattel mortgage, sometimes abbreviated CM, is the legal term for a type of loan contract used in some states with legal systems derived from English law. Under a typical chattel mortgage, the purchaser borrows funds for the purchase of movable personal property (the chattel) from the lender. The lender then secures the loan with a mortgage over the chattel.', 'Chattel Mortgages In Detail. A Chattel Mortgage uses your vehicle or some other (non-real estate) property as the security on the loan meaning you can access a low interest rate.ncidentally, these loans can be used for other purposes such as business equipment. If you have a preference for Chattel Mortgage, ask the team at 360 Finance. Term / Length of the loan – the life of the loan or the time you have to pay it off.', 'A chattel mortgage is a mortgage that provides for a security interest in assets other than real estate to secure the loan. In the event of a default in payments, the lender has a lien in the assets used as collateral for the loan. In most states, a security agreement has replaced the use of chattel mortgages. chattel mortgage is a mortgage that provides for a security interest in assets other than real estate to secure the loan. In the event of a default in payments, the lender has a lien in the assets used as collateral for the loan. In most states, a security agreement has replaced the use of chattel mortgages.', 'A Chattel Mortgage is a type of loan contract that allows the buyer to take ownership of a vehicle at the time of purchase. The lender provides the buyer with the total loan amount to cover the price of the vehicle (chattel) so that it can be bought outright.', ...] [0.7124203443527222, 0.7379189729690552, 0.5786551237106323, 0.6142299175262451, 0.6755089163780212, ...]
    what was the great leap forward brainly ['It was a clever scheme that was hatched soon after the 1949 revolution. The first phase was to send spies to the west during the great leap forward in the 1950s to plant falsified basic science into our western understanding of physics.', 'Great Leap Forward Devolution Into the Great Famine . Yang Jisheng, the author of Tombstone , wrote in the New York Times, “The Great Leap Forward that Mao began in 1958 set ambitious goals without the means to meet them. A vicious cycle ensued; exaggerated production reports from below emboldened the higher-ups to set even loftier targets.', 'In 1958 Mao introduced a second five year plan which became known as the ‘Great Leap Forward’ (GLF). He believed it was possible for China to overtake Britain as a leading industrial power within seven years and the USA soon after.n 1958 Mao introduced a second five year plan which became known as the ‘Great Leap Forward’ (GLF). He believed it was possible for China to overtake Britain as a leading industrial power within seven years and the USA soon after.', 'The Great Leap Forward approach was epitomized by the development of small backyard steel furnaces in every village and urban neighbourhood, which were intended to accelerate the industrialization process.', 'The Great Leap Forward was begun in 1957 by Chairman Mao Zedong to bring the nation quickly into the forefront of economic development. Mao wanted China to become a leading industrial power, and to accomplish his goals he and his colleagues pushed for the construction of steel plants across the country.', ...] [0.6462352871894836, 0.7880821228027344, 0.791019856929779, 0.7709633111953735, 0.8284491300582886, ...]
  • Loss: MultiVectorDistillKLDivLoss with these parameters:
    {
        "score_metric": "colbert_kd_scores",
        "normalize_scores": true,
        "temperature": 1.0,
        "size_average": true
    }
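
To make the loss configuration more concrete, here is a rough sketch of a KL-divergence distillation objective for one query and its candidate documents. It is not the library's implementation. It assumes that normalize_scores means min-max scaling of the student's MaxSim scores, that temperature divides the scores before the softmax, and that the divergence is taken from the teacher distribution to the student distribution. All helper names are hypothetical and the student scores are made up. The teacher scores are the first five values from the second sample above.

```python
import math


def softmax(values, temperature=1.0):
    """Softmax over candidate scores (assumption: temperature divides the scores)."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_max_normalize(values, eps=1e-8):
    """Rescale scores to roughly [0, 1] (assumed meaning of normalize_scores)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low + eps) for v in values]


def kd_kl_divergence(student_scores, teacher_scores, temperature=1.0, normalize=True):
    """KL(teacher || student) over one query's candidate documents."""
    if normalize:
        student_scores = min_max_normalize(student_scores)
    student = softmax(student_scores, temperature)
    teacher = softmax(teacher_scores, temperature)
    return sum(t * (math.log(t) - math.log(s)) for t, s in zip(teacher, student))


teacher_scores = [0.7124203443527222, 0.7379189729690552, 0.5786551237106323,
                  0.6142299175262451, 0.6755089163780212]
student_scores = [28.1, 29.4, 25.0, 26.2, 27.3]  # made-up MaxSim scores

loss = kd_kl_divergence(student_scores, teacher_scores)
print(loss)
```

The loss is zero when the student's distribution over the candidates matches the teacher's and grows as the two rankings diverge. With size_average enabled, the per-query values would presumably be averaged over the batch.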
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4
  • num_train_epochs: 1
  • learning_rate: 3e-05
  • warmup_steps: 0.05
  • bf16: True
  • per_device_eval_batch_size: 4
  • load_best_model_at_end: True
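
These settings determine the length of the run. The arithmetic below is a sketch under two assumptions that are not stated in this card: training ran on a single device, and a warmup_steps value below 1 is interpreted as a fraction of the total steps.

```python
import math

num_samples = 20_000             # training set size from the dataset section
per_device_batch_size = 4
num_devices = 1                  # assumption: the number of devices is not stated
gradient_accumulation_steps = 1  # from the full hyperparameter list
num_epochs = 1
warmup_fraction = 0.05           # assumption: warmup_steps below 1 is read as a fraction

effective_batch = per_device_batch_size * num_devices * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * num_epochs
warmup_steps = round(total_steps * warmup_fraction)

print(total_steps, warmup_steps)
```

This gives 5000 optimizer steps, which agrees with the final step in the training logs, and 250 warmup steps under the fraction interpretation.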

All Hyperparameters

  • per_device_train_batch_size: 4
  • num_train_epochs: 1
  • max_steps: -1
  • learning_rate: 3e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.05
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: None
  • trackio_bucket_id: None
  • trackio_static_space_id: None
  • per_device_eval_batch_size: 4
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_static_graph: None
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss NanoMSMARCO_MaxSim_ndcg@10 NanoNQ_MaxSim_ndcg@10 NanoFiQA2018_MaxSim_ndcg@10 NanoBEIR_mean_MaxSim_ndcg@10 NanoClimateFEVER_MaxSim_ndcg@10 NanoDBPedia_MaxSim_ndcg@10 NanoFEVER_MaxSim_ndcg@10 NanoHotpotQA_MaxSim_ndcg@10 NanoNFCorpus_MaxSim_ndcg@10 NanoQuoraRetrieval_MaxSim_ndcg@10 NanoSCIDOCS_MaxSim_ndcg@10 NanoArguAna_MaxSim_ndcg@10 NanoSciFact_MaxSim_ndcg@10 NanoTouche2020_MaxSim_ndcg@10
-1 -1 - 0.1306 0.1331 0.1222 0.1286 - - - - - - - - - -
0.01 50 0.0485 - - - - - - - - - - - - - -
0.02 100 0.0473 - - - - - - - - - - - - - -
0.03 150 0.0420 - - - - - - - - - - - - - -
0.04 200 0.0359 - - - - - - - - - - - - - -
0.05 250 0.0343 - - - - - - - - - - - - - -
0.06 300 0.0300 - - - - - - - - - - - - - -
0.07 350 0.0305 - - - - - - - - - - - - - -
0.08 400 0.0290 - - - - - - - - - - - - - -
0.09 450 0.0282 - - - - - - - - - - - - - -
0.1 500 0.0264 0.5714 0.5740 0.3619 0.5024 - - - - - - - - - -
0.11 550 0.0280 - - - - - - - - - - - - - -
0.12 600 0.0253 - - - - - - - - - - - - - -
0.13 650 0.0263 - - - - - - - - - - - - - -
0.14 700 0.0241 - - - - - - - - - - - - - -
0.15 750 0.0244 - - - - - - - - - - - - - -
0.16 800 0.0249 - - - - - - - - - - - - - -
0.17 850 0.0249 - - - - - - - - - - - - - -
0.18 900 0.0237 - - - - - - - - - - - - - -
0.19 950 0.0247 - - - - - - - - - - - - - -
0.2 1000 0.0239 0.6188 0.5772 0.4104 0.5355 - - - - - - - - - -
0.21 1050 0.0236 - - - - - - - - - - - - - -
0.22 1100 0.0244 - - - - - - - - - - - - - -
0.23 1150 0.0212 - - - - - - - - - - - - - -
0.24 1200 0.0215 - - - - - - - - - - - - - -
0.25 1250 0.0220 - - - - - - - - - - - - - -
0.26 1300 0.0222 - - - - - - - - - - - - - -
0.27 1350 0.0218 - - - - - - - - - - - - - -
0.28 1400 0.0214 - - - - - - - - - - - - - -
0.29 1450 0.0220 - - - - - - - - - - - - - -
0.3 1500 0.0218 0.6169 0.5738 0.4178 0.5362 - - - - - - - - - -
0.31 1550 0.0204 - - - - - - - - - - - - - -
0.32 1600 0.0214 - - - - - - - - - - - - - -
0.33 1650 0.0198 - - - - - - - - - - - - - -
0.34 1700 0.0204 - - - - - - - - - - - - - -
0.35 1750 0.0206 - - - - - - - - - - - - - -
0.36 1800 0.0196 - - - - - - - - - - - - - -
0.37 1850 0.0197 - - - - - - - - - - - - - -
0.38 1900 0.0194 - - - - - - - - - - - - - -
0.39 1950 0.0190 - - - - - - - - - - - - - -
0.4 2000 0.0188 0.6456 0.6144 0.4357 0.5652 - - - - - - - - - -
0.41 2050 0.0180 - - - - - - - - - - - - - -
0.42 2100 0.0202 - - - - - - - - - - - - - -
0.43 2150 0.0201 - - - - - - - - - - - - - -
0.44 2200 0.0177 - - - - - - - - - - - - - -
0.45 2250 0.0174 - - - - - - - - - - - - - -
0.46 2300 0.0180 - - - - - - - - - - - - - -
0.47 2350 0.0193 - - - - - - - - - - - - - -
0.48 2400 0.0204 - - - - - - - - - - - - - -
0.49 2450 0.0171 - - - - - - - - - - - - - -
0.5 2500 0.0165 0.6330 0.6038 0.3882 0.5417 - - - - - - - - - -
0.51 2550 0.0179 - - - - - - - - - - - - - -
0.52 2600 0.0165 - - - - - - - - - - - - - -
0.53 2650 0.0168 - - - - - - - - - - - - - -
0.54 2700 0.0168 - - - - - - - - - - - - - -
0.55 2750 0.0176 - - - - - - - - - - - - - -
0.56 2800 0.0161 - - - - - - - - - - - - - -
0.57 2850 0.0176 - - - - - - - - - - - - - -
0.58 2900 0.0176 - - - - - - - - - - - - - -
0.59 2950 0.0173 - - - - - - - - - - - - - -
0.6 3000 0.0177 0.6493 0.6436 0.4075 0.5668 - - - - - - - - - -
0.61 3050 0.0179 - - - - - - - - - - - - - -
0.62 3100 0.0170 - - - - - - - - - - - - - -
0.63 3150 0.0183 - - - - - - - - - - - - - -
0.64 3200 0.0178 - - - - - - - - - - - - - -
0.65 3250 0.0180 - - - - - - - - - - - - - -
0.66 3300 0.0171 - - - - - - - - - - - - - -
0.67 3350 0.0168 - - - - - - - - - - - - - -
0.68 3400 0.0168 - - - - - - - - - - - - - -
0.69 3450 0.0151 - - - - - - - - - - - - - -
0.7 3500 0.0177 0.6577 0.6343 0.3877 0.5599 - - - - - - - - - -
0.71 3550 0.0164 - - - - - - - - - - - - - -
0.72 3600 0.0165 - - - - - - - - - - - - - -
0.73 3650 0.0165 - - - - - - - - - - - - - -
0.74 3700 0.0162 - - - - - - - - - - - - - -
0.75 3750 0.0166 - - - - - - - - - - - - - -
0.76 3800 0.0163 - - - - - - - - - - - - - -
0.77 3850 0.0157 - - - - - - - - - - - - - -
0.78 3900 0.0182 - - - - - - - - - - - - - -
0.79 3950 0.0171 - - - - - - - - - - - - - -
0.8 4000 0.0170 0.6489 0.6356 0.4080 0.5642 - - - - - - - - - -
0.81 4050 0.0167 - - - - - - - - - - - - - -
0.82 4100 0.0152 - - - - - - - - - - - - - -
0.83 4150 0.0147 - - - - - - - - - - - - - -
0.84 4200 0.0165 - - - - - - - - - - - - - -
0.85 4250 0.0164 - - - - - - - - - - - - - -
0.86 4300 0.0157 - - - - - - - - - - - - - -
0.87 4350 0.0165 - - - - - - - - - - - - - -
0.88 4400 0.0154 - - - - - - - - - - - - - -
0.89 4450 0.0154 - - - - - - - - - - - - - -
0.9 4500 0.0162 0.6392 0.6391 0.4087 0.5623 - - - - - - - - - -
0.91 4550 0.0171 - - - - - - - - - - - - - -
0.92 4600 0.0159 - - - - - - - - - - - - - -
0.93 4650 0.0164 - - - - - - - - - - - - - -
0.94 4700 0.0157 - - - - - - - - - - - - - -
0.95 4750 0.0162 - - - - - - - - - - - - - -
0.96 4800 0.0154 - - - - - - - - - - - - - -
0.97 4850 0.0144 - - - - - - - - - - - - - -
0.98 4900 0.0158 - - - - - - - - - - - - - -
0.99 4950 0.0150 - - - - - - - - - - - - - -
1.0 5000 0.0156 0.6574 0.6341 0.4197 0.5704 - - - - - - - - - -
-1 -1 - 0.6574 0.6341 0.4197 0.5880 0.2320 0.6622 0.8399 0.8471 0.3145 0.9215 0.3355 0.4617 0.7462 0.5726
  • The saved checkpoint corresponds to step 5000 (epoch 1.0): its scores match those in the final evaluation row.
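
With load_best_model_at_end enabled, the checkpoint that is kept is the one with the best evaluation score. The sketch below replays that selection over the logged evaluations, assuming the selection metric is NanoBEIR_mean_MaxSim_ndcg@10 with higher being better. The metric used for selection is not listed in this card, so treat that as an assumption.

```python
# (step, NanoBEIR_mean_MaxSim_ndcg@10) pairs copied from the evaluation rows of the log above.
eval_history = [
    (500, 0.5024), (1000, 0.5355), (1500, 0.5362), (2000, 0.5652), (2500, 0.5417),
    (3000, 0.5668), (3500, 0.5599), (4000, 0.5642), (4500, 0.5623), (5000, 0.5704),
]

# Pick the evaluation with the highest mean nDCG@10.
best_step, best_score = max(eval_history, key=lambda pair: pair[1])
print(best_step, best_score)
```

Under that assumption the final step, 5000, is selected. The score is not monotonic during training: it dips at steps 2500 and 3500 before recovering.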

Training Time

  • Training: 1.2 hours
  • Evaluation: 27.5 minutes
  • Total: 1.7 hours

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.6.0.dev0
  • Transformers: 5.8.0.dev0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0.dev0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Additional Resources

  • Sentence Transformers Documentation: the full documentation site, including training, evaluation, and pre-trained model catalogs.
  • PyLate: the upstream library whose features were absorbed into Sentence Transformers for multi-vector / late-interaction models.

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultiVectorDistillKLDivLoss

@inproceedings{santhanam-etal-2022-colbertv2,
    title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
    author = "Santhanam, Keshav and Khattab, Omar and Saad-Falcon, Jon and Potts, Christopher and Zaharia, Matei",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}