RadLITE-Reranker
Radiology Late Interaction Transformer Enhanced - Cross-Encoder Reranker
A domain-specialized cross-encoder for reranking radiology search results. This model takes a query-document pair and predicts a relevance score, providing more accurate ranking than bi-encoder similarity alone.
Recommended: Use this reranker together with RadLITE-Encoder in a two-stage pipeline for optimal performance. The bi-encoder handles fast retrieval over large corpora, then this cross-encoder reranks the top candidates for precision. This combination achieves an MRR of 0.829 on the RadLIT-9 radiology retrieval benchmark.
Model Description
| Property | Value |
|---|---|
| Model Type | Cross-Encoder (Reranker) |
| Base Model | ms-marco-MiniLM-L-12-v2 |
| Domain | Radiology / Medical Imaging |
| Hidden Size | 384 |
| Max Sequence Length | 512 tokens |
| Output | Single relevance score |
| License | Apache 2.0 |
Why Use a Reranker?
Bi-encoders (like RadLITE-Encoder) are fast but encode query and document independently. Cross-encoders process them together, capturing fine-grained interactions:
| Approach | Speed | Accuracy | Use Case |
|---|---|---|---|
| Bi-Encoder | Fast (1000s docs/sec) | Good | First-stage retrieval |
| Cross-Encoder | Slow (10s docs/sec) | Excellent | Reranking top candidates |
Two-stage pipeline: Use the bi-encoder to retrieve the top 50-100 candidates, then rerank them with the cross-encoder for best results.
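To make the speed trade-off concrete, here is a back-of-the-envelope estimate of reranking time. It is only a sketch: `rerank_seconds` is a hypothetical helper, and the 10 and 50 pairs/second rates are taken from the approximate range quoted under Limitations, not from a new measurement.

```python
def rerank_seconds(num_pairs, pairs_per_second):
    """Estimated wall-clock time to score num_pairs query-document pairs."""
    return num_pairs / pairs_per_second

# Compare reranking a short candidate list with scoring a whole corpus.
for label, num_pairs in [("top-50 candidates", 50), ("100,000-document corpus", 100_000)]:
    slow = rerank_seconds(num_pairs, 10)   # lower end of the quoted range
    fast = rerank_seconds(num_pairs, 50)   # upper end of the quoted range
    print(f"{label}: {fast:.0f}-{slow:.0f} s per query")
```

At those rates, 50 pairs take roughly 1 to 5 seconds, while 100,000 pairs take roughly 2,000 to 10,000 seconds per query, which is why the cross-encoder is reserved for the second stage.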
Performance
Impact on RadLIT-9 Benchmark
| Configuration | MRR | Improvement |
|---|---|---|
| Bi-Encoder only | 0.78 | baseline |
| Bi-Encoder + Reranker | 0.829 | +6.3% |
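For readers unfamiliar with the metric, MRR (mean reciprocal rank) averages 1/rank of the first relevant document over all queries. The sketch below shows the calculation; the function name and the toy rankings are illustrative and are not taken from the RadLIT-9 evaluation code.

```python
def mean_reciprocal_rank(ranked_relevance):
    """Compute MRR from per-query relevance flags.

    ranked_relevance: one list per query, ordered by rank; each entry is
    True if the document at that rank is relevant to the query.
    """
    if not ranked_relevance:
        return 0.0
    total = 0.0
    for flags in ranked_relevance:
        for rank, is_relevant in enumerate(flags, start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # only the first relevant document counts
    return total / len(ranked_relevance)

# Toy data (hypothetical): four queries, three ranked results each.
toy_rankings = [
    [True, False, False],   # first relevant hit at rank 1 -> 1/1
    [False, True, False],   # rank 2 -> 1/2
    [False, False, False],  # no relevant hit -> 0
    [False, False, True],   # rank 3 -> 1/3
]
toy_mrr = mean_reciprocal_rank(toy_rankings)
print(round(toy_mrr, 3))
```

The Improvement column is the relative change in MRR: (0.829 - 0.78) / 0.78 ≈ 0.063, i.e. +6.3%.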
ABR Core Exam (Board-Style Questions)
Comparing the two-stage pipeline (bi-encoder + reranker) with the bi-encoder alone:
| Dataset | Two-Stage MRR | Bi-Encoder Only MRR | Improvement |
|---|---|---|---|
| Core Exam Chest | 0.533 | 0.409 | +30.3% |
| Core Exam Combined | 0.466 | 0.381 | +22.5% |
The reranker provides significant gains on complex, multi-part queries typical of board exam questions.
Published Benchmark Results
From Matulich & Mason, 2026:
| Benchmark | RadLIT Result | Key Finding |
|---|---|---|
| NFCorpus nDCG@10 | 0.268 | 17.9x improvement over RadBERT bi-encoder (0.015) |
| VQA-RAD MRR | 0.972 | Near-perfect retrieval on radiology Q&A |
| RadLIT-9 Thoracic | 0.736 nDCG@10 | Best-in-class (beat BGE-large, ColBERTv2) |
| RadLIT-9 Pediatric | 0.625 nDCG@10 | Best-in-class (beat BGE-large, ColBERTv2) |
| Zebra Test | 92% found rate | 2.1x improvement on rare conditions vs ColBERTv2 |
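Several rows above report nDCG@10. As a reminder of what that measures, here is a minimal sketch using the linear-gain form of DCG. Two assumptions are worth flagging: evaluation toolkits differ on the gain function (some use 2^rel - 1), and this helper computes the ideal ordering from the same list it is given, so that list is assumed to hold every judged document for the query. It is not the evaluation code behind the published numbers.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevances (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance labels in the order the system ranked them.
system_ranking = [0, 3, 2, 0, 1]
print(round(ndcg_at_k(system_ranking), 3))
```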
Vocabulary Alignment Hypothesis: Domain training provides a measurable advantage when queries use radiology-specific terminology that aligns with the training domain.
Quick Start
Installation
pip install "sentence-transformers>=2.2.0"
Basic Usage
from sentence_transformers import CrossEncoder
# Load the reranker
reranker = CrossEncoder("matulichpt/radlit-crossencoder", max_length=512)
# Query and candidate documents
query = "What are the imaging features of hepatocellular carcinoma?"
documents = [
    "HCC typically shows arterial enhancement with portal venous washout on CT.",
    "Fatty liver disease presents as decreased attenuation on non-contrast CT.",
    "Hepatic hemangiomas show peripheral nodular enhancement.",
]
# Create query-document pairs
pairs = [[query, doc] for doc in documents]
# Get relevance scores
scores = reranker.predict(pairs)
# Apply temperature calibration (RECOMMENDED)
calibrated_scores = scores / 1.5
print("Scores:", calibrated_scores)
# The HCC document should receive the highest score
Temperature Calibration
Important: This model outputs scores with high variance. Apply temperature scaling for better fusion with other signals:
# Raw scores might be: [4.2, -1.5, 0.8]
# After calibration: [2.8, -1.0, 0.53]
TEMPERATURE = 1.5 # Recommended value
def calibrated_predict(reranker, pairs):
    """Return reranker scores divided by the calibration temperature."""
    raw_scores = reranker.predict(pairs)
    return raw_scores / TEMPERATURE
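Dividing by a positive constant does not change the reranker's own ordering; it only compresses the spread of the scores, which matters once they are combined with other signals. The sketch below checks this on the example scores above, using plain Python lists in place of the NumPy array that `predict` returns. `apply_temperature` and `ranking` are hypothetical helpers.

```python
def apply_temperature(raw_scores, temperature=1.5):
    """Divide every raw score by the temperature."""
    return [score / temperature for score in raw_scores]

def ranking(scores):
    """Indices of the scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

raw = [4.2, -1.5, 0.8]
calibrated = apply_temperature(raw)

print([round(score, 2) for score in calibrated])  # [2.8, -1.0, 0.53]
print(ranking(raw) == ranking(calibrated))        # True: the order is unchanged
```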
Full Two-Stage Search Pipeline
from sentence_transformers import SentenceTransformer, CrossEncoder
import numpy as np
class RadLITESearch:
    def __init__(self, device="cuda"):
        # Stage 1: Fast bi-encoder
        self.encoder = SentenceTransformer(
            "matulichpt/radlit-biencoder",
            device=device
        )
        # Stage 2: Precise reranker
        self.reranker = CrossEncoder(
            "matulichpt/radlit-crossencoder",
            max_length=512,
            device=device
        )
        self.temperature = 1.5
        self.corpus_embeddings = None
        self.corpus = None

    def index_corpus(self, documents: list):
        """Pre-compute embeddings for your corpus."""
        self.corpus = documents
        self.corpus_embeddings = self.encoder.encode(
            documents,
            normalize_embeddings=True,
            show_progress_bar=True,
            batch_size=32
        )

    def search(self, query: str, top_k: int = 10, candidates: int = 50):
        """Two-stage search: retrieve then rerank."""
        # Stage 1: Bi-encoder retrieval
        query_emb = self.encoder.encode(query, normalize_embeddings=True)
        scores = query_emb @ self.corpus_embeddings.T
        top_indices = np.argsort(scores)[-candidates:][::-1]
        # Stage 2: Cross-encoder reranking
        candidate_docs = [self.corpus[i] for i in top_indices]
        pairs = [[query, doc] for doc in candidate_docs]
        rerank_scores = self.reranker.predict(pairs) / self.temperature
        # Sort by reranked scores
        sorted_indices = np.argsort(rerank_scores)[::-1]
        results = []
        for idx in sorted_indices[:top_k]:
            results.append({
                "document": candidate_docs[idx],
                "corpus_index": int(top_indices[idx]),
                "score": float(rerank_scores[idx]),
                "biencoder_score": float(scores[top_indices[idx]])
            })
        return results

# Usage
searcher = RadLITESearch()
searcher.index_corpus(your_radiology_documents)
results = searcher.search("pneumothorax CT findings")
Integration with Any Corpus
Radiopaedia / Educational Content
import json
# Load your content (e.g., Radiopaedia articles)
with open("radiopaedia_articles.json") as f:
    articles = json.load(f)
corpus = [article["content"] for article in articles]
# Initialize search
searcher = RadLITESearch()
searcher.index_corpus(corpus)
# Search
results = searcher.search("classic findings of pulmonary embolism on CTPA")
for r in results[:5]:
    print(f"Score: {r['score']:.3f}")
    print(f"Content: {r['document'][:200]}...")
    print()
Integration with Elasticsearch/OpenSearch
from sentence_transformers import CrossEncoder
reranker = CrossEncoder("matulichpt/radlit-crossencoder", max_length=512)
def rerank_elasticsearch_results(query: str, es_results: list, top_k: int = 10):
    """Rerank Elasticsearch BM25 results."""
    if not es_results:
        return []  # Nothing to rerank
    documents = [hit["_source"]["content"] for hit in es_results]
    pairs = [[query, doc] for doc in documents]
    scores = reranker.predict(pairs) / 1.5  # Temperature calibration
    # Combine with ES scores (optional)
    for i, hit in enumerate(es_results):
        hit["rerank_score"] = float(scores[i])
        # Use plain floats so the combined score stays JSON-serializable
        hit["combined_score"] = 0.3 * float(hit["_score"]) + 0.7 * hit["rerank_score"]
    # Sort by combined score
    reranked = sorted(es_results, key=lambda x: x["combined_score"], reverse=True)
    return reranked[:top_k]
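Elasticsearch BM25 `_score` values are unbounded and vary from query to query, so a fixed 0.3/0.7 blend can end up dominated by whichever signal happens to have the larger range. One common option, shown here as a sketch rather than as part of the pipeline above, is to min-max normalize each score list to [0, 1] before blending. `min_max` and `blend` are hypothetical helpers, and the sample scores are made up for illustration.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def blend(bm25_scores, rerank_scores, bm25_weight=0.3, rerank_weight=0.7):
    """Weighted sum of the two normalized score lists."""
    bm25_norm = min_max(bm25_scores)
    rerank_norm = min_max(rerank_scores)
    return [
        bm25_weight * b + rerank_weight * r
        for b, r in zip(bm25_norm, rerank_norm)
    ]

# Hypothetical scores for three hits returned by one query.
bm25 = [12.4, 9.1, 15.0]
rerank = [2.8, -1.0, 0.53]
combined = blend(bm25, rerank)
best = max(range(len(combined)), key=combined.__getitem__)
print(best, [round(c, 3) for c in combined])
```

Note that min-max normalization is unaffected by dividing the scores by a positive constant, so temperature calibration makes no difference if this variant is used.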
Optimal Fusion Weights
When combining multiple signals (bi-encoder, cross-encoder, BM25), use these weights:
# Optimal weights from grid search on RadLIT-9
FUSION_WEIGHTS = {
    "biencoder": 0.5,     # RadLITE-Encoder similarity
    "crossencoder": 0.2,  # RadLITE-Reranker (after temp calibration)
    "bm25": 0.3           # Lexical matching (if available)
}

def fused_score(bienc_score, ce_score, bm25_score=0):
    return (
        FUSION_WEIGHTS["biencoder"] * bienc_score +
        FUSION_WEIGHTS["crossencoder"] * ce_score +
        FUSION_WEIGHTS["bm25"] * bm25_score
    )
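If you want to tune fusion weights for your own corpus, a grid search of the kind mentioned above can be sketched as follows. This is not the authors' procedure: the 0.1 step, the candidate dictionary layout (`bi`, `ce`, `bm25`, `relevant`), and the toy queries are all assumptions, and the scores are assumed to be on comparable scales already.

```python
from itertools import product

def weight_grid(step=0.1):
    """Yield (w_bi, w_ce, w_bm25) triples on a grid that sum to 1."""
    n = round(1 / step)
    for i, j in product(range(n + 1), repeat=2):
        if i + j <= n:
            yield (i / n, j / n, (n - i - j) / n)

def mrr_for_weights(queries, weights):
    """MRR of the fused ranking; queries is a list of candidate lists."""
    w_bi, w_ce, w_bm25 = weights
    total = 0.0
    for candidates in queries:
        ranked = sorted(
            candidates,
            key=lambda c: w_bi * c["bi"] + w_ce * c["ce"] + w_bm25 * c["bm25"],
            reverse=True,
        )
        for rank, cand in enumerate(ranked, start=1):
            if cand["relevant"]:
                total += 1.0 / rank
                break  # only the first relevant candidate counts
    return total / len(queries)

def grid_search(queries, step=0.1):
    """Return the first weight triple on the grid with the highest MRR."""
    return max(weight_grid(step), key=lambda w: mrr_for_weights(queries, w))

# Toy labeled data (hypothetical): two queries, two candidates each.
toy_queries = [
    [{"bi": 0.9, "ce": 0.1, "bm25": 0.2, "relevant": False},
     {"bi": 0.5, "ce": 0.9, "bm25": 0.3, "relevant": True}],
    [{"bi": 0.8, "ce": 0.2, "bm25": 0.1, "relevant": True},
     {"bi": 0.3, "ce": 0.4, "bm25": 0.9, "relevant": False}],
]
best_weights = grid_search(toy_queries)
print(best_weights, mrr_for_weights(toy_queries, best_weights))
```

Weights found this way should be chosen on a held-out set of labeled queries, since tuning and reporting on the same queries overstates the gain.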
Architecture
[Query] + [SEP] + [Document]
|
v
[BERT Tokenizer]
|
v
[MiniLM Encoder] (12 layers, 384 hidden)
|
v
[Classification Head]
|
v
Relevance Score (float)
Training Details
- Base Model: ms-marco-MiniLM-L-12-v2 (trained on MS MARCO passage ranking)
- Fine-tuning: Radiology query-document relevance pairs
- Training Steps: 5,626
- Best Validation Loss: 0.691
- Learning Rate: 2e-5
- Batch Size: 32
- Category Weighting: Yes (balanced across radiology subspecialties)
Best Practices
1. Always Use Temperature Calibration
Raw cross-encoder scores can be extreme. Dividing them by a temperature of 1.5 produces scores that fuse better with other signals:
calibrated = raw_score / 1.5
2. Limit Candidates for Reranking
Cross-encoders are slow. Rerank only the top 50-100 candidates from the bi-encoder:
# Good: Rerank top 50
rerank_candidates = 50
# Bad: Rerank entire corpus
rerank_candidates = len(corpus) # Too slow!
3. Batch Predictions
# Efficient: Single batch call
pairs = [[query, doc] for doc in candidates]
scores = reranker.predict(pairs, batch_size=32)
# Inefficient: Individual calls
scores = [reranker.predict([[query, doc]])[0] for doc in candidates]
4. GPU Acceleration
reranker = CrossEncoder(
    "matulichpt/radlit-crossencoder",
    max_length=512,
    device="cuda"  # Use GPU
)
Limitations
- English only: Trained on English radiology text
- Speed: ~10-50 pairs/second (use for reranking, not full corpus)
- 512 token limit: Long documents are truncated
- Domain-specific: Optimized for radiology; may underperform on general medical content
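One common way to work around the 512-token truncation is to split a long document into overlapping windows, score each window against the query, and keep the best score. The sketch below illustrates that idea and is not part of the RadLITE pipeline. It counts whitespace-separated words as a rough proxy for tokens, and the 200-word window and 100-word stride are assumed values. `score_pairs` stands in for a scoring callable such as `reranker.predict`.

```python
def split_into_windows(text, window_words=200, stride_words=100):
    """Split text into overlapping windows of whitespace-separated words."""
    words = text.split()
    if not words:
        return []
    windows = []
    start = 0
    while True:
        windows.append(" ".join(words[start:start + window_words]))
        if start + window_words >= len(words):
            break  # the last window reached the end of the document
        start += stride_words
    return windows

def score_long_document(query, document, score_pairs, **window_kwargs):
    """Score every window against the query and return the maximum."""
    windows = split_into_windows(document, **window_kwargs)
    if not windows:
        return float("-inf")
    scores = score_pairs([[query, window] for window in windows])
    return max(float(score) for score in scores)

# Usage with a stand-in scorer (hypothetical): 1.0 if the window mentions "washout".
def fake_scorer(pairs):
    return [1.0 if "washout" in window.split() else 0.0 for _, window in pairs]

long_doc = " ".join(["filler"] * 449 + ["washout"])
print(len(split_into_windows(long_doc)))
print(score_long_document("HCC imaging features", long_doc, fake_scorer))
```

Taking the maximum treats a document as relevant if any single passage is relevant; averaging the window scores is an alternative when overall topicality matters more.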
Citation
If you use RadLITE in your work, please cite:
@article{matulich2026radlit,
  title = {Late Interaction Retrieval Unlocks Domain Knowledge in Radiology Language Models},
  author = {Matulich, Patrick and Mason, Dan},
  year = {2026},
  journal = {Radiology: Artificial Intelligence},
  note = {17.9x improvement over RadBERT; best-in-class on Thoracic/Pediatric subspecialties},
  url = {https://huggingface.co/matulichpt/radlit-biencoder}
}
Related Models
- RadLITE-Encoder - Bi-encoder for first-stage retrieval
- RadBERT-RoBERTa-4m - Base radiology language model
License
Apache 2.0 - Free for commercial and research use.
Model tree for matulichpt/radlit-crossencoder
- Base model: microsoft/MiniLM-L12-H384-uncased
Evaluation results (self-reported)
- MRR (with bi-encoder) on RadLIT-9 (Radiology Retrieval Benchmark): 0.829
- MRR on ABR Core Exam (Chest) on RadLIT-9 (Radiology Retrieval Benchmark): 0.533