MEDVQA-GI 2026 Task 1 Submission: BM25 RAG-rank + LLaVA-1.5-7B (4-bit)
This repository contains a Hugging Face submission package for ImageCLEF MEDVQA-GI 2026, Task 1.
The submitted system is a BM25 Retrieval-Augmented Generation (RAG) ranking pipeline built around LLaVA-1.5-7B with 4-bit quantization.
Method
The system follows a retrieval-and-ranking approach rather than direct free-form generation:
- Retrieve relevant evidence question-answer pairs from the training split using BM25.
- Build a candidate answer pool from retrieved gold answers.
- Use LLaVA-1.5-7B to score candidates conditioned on the image, question, and retrieved evidence.
- Return the top-ranked candidate as the final prediction.
This corresponds to the final selected project variant:
- Retriever: BM25
- Mode: RAG-rank
- Backbone: LLaVA-1.5-7B
- Quantization: 4-bit
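The retrieve, pool, score, and select steps listed under Method can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in medvqa_rag: the TinyBM25 class, the rag_rank helper, the toy evidence pairs, and the BM25 parameters (k1=1.5, b=0.75, common defaults) are all hypothetical. The overlap_scorer function is only a stand-in for the LLaVA-1.5-7B scorer, which in the submitted system also conditions on the image.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyBM25:
    """Minimal Okapi BM25 over a list of tokenized documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in docs) / len(docs)
        self.tfs = [Counter(d) for d in docs]
        df = Counter(term for d in docs for term in set(d))
        n = len(docs)
        # Smoothed IDF that stays positive for every indexed term.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in query:
            if term not in tf:
                continue
            norm = tf[term] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf[term] * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k):
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def rag_rank(question, evidence, bm25, score_candidate, k=2):
    """Retrieve evidence, pool its answers, and return the best-scoring candidate."""
    hits = [evidence[i] for i in bm25.top_k(tokenize(question), k)]
    # Deduplicate gold answers while keeping retrieval order.
    candidates = list(dict.fromkeys(answer for _, answer in hits))
    return max(candidates, key=lambda c: score_candidate(question, hits, c))


def overlap_scorer(question, hits, candidate):
    """Stand-in scorer (assumption): token overlap with evidence supporting the candidate."""
    q = set(tokenize(question))
    return sum(len(q & set(tokenize(ev_q))) for ev_q, ev_a in hits if ev_a == candidate)


# Toy evidence pairs invented for illustration, not taken from the dataset.
EVIDENCE = [
    ("what type of polyp is present", "paris ip"),
    ("how many polyps are present", "one"),
    ("what color is the abnormality", "pink"),
    ("how many instruments are visible", "none"),
]
BM25 = TinyBM25([tokenize(q) for q, _ in EVIDENCE])

QUESTION = "how many polyps are present in the image"
prediction = rag_rank(QUESTION, EVIDENCE, BM25, overlap_scorer, k=2)
print(prediction)
```

Because the final answer is always drawn from the candidate pool, the prediction is guaranteed to be a gold answer seen in the training split, which is the main difference from free-form generation.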
Repository structure
.
├── submission_task1.py
├── requirements.txt
├── index/
│   ├── bm25_index.pkl
│   └── evidence_index.pkl
└── medvqa_rag/
    ├── __init__.py
    ├── answer.py
    ├── bm25_retriever.py
    ├── evidence_index.py
    ├── llava_wrapper.py
    ├── normalise.py
    ├── pipeline.py
    ├── prediction.py
    ├── prompt.py
    └── schema.py
Main entrypoint
The competition entrypoint is:
submission_task1.py
This script:
- loads the official validation split from SimulaMet/Kvasir-VQA-test,
- loads the local BM25/evidence artifacts from index/,
- runs the BM25 RAG-rank pipeline through medvqa_rag.answer,
- writes predictions to predictions_1.json.
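The overall shape of such an entrypoint loop might look like the sketch below. It is not the contents of submission_task1.py: the run_submission helper, the example field names (img_id, image, question), and the layout of the output records are assumptions made for illustration, and the official submission format may differ. The answer function is passed in as a parameter so the sketch runs without the model; only its keyword arguments follow the medvqa_rag.answer API.

```python
import json
import tempfile
from pathlib import Path


def run_submission(examples, answer_fn, out_path):
    """Answer every example and write the collected records to a JSON file."""
    records = []
    for ex in examples:
        pred = answer_fn(image=ex["image"], question=ex["question"], question_id=ex["img_id"])
        # Record layout is hypothetical; adapt it to the official format.
        records.append({"img_id": ex["img_id"], "question": ex["question"], "answer": pred})
    Path(out_path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


def fake_answer(image, question, question_id):
    """Stub used only for this demo in place of medvqa_rag.answer.answer."""
    return "placeholder"


examples = [{"img_id": "demo_0", "image": None, "question": "Is a polyp visible?"}]

# Write to a temporary directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "predictions_1.json"
    records = run_submission(examples, fake_answer, out_file)
    written = json.loads(out_file.read_text(encoding="utf-8"))
```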
Local API
The repository exposes a minimal inference API through medvqa_rag.answer:
from medvqa_rag.answer import load, answer
load()
pred = answer(image=image, question=question, question_id=img_id)
Notes
- Evidence retrieval uses only the training split.
- The repository is structured for competition submission compatibility, not as a general-purpose model package.
- The included pickle artifacts were rebuilt using the final medvqa_rag.* module paths to ensure compatibility during loading.
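The note about module paths reflects how pickle works in general: a pickled object stores the module path and name of its class, not the class definition, so loading fails if that module can no longer be imported. The sketch below demonstrates this with a throwaway module; the module name legacy_index_pkg_demo and the Index class are hypothetical and unrelated to the repository's actual classes.

```python
import pickle
import sys
import types

MODULE_NAME = "legacy_index_pkg_demo"  # hypothetical stand-in for an old package path

# Build a throwaway module containing one class and register it as importable.
legacy = types.ModuleType(MODULE_NAME)
exec("class Index:\n    def __init__(self, n):\n        self.n = n\n", legacy.__dict__)
sys.modules[MODULE_NAME] = legacy

# The pickle records "legacy_index_pkg_demo.Index", not the class body.
blob = pickle.dumps(legacy.Index(3))

# Simulate the package being renamed or removed after the pickle was written.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(blob)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc

# Once the module path resolves again, the same bytes load without changes.
sys.modules[MODULE_NAME] = legacy
restored = pickle.loads(blob)
del sys.modules[MODULE_NAME]
```

Rebuilding the artifacts under the final package layout, as this repository does, avoids the failure in the middle step without needing import shims.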
Running
Install dependencies:
pip install -r requirements.txt
Run the submission script:
python submission_task1.py
Acknowledgement
This repository was prepared for the AIR SS2026 course project on ImageCLEF MEDVQA-GI and packages the final BM25 RAG-rank submission variant in a competition-compatible Hugging Face repository format.
Dataset
This submission uses the Kvasir-VQA family of gastrointestinal endoscopy VQA datasets.
The retrieval evidence was built from the training split of SimulaMet/Kvasir-VQA-x1.
References:
- Kvasir-VQA paper: https://arxiv.org/abs/2409.01437
- Hugging Face dataset page: https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1
Dataset citation
This repository uses the Kvasir-VQA-x1 dataset. If you use this repository or the dataset in academic work, please cite the associated dataset paper.
Kvasir-VQA-x1 references
- Springer chapter: https://doi.org/10.1007/978-3-032-08009-7_6
- arXiv preprint: https://arxiv.org/abs/2506.09958
- Hugging Face dataset page: https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1
@incollection{Gautam2025Oct,
author={Gautam, Sushant and Riegler, Michael and Halvorsen, P{\aa}l},
title={Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy},
booktitle={Data Engineering in Medical Imaging},
year={2025},
publisher={Springer, Cham},
doi={10.1007/978-3-032-08009-7_6}
}
@article{Gautam2025Jun,
author = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\aa}l},
title = {Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy},
journal = {arXiv},
year = {2025},
month = jun,
eprint = {2506.09958},
doi = {10.48550/arXiv.2506.09958}
}