MEDVQA-GI 2026 Task 1 Submission: BM25 RAG-rank + LLaVA-1.5-7B (4-bit)

This repository contains a Hugging Face submission package for ImageCLEF MEDVQA-GI 2026, Task 1.

The submitted system is a BM25 Retrieval-Augmented Generation (RAG) ranking pipeline built around LLaVA-1.5-7B with 4-bit quantization.

Method

The system follows a retrieval-and-ranking approach rather than direct free-form generation:

Retrieve relevant evidence question-answer pairs from the training split using BM25.
Build a candidate answer pool from retrieved gold answers.
Use LLaVA-1.5-7B to score candidates conditioned on the image, question, and retrieved evidence.
Return the top-ranked candidate as the final prediction.
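
The four steps above can be sketched in plain Python. This is a minimal illustration, not the code in medvqa_rag: TinyBM25, rag_rank, and the score_candidate callable are hypothetical names, the evidence records are assumed to be dicts with "question" and "answer" keys, and BM25 is assumed to index the evidence questions. In the submitted system the scoring step is performed by LLaVA-1.5-7B; here it is any function that returns a number per candidate.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyBM25:
    """Minimal Okapi BM25 over tokenized documents (illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.tfs = [Counter(d) for d in docs]
        self.avgdl = sum(len(d) for d in docs) / len(docs)
        df = Counter(term for d in docs for term in set(d))
        n = len(docs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query
            if t in tf
        )

    def top_k(self, query, k):
        ranked = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return ranked[:k]


def rag_rank(question, evidence, score_candidate, k=3):
    """Retrieve evidence, pool its answers, and return the best-scoring candidate."""
    index = TinyBM25([tokenize(e["question"]) for e in evidence])
    retrieved = [evidence[i] for i in index.top_k(tokenize(question), k)]
    # De-duplicate answers while preserving retrieval order.
    candidates = list(dict.fromkeys(e["answer"] for e in retrieved))
    # score_candidate stands in for the LLaVA scoring step (an assumption of this sketch).
    return max(candidates, key=lambda c: score_candidate(question, retrieved, c))
```

Because the final answer is always drawn from the candidate pool, the prediction can only be an answer string that already occurs in the retrieved evidence.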

These steps correspond to the final selected project variant:

Retriever: BM25
Mode: RAG-rank
Backbone: LLaVA-1.5-7B
Quantization: 4-bit
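
For readers unfamiliar with 4-bit loading, one common way to load LLaVA-1.5-7B in 4-bit is through the transformers integration with bitsandbytes. The sketch below is an assumption about how such a setup could look, not a copy of llava_wrapper.py: the checkpoint name, the NF4 quantization type, and the float16 compute dtype are illustrative choices, and load_llava_4bit is a hypothetical helper.

```python
# Settings assumed for illustration; the values used by this repository may differ.
MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint name
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
}


def load_llava_4bit(model_id=MODEL_ID):
    """Load LLaVA-1.5 with 4-bit weights via bitsandbytes (requires a CUDA GPU)."""
    # Imported lazily so the settings above can be inspected without the heavy dependencies.
    import torch
    from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

    quant_config = BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.float16, **QUANT_KWARGS)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, quantization_config=quant_config, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor
```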

Repository structure

.
├── submission_task1.py
├── requirements.txt
├── index/
│   ├── bm25_index.pkl
│   └── evidence_index.pkl
└── medvqa_rag/
    ├── __init__.py
    ├── answer.py
    ├── bm25_retriever.py
    ├── evidence_index.py
    ├── llava_wrapper.py
    ├── normalise.py
    ├── pipeline.py
    ├── prediction.py
    ├── prompt.py
    └── schema.py

Main entrypoint

The competition entrypoint is:

submission_task1.py

This script:

loads the official validation split from SimulaMet/Kvasir-VQA-test,
loads the local BM25/evidence artifacts from index/,
runs the BM25 RAG-rank pipeline through medvqa_rag.answer,
writes predictions to predictions_1.json.
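
The prediction loop described above might look roughly like the following. This is a hedged sketch rather than the contents of submission_task1.py: write_predictions is a hypothetical helper, the record keys ("image", "question", "img_id") and the output schema are assumptions, and the return value of answer is assumed to be JSON-serialisable. Only the keyword arguments passed to the answer function follow the Local API section.

```python
import json


def write_predictions(records, answer_fn, out_path="predictions_1.json"):
    """Answer every record and dump the results as a JSON list."""
    predictions = []
    for rec in records:
        pred = answer_fn(
            image=rec["image"],
            question=rec["question"],
            question_id=rec["img_id"],
        )
        # Output fields are illustrative; follow the official task format in practice.
        predictions.append(
            {"img_id": rec["img_id"], "question": rec["question"], "answer": pred}
        )
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, ensure_ascii=False, indent=2)
    return predictions
```

Passing the answer function as a parameter keeps the loop testable with a stub, without loading the model.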

Local API

The repository exposes a minimal inference API through medvqa_rag.answer:

from medvqa_rag.answer import load, answer

load()
# image, question, and img_id are supplied by the caller, for example from one dataset record.
pred = answer(image=image, question=question, question_id=img_id)

Notes

Evidence retrieval uses only the training split.
The repository is structured for competition submission compatibility, not as a general-purpose model package.
The included pickle artifacts were rebuilt using the final medvqa_rag.* module paths to ensure compatibility during loading.
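
The reason module paths matter for the pickle artifacts is that pickle stores classes by reference: the byte stream records the defining module and class name, and unpickling imports that path again. The short demonstration below uses a standard-library class as a stand-in, since the classes stored in the index files are not shown here.

```python
import pickle
from collections import OrderedDict

payload = pickle.dumps(OrderedDict(a=1))

# The byte stream names the defining module and class instead of embedding the class itself.
has_module = b"collections" in payload
has_class = b"OrderedDict" in payload

# Unpickling imports that module path again, so it must still exist at load time.
restored = pickle.loads(payload)
```

If a class is moved to a different module after pickling, loading the old file typically fails with an import or attribute error, which is why artifacts are rebuilt after such a move.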

Running

Install dependencies:

pip install -r requirements.txt

Run the submission script:

python submission_task1.py

Acknowledgement

This repository was prepared for the AIR SS2026 course project on ImageCLEF MEDVQA-GI and packages the final BM25 RAG-rank submission variant in a competition-compatible Hugging Face repository format.

Dataset

This submission uses the Kvasir-VQA family of gastrointestinal endoscopy VQA datasets. The retrieval evidence was built from the training split of SimulaMet/Kvasir-VQA-x1.

References:

Kvasir-VQA paper: https://arxiv.org/abs/2409.01437
Hugging Face dataset page: https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1

Dataset citation

This repository uses the Kvasir-VQA-x1 dataset. If you use this repository or the dataset in academic work, please cite the associated dataset paper.

Kvasir-VQA-x1 references

Springer chapter: https://doi.org/10.1007/978-3-032-08009-7_6
arXiv preprint: https://arxiv.org/abs/2506.09958
Hugging Face dataset page: https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1

@incollection{Gautam2025Oct,
  author = {Gautam, Sushant and Riegler, Michael and Halvorsen, P{\aa}l},
  title = {Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy},
  booktitle = {Data Engineering in Medical Imaging},
  year = {2025},
  publisher = {Springer, Cham},
  doi = {10.1007/978-3-032-08009-7_6}
}

@article{Gautam2025Jun,
  author = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\aa}l},
  title = {Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy},
  journal = {arXiv},
  year = {2025},
  month = jun,
  eprint = {2506.09958},
  doi = {10.48550/arXiv.2506.09958}
}
