MEDVQA-GI 2026 Task 1 Submission: BM25 RAG-rank + LLaVA-1.5-7B (4-bit)

This repository contains a Hugging Face submission package for ImageCLEF MEDVQA-GI 2026, Task 1.

The submitted system is a BM25 Retrieval-Augmented Generation (RAG) ranking pipeline built around LLaVA-1.5-7B with 4-bit quantization.

Method

The system follows a retrieval-and-ranking approach rather than direct free-form generation:

  1. Retrieve relevant evidence question-answer pairs from the training split using BM25.
  2. Build a candidate answer pool from retrieved gold answers.
  3. Use LLaVA-1.5-7B to score candidates conditioned on the image, question, and retrieved evidence.
  4. Return the top-ranked candidate as the final prediction.
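The four steps above can be sketched in plain Python. This is an illustrative outline, not the repository's code: `TinyBM25`, `tokenize`, `rag_rank`, and `vote_scorer` are hypothetical names, the whitespace tokenizer and the BM25 parameters are assumptions, and the LLaVA scoring step is replaced by a simple vote count so the example runs without a GPU.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; the real tokenizer is not documented here.
    return text.lower().split()


class TinyBM25:
    """Minimal Okapi BM25 over a list of tokenized documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in docs) / len(docs)
        self.tf = [Counter(d) for d in docs]
        df = Counter(term for d in docs for term in set(d))
        n = len(docs)
        # Smoothed IDF that stays positive for every indexed term.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for term in query:
            if term not in tf:
                continue
            norm = tf[term] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf[term] * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k):
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def rag_rank(question, evidence, score_candidate, k=3):
    """Retrieve evidence, pool its gold answers, score them, return the best."""
    bm25 = TinyBM25([tokenize(q) for q, _ in evidence])           # step 1
    hits = [evidence[i] for i in bm25.top_k(tokenize(question), k)]
    candidates = list(dict.fromkeys(a for _, a in hits))          # step 2, deduplicated
    # Steps 3 and 4: the submission scores with LLaVA-1.5-7B on the image;
    # score_candidate is a stand-in for that call.
    return max(candidates, key=lambda c: score_candidate(question, hits, c))


def vote_scorer(question, hits, candidate):
    # Placeholder scorer: how many retrieved pairs carry this answer.
    return sum(1 for _, a in hits if a == candidate)


# Toy evidence pairs, invented for illustration only.
evidence = [
    ("how many polyps are visible", "one"),
    ("what colour is the polyp", "pink"),
    ("is there a polyp in the image", "yes"),
    ("what instrument is present", "biopsy forceps"),
]
print(rag_rank("what colour is the polyp", evidence, vote_scorer))
```

The sketch rebuilds the index on every call to stay short; the submission instead loads a prebuilt index from index/. Restricting the output to retrieved gold answers is what distinguishes this ranking mode from free-form generation.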

The submitted configuration is the final selected project variant:

  • Retriever: BM25
  • Mode: RAG-rank
  • Backbone: LLaVA-1.5-7B
  • Quantization: 4-bit
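For readers unfamiliar with 4-bit loading, one common way to load a LLaVA-1.5 checkpoint with 4-bit weights goes through `transformers` and `bitsandbytes`. This is a sketch under assumptions and may differ from what medvqa_rag/llava_wrapper.py does: the checkpoint id `llava-hf/llava-1.5-7b-hf`, the NF4 quantization type, the float16 compute dtype, and the function name `load_llava_4bit` are all guesses, not values taken from this repository.

```python
MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint; not named in this README


def load_llava_4bit(model_id=MODEL_ID):
    """Load LLaVA-1.5 with bitsandbytes 4-bit weights (requires a CUDA GPU)."""
    # Heavy imports are kept inside the function so that defining it stays cheap.
    import torch
    from transformers import (
        AutoProcessor,
        BitsAndBytesConfig,
        LlavaForConditionalGeneration,
    )

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumed; "fp4" is the alternative
        bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
    )
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id,
        quantization_config=quant,
        device_map="auto",
    )
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor
```

Calling `load_llava_4bit()` downloads the weights on first use, so it is best invoked once and the returned pair reused across questions.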

Repository structure

.
├── submission_task1.py
├── requirements.txt
├── index/
│   ├── bm25_index.pkl
│   └── evidence_index.pkl
└── medvqa_rag/
    ├── __init__.py
    ├── answer.py
    ├── bm25_retriever.py
    ├── evidence_index.py
    ├── llava_wrapper.py
    ├── normalise.py
    ├── pipeline.py
    ├── prediction.py
    ├── prompt.py
    └── schema.py

Main entrypoint

The competition entrypoint is:

  • submission_task1.py

This script:

  • loads the official validation split from SimulaMet/Kvasir-VQA-test,
  • loads the local BM25/evidence artifacts from index/,
  • runs the BM25 RAG-rank pipeline through medvqa_rag.answer,
  • writes predictions to predictions_1.json.
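The loop that such a script runs can be outlined as follows. This is a hedged sketch rather than the contents of submission_task1.py: the helper names `collect_predictions` and `write_predictions`, the record fields `image`, `question`, and `img_id`, and the layout of the output JSON are assumptions made for illustration.

```python
import json
from pathlib import Path


def collect_predictions(records, answer_fn):
    """Call answer_fn once per record and gather the results in order."""
    predictions = []
    for rec in records:
        pred = answer_fn(
            image=rec["image"],
            question=rec["question"],
            question_id=rec["img_id"],
        )
        # Assumed output layout; the official format may require other keys.
        predictions.append(
            {"img_id": rec["img_id"], "question": rec["question"], "answer": pred}
        )
    return predictions


def write_predictions(predictions, out_path="predictions_1.json"):
    """Serialise the collected predictions to a JSON file."""
    Path(out_path).write_text(json.dumps(predictions, indent=2), encoding="utf-8")


# In a real run the records would come from the Hugging Face dataset, e.g.
#   from datasets import load_dataset
#   records = load_dataset("SimulaMet/Kvasir-VQA-test", split="validation")
# and answer_fn would be medvqa_rag.answer.answer after load() has been called.
```

Keeping collection separate from file writing makes the loop easy to exercise with a stub `answer_fn` before committing GPU time to a full run.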

Local API

The repository exposes a minimal inference API through medvqa_rag.answer:

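from medvqa_rag.answer import load, answer

# load() is called first, before any answer() call.
load()
# image, question, and img_id are supplied by the caller, e.g. from one dataset record.
pred = answer(image=image, question=question, question_id=img_id)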

Notes

  • Evidence retrieval uses only the training split.
  • The repository is structured for competition submission compatibility, not as a general-purpose model package.
  • The included pickle artifacts were rebuilt using the final medvqa_rag.* module paths to ensure compatibility during loading.
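The last note follows from how pickle serialises class instances: it records the import path of the class instead of the class definition, so a pickle written under one module path fails to load once the module is renamed or moved. The sketch below shows this with a standard-library class standing in for the repository's own index classes, whose definitions are not shown here.

```python
import pickle
import pickletools
from fractions import Fraction

# Fraction is only a stand-in for a class such as those in medvqa_rag.*.
payload = pickle.dumps(Fraction(1, 2))

# The payload names the class by module and class name; no class code is embedded.
assert b"fractions" in payload and b"Fraction" in payload

# Walk the opcode stream and collect every string argument it carries.
names = [arg for _, arg, _ in pickletools.genops(payload) if isinstance(arg, str)]
print(names)
```

On loading, pickle imports the recorded module and looks the class up by name, which is why index/bm25_index.pkl and index/evidence_index.pkl had to be written after the package layout was final.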

Running

Install dependencies:

pip install -r requirements.txt

Run the submission script:

python submission_task1.py

Acknowledgement

This repository was prepared for the AIR SS2026 course project on ImageCLEF MEDVQA-GI and packages the final BM25 RAG-rank submission variant in a competition-compatible Hugging Face repository format.

Dataset

This submission uses the Kvasir-VQA family of gastrointestinal endoscopy VQA datasets. The retrieval evidence was built from the training split of SimulaMet/Kvasir-VQA-x1.

References:

Dataset citation

This repository uses the Kvasir-VQA-x1 dataset. If you use this repository or the dataset in academic work, please cite the associated dataset paper.

Kvasir-VQA-x1 references

@incollection{Gautam2025Oct,
  author = {Gautam, Sushant and Riegler, Michael and Halvorsen, P{\aa}l},
  title = {Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy},
  booktitle = {Data Engineering in Medical Imaging},
  year = {2025},
  publisher = {Springer, Cham},
  doi = {10.1007/978-3-032-08009-7_6}
}

@article{Gautam2025Jun,
  author = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\aa}l},
  title = {Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy},
  journal = {arXiv},
  year = {2025},
  month = jun,
  eprint = {2506.09958},
  doi = {10.48550/arXiv.2506.09958}
}