VideoP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published Nov 14 • 112
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs Paper • 2511.14159 • Published 29 days ago • 24
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published about 1 month ago • 25
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models Paper • 2511.14582 • Published 29 days ago • 17