arxiv:2606.19088

ReSiReg: Towards Spatially Consistent Semantics in Language-Conditioned Robotic Tasks

Published on Jun 17

Authors:

Abstract

ReSiReg improves spatial consistency in vision-language models for robotics by reconstructing dense embeddings through clustered visual prototypes and language descriptors, enabling better 3D mapping and manipulation tasks.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Vision-Language Models (VLMs) enable robots to follow open-language instructions. However, dense VLM embeddings have been shown to be noisy and to lack spatial consistency. This is problematic for robotic applications, which require simultaneous reasoning over semantics and 3D space. We examine spatial structure across recent VLMs and propose ReSiReg, a feature reconstruction method that uses spatially consistent VLM intermediates to improve dense language-grounded retrieval. ReSiReg clusters intermediates into visual prototypes, derives their language descriptors, and reconstructs each patch as a soft mixture of prototype-level language embeddings. We evaluate quantitatively on open-vocabulary semantic segmentation (OVSS) and 3D mapping across backbones, and qualitatively in real-world manipulation scenes. Quantitative results show improved dense retrieval; manipulation scenes show more spatially consistent target activations. We further provide a compact 25M-parameter dense VLM for robotic applications, substantially smaller than and competitive with ViT-B baselines. Available at https://resireg.github.io
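The three steps named in the abstract (cluster intermediates into prototypes, derive a language descriptor per prototype, reconstruct each patch as a soft mixture) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the use of plain k-means, the choice of the mean language-aligned embedding of member patches as the prototype descriptor, and the softmax temperature `tau` are all assumptions made for illustration; the paper may define each step differently.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed clustering choice)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every patch to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they match the final centroids.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dists.argmin(axis=1)


def reconstruct(intermediates, lang_embeds, k=8, tau=0.1):
    """Hypothetical reconstruction: patches -> soft mixture of descriptors.

    intermediates: (N, D) spatially consistent intermediate patch features.
    lang_embeds:   (N, E) noisy language-aligned patch embeddings.
    Returns (N, E) reconstructed embeddings and (N, k) mixture weights.
    """
    x = l2_normalize(intermediates)
    prototypes, labels = kmeans(x, k)
    prototypes = l2_normalize(prototypes)

    # Assumed descriptor: mean language-aligned embedding of member patches.
    descriptors = np.zeros((k, lang_embeds.shape[1]))
    for j in range(k):
        members = lang_embeds[labels == j]
        if len(members) > 0:
            descriptors[j] = members.mean(axis=0)

    # Soft assignment: temperature-scaled softmax over cosine similarities.
    logits = x @ prototypes.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ descriptors, weights


def similarity_map(patch_embeds, query_embed):
    """Cosine similarity of each patch to one text query embedding."""
    return l2_normalize(patch_embeds) @ l2_normalize(query_embed[None, :])[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inter = rng.normal(size=(64, 16))   # stand-in for VLM intermediates
    lang = rng.normal(size=(64, 32))    # stand-in for dense VLM embeddings
    recon, w = reconstruct(inter, lang, k=4)
    scores = similarity_map(recon, rng.normal(size=32))
    print(recon.shape, w.shape, scores.shape)
```

Because every patch in the sketch is rebuilt from a small set of shared descriptors, patches with similar intermediates receive similar language embeddings, which is the intuition behind the more spatially consistent activations the abstract reports. The random arrays in the usage section are placeholders only and carry no meaning.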


