Paper Archive

MediX-R1: Open Ended Medical Reinforcement Learning

9.0/10

Sahal Shaji Mullappilly, Mohammed Irfan Kurpath, Omair Mohamed, Mohamed Zidan, Fahad Khan, Salman Khan, Rao Anwer, Hisham Cholakkal 2/26/2026 arxiv

machine learning

We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group Based RL and a compos...

Keywords: MediX-R1, medical reinforcement learning, multimodal LLM, vision-language model, Group Based RL, composite reward, LLM-as-judge, medical embeddings

VGG-T$^3$: Offline Feed-Forward 3D Reconstruction at Scale

9.0/10

Sven Elflein, Ruilong Li, Sérgio Agostinho, Zan Gojcic, Laura Leal-Taixé, Qunjie Zhou, Aljosa Osep 2/26/2026 arxiv

computer vision

We present a scalable 3D reconstruction model that addresses a critical limitation in offline feed-forward methods: their computational and memory requirements grow quadratically w.r.t. the number of input images. Our approach is built on the key insight that this bottleneck stems from the varying-l...

Keywords: VGG-T3, test-time training, MLP distillation, Key-Value representation, 3D reconstruction, scalability, linear-time, softmax attention

Model Agreement via Anchoring

9.0/10

Eric Eaton, Surbhi Goel, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell 2/26/2026 arxiv

machine learning

Numerous lines of work aim to control $\textit{model disagreement}$ -- the extent to which two machine learning models disagree in their predictions. We adopt a simple and standard notion of model disagreement in real-valued prediction problems, namely the expected squared difference in predictions betwe...

Keywords: model disagreement, anchoring, stacked aggregation, gradient boosting, neural architecture search, regression trees, theoretical bounds, squared loss

SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

9.0/10

Vaibhav Agrawal, Rishubh Parihar, Pradhaan Bhat, Ravi Kiran Sarvadevabhatla, R. Venkatesh Babu 2/26/2026 arxiv

computer vision

We identify occlusion reasoning as a fundamental yet overlooked aspect for 3D layout-conditioned generation. It is essential for synthesizing partially occluded objects with depth-consistent geometry and scale. While existing methods can generate realistic scenes that follow input layouts, they ofte...

Keywords: SeeThrough3D, occlusion reasoning, 3D layout, OSCR, occlusion-aware 3D scene representation, translucent 3D boxes, text-to-image, flow-based generative model

A Dataset is Worth 1 MB

9.0/10

Elad Kimchi Shoshani, Leeyam Gabay, Yedid Hoshen 2/26/2026 arxiv

computer vision

A dataset server must often distribute the same large payload to many clients, incurring massive communication costs. Since clients frequently operate on diverse hardware and software frameworks, transmitting a pre-trained model is often infeasible; instead, agents require raw data to train their ow...

Keywords: PLADA, pseudo-labels, dataset serving, dataset distillation, communication-efficient ML, ImageNet, pruning, data efficiency

SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport

9.0/10

Simon Roschmann, Paul Krzakala, Sonia Mazelet, Quentin Bouniot, Zeynep Akata 2/26/2026 arxiv

machine learning

The Platonic Representation Hypothesis posits that neural networks trained on different modalities converge toward a shared statistical model of the world. Recent work exploits this convergence by aligning frozen pretrained vision and language models with lightweight alignment layers, but typically ...

Keywords: SOTAlign, semi-supervised, optimal transport, vision-language, unimodal encoders, alignment, contrastive learning

Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning

9.0/10

Amita Kamath, Jack Hessel, Khyathi Chandu, Jena D. Hwang, Kai-Wei Chang, Ranjay Krishna 2/26/2026 arxiv

machine learning

The lack of reasoning capabilities in Vision-Language Models (VLMs) has remained at the forefront of research discourse. We posit that this behavior stems from a reporting bias in their training data. That is, how people communicate about visual content by default omits tacit information needed to s...

Keywords: reporting bias, vision-language, VLM, pragmatics, spatial reasoning, temporal reasoning, negation, counting

FlashOptim: Optimizers for Memory Efficient Training

9.0/10

Jose Javier Gonzalez Ortiz, Abhay Gupta, Chris Renard, Davis Blalock 2/26/2026 arxiv

machine learning

Standard mixed-precision training of neural networks requires many bytes of accelerator memory for each model parameter. These bytes reflect not just the parameter itself, but also its gradient and one or more optimizer state variables. With each of these values typically requiring 4 bytes, training...

Keywords: FlashOptim, optimizer quantization, memory efficiency, companding, master weight splitting, AdamW, Lion, mixed-precision

Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms

9.0/10

Alkis Kalavasis, Anay Mehrotra, Manolis Zampetakis, Felix Zhou, Ziyu Zhu 2/26/2026 arxiv

machine learning

Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, wh...

Keywords: coarse data, mean estimation, Gaussian, convex partitions, identifiability, algorithms, computational complexity, NP-hard

Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?

9.0/10

Tilemachos Aravanis, Vladan Stojnić, Bill Psomas, Nikos Komodakis, Giorgos Tolias 2/26/2026 arxiv

computer vision

Open-vocabulary segmentation (OVS) extends the zero-shot recognition capabilities of vision-language models (VLMs) to pixel-level prediction, enabling segmentation of arbitrary categories specified by text prompts. Despite recent progress, OVS lags behind fully supervised approaches due to two chall...

Keywords: open-vocabulary segmentation, few-shot, retrieval-augmented, test-time adapter, vision-language models, per-query fusion, personalized segmentation

Papers for February 28, 2026
