Paper Archive

Tokenisation via Convex Relaxations

5.0/10

Jan Tempus, Philip Whittington, Craig W. Schmidt, Dennis Komm, Tiago Pimentel 5/21/2026 arxiv

natural language processing

Tokenisation is an integral part of the current NLP pipeline. Current tokenisation algorithms such as BPE and Unigram are greedy algorithms -- they make locally optimal decisions without considering the resulting vocabulary as a whole. We instead formulate tokeniser construction as a linear program ...

Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

5.0/10

Jongseo Lee, Hyuntak Lee, Sunghun Kim, Sooa Kim, Jihoon Chung, Jinwoo Choi 5/21/2026 arxiv

computer vision

Video Large Language Models (Video-LLMs) have made rapid progress on temporal video understanding, yet many fail at a basic perceptual primitive: signed image-plane motion direction. On simple videos of a single object moving left, right, up, or down, most Video-LLMs perform near chance, with above-...

Integrable Elasticity via Neural Demand Potentials

5.0/10

Carlos Heredia, Daniel Roncel 5/21/2026 arxiv

natural language processing

We propose the Integrable Context-Dependent Demand Network (ICDN), a demand-first neural model for multiproduct retail demand. The model learns log-demand as a smooth, context-conditioned function of log-prices, allowing elasticities to be derived exactly from the learned demand surface. On the Domi...

Cambrian-P: Pose-Grounded Video Understanding

5.0/10

Jihan Yang, Zifan Zhao, Xichen Pan, Shusheng Yang, Junyi Zhang, Bingyi Kang, Hu Xu, Saining Xie 5/21/2026 arxiv

reinforcement learning

Camera pose matters. The position and orientation of each viewpoint define a shared spatial coordinate frame that relates observations across video frames. Yet this signal is largely absent from multimodal LLMs (MLLMs) for video understanding, which process frames as isolated 2D snapshots, instead o...

Keywords: regression

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

5.0/10

Lee Hsin-Ying, Hanwen Jiang, Yiqun Mei, Jing Shi, Ming-Hsuan Yang, Zhixin Shu 5/21/2026 arxiv

computer vision

Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially by missing secondary causal consequences. To address this, we intro...

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

5.0/10

Ryan Bahlous-Boldi, Isha Puri, Idan Shenfeld, Akarsh Kumar, Mehul Damani, Sebastian Risi, Omar Khattab, Zhang-Wei Hong, Pulkit Agrawal 5/21/2026 arxiv

natural language processing

Language models must now generalize out of the box to novel environments and work inside inference-scaling search procedures, such as AlphaEvolve, that select rollouts with a variety of task-specific reward functions. Unfortunately, the standard paradigm of LLM post-training optimizes a pre-specifie...

AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation

5.0/10

Wenxuan Guo, Xiuwei Xu, Yichen Liu, Xiangyu Li, Hang Yin, Huangxing Chen, Wenzhao Zheng, Jianjiang Feng, Jie Zhou, Jiwen Lu 5/21/2026 arxiv

computer vision

Vision-and-Language Navigation (VLN) requires an agent to ground language instructions to its own movement within a visual environment. While state-of-the-art methods leverage the reasoning capabilities of Vision-Language Models (VLMs) for end-to-end action prediction, they often lack an explicit an...

Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration

5.0/10

Lily Goli, Justin Kerr, Daniele Reda, Alec Jacobson, Andrea Tagliasacchi, Angjoo Kanazawa 5/21/2026 arxiv

computer vision

Exploration is a prerequisite for learning useful behaviors in sparse-reward, long-horizon tasks, particularly within 3D environments. Curiosity-driven reinforcement learning addresses this via intrinsic rewards derived from the mismatch between the agent's predictive model of the world and reality....

Keywords: reinforcement learning

GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations

5.0/10

Wenxuan Guo, Ziyuan Li, Meng Zhang, Yichen Liu, Yimeng Dong, Chuxi Xu, Yunfei Wei, Ze Chen, Erjin Zhou, Jianjiang Feng 5/21/2026 arxiv

computer vision

Vision-Language-Action (VLA) models have shown strong potential for general-purpose robot manipulation by unifying perception and action. However, existing VLA systems primarily rely on textual instructions and struggle to resolve spatial ambiguity in complex scenes with multiple similar objects. To...

Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

5.0/10

Jiahao Wang, Bo Sun, Yijing Bai, Vincent Casser, Songyou Peng, Zehao Zhu, Meng-Li Shih, Xander Masotto, Shih-Yang Su, Kanaad V Parvate, Tiancheng Ge, Linn Bieske, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang 5/21/2026 arxiv

computer vision

Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. ...

Keywords: multi-modal

Papers for May 23, 2026
