Paper Archive

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

2

9.0/10

Xin Zhou, Dingkang Liang, Xiwu Chen, Feiyang Tan, Dingyuan Zhang, Hengshuang Zhao, Xiang Bai 4/30/2026 arxiv

machine learning

Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. Conversely, while Large Language Models (LLMs) demonstr...

Keywords: HERMES++, driving world model, 3D scene understanding, future point cloud prediction, BEV, LLM-enhanced queries, current-to-future link, joint geometric optimization

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

0

9.0/10

Junyoung Lee, Sookwan Han, Jeonghwan Kim, Inhee Lee, Mingi Choi, Jisoo Kim, Wonjung Woo, Hanbyul Joo 4/30/2026 arxiv

machine learning

Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where multiple humans and robots share a workspace, acting concurrently on interleaved subtasks with tight spatial and temporal coupling. This regime remains un...

Keywords: OmniRobotHome, multiadic, human-robot interaction, room-scale perception, occlusion-robust tracking, multi-camera, markerless tracking, Franka arms

Generalizable Sparse-View 3D Reconstruction from Unconstrained Images

0

9.0/10

Vinayak Gupta, Chih-Hao Lin, Shenlong Wang, Anand Bhattad, Jia-Bin Huang 4/30/2026 arxiv

computer vision

Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and ...

Keywords: GenWildSplat, sparse-view, unposed images, 3D reconstruction, 3D Gaussians, appearance adapter, semantic segmentation, feed-forward

LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

0

9.0/10

Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng 4/30/2026 arxiv

robotics

Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing explicit linguistic reasoning that suffers from latency and discretization, or utilizing more expressi...

Keywords: LaST-R1, LAPO, latent Chain-of-Thought, adaptive latent CoT, Vision-Language-Action, reinforcement learning, imitation learning, robotic manipulation

Representation Fréchet Loss for Visual Generation

0

9.0/10

Jiawei Yang, Zhengyang Geng, Xuan Ju, Yonglong Tian, Yue Wang 4/30/2026 arxiv

machine learning

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term...

Keywords: Fréchet Distance, FD-loss, FDr^k, FID, Inception features, One-step generator, ImageNet, Representation learning

Computing Equilibrium beyond Unilateral Deviation

0

9.0/10

Mingyang Liu, Gabriele Farina, Asuman Ozdaglar 4/30/2026 arxiv

machine learning

Most familiar equilibrium concepts, such as Nash and correlated equilibrium, guarantee only that no single player can improve their utility by deviating unilaterally. They offer no guarantees against profitable coordinated deviations by coalitions. Although the literature proposes solution concepts ...

Keywords: coalitional deviations, equilibrium, Nash equilibrium, strong Nash, coalition-proof equilibrium, exploitability welfare frontier, algorithmic game theory, complexity

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

0

9.0/10

Keming Wu, Zuhao Yang, Kaichen Zhang, Shizun Wang, Haowei Zhu, Sicong Leng, Zhongyu Yang, Qijie Wang, Sudong Wang, Ziting Wang, Zili Wang, Hui Zhang, Haonan Wang, Hang Zhou, Yifan Pu, Xingxuan Li, Fangneng Zhan, Bo Li, Lidong Bing, Yuxin Song, Ziwei Liu, Wenhu Chen, Jingdong Wang, Xinchao Wang, Xiaojuan Qi, Shijian Lu, Bin Wang 4/30/2026 arxiv

computer vision

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appea...

Keywords: visual generation, agentic, world modeling, flow matching, unified models, synthetic data distillation, reward modeling, evaluation

Exploration Hacking: Can LLMs Learn to Resist RL Training?

0

9.0/10

Eyon Jang, Damon Falck, Joschka Braun, Nathalie Kirch, Achu Menon, Perusha Moodley, Scott Emmons, Roland S. Zimmermann, David Lindner 4/30/2026 arxiv

machine learning

Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment. Successful RL relies on sufficient exploration of diverse actions by the model during training, which creates a potential failure mode: a model cou...

Keywords: exploration hacking, reinforcement learning, large language models, model organisms, alignment, detection, mitigation, weight noising

An adaptive wavelet-based PINN for problems with localized high-magnitude source

0

9.0/10

Himanshu Pandey, Ratikanta Behera 4/30/2026 arxiv

machine learning

In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer from two fundamental limitations, namely, spectral bias inherent in neural networks and loss imbalance arising from multiscale phenomena. This paper pr...

Keywords: AW-PINN, physics-informed neural networks, wavelets, localized sources, loss imbalance, spectral bias, Gaussian process, NTK

Stop Holding Your Breath: CT-Informed Gaussian Splatting for Dynamic Bronchoscopy

0

9.0/10

Andrea Dunn Beltran, Daniel Rho, Aarav Mehta, Xinqi Xiong, Raúl San José Estépar, Ron Alterovitz, Marc Niethammer, Roni Sengupta 4/30/2026 arxiv

machine learning

Bronchoscopic navigation relies on registering endoscopic video to a preoperative CT scan, but respiratory motion deforms the airway by 5-20 mm, creating CT-to-body divergence that limits localization accuracy. In practice, this is mitigated through breath-hold protocols, which attempt to match the ...

Keywords: bronchoscopy, Gaussian splatting, respiratory modeling, CT registration, RESPIRE, deformation-aware reconstruction, medical imaging, real-time localization

Papers for May 2, 2026
