Paper Archive

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

2

9.0/10

Xin Zhou, Dingkang Liang, Xiwu Chen, Feiyang Tan, Dingyuan Zhang, Hengshuang Zhao, Xiang Bai 4/30/2026 arxiv

machine learning

Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. Conversely, while Large Language Models (LLMs) demonstr...

Keywords: HERMES++, driving world model, 3D scene understanding, future point cloud prediction, BEV, LLM-enhanced queries, geometric optimization, autonomous driving


OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

0

9.0/10

Junyoung Lee, Sookwan Han, Jeonghwan Kim, Inhee Lee, Mingi Choi, Jisoo Kim, Wonjung Woo, Hanbyul Joo 4/30/2026 arxiv

robotics

Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where multiple humans and robots share a workspace, acting concurrently on interleaved subtasks with tight spatial and temporal coupling. This regime remains un...

Keywords: OmniRobotHome, multiadic interaction, human-robot collaboration, room-scale perception, occlusion-robust tracking, multi-camera, Franka arms, markerless tracking


Generalizable Sparse-View 3D Reconstruction from Unconstrained Images

0

9.0/10

Vinayak Gupta, Chih-Hao Lin, Shenlong Wang, Anand Bhattad, Jia-Bin Huang 4/30/2026 arxiv

machine learning

Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and ...

Keywords: GenWildSplat, sparse-view, 3D reconstruction, unposed images, feed-forward, 3D Gaussians, appearance adapter, semantic segmentation


LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

0

9.0/10

Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng 4/30/2026 arxiv

robotics

Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing explicit linguistic reasoning that suffers from latency and discretization, or utilizing more expressi...

Keywords: LaST-R1, LAPO, latent Chain-of-Thought, Vision-Language-Action, reinforcement learning, adaptive reasoning, physical dynamics, LIBERO


Representation Fréchet Loss for Visual Generation

0

9.0/10

Jiawei Yang, Zhengyang Geng, Xuan Ju, Yonglong Tian, Yue Wang 4/30/2026 arxiv

generative models

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term...

Keywords: Fréchet Distance, FD-loss, FID, FDr^k, representation learning, generative models, ImageNet, one-step generator


Computing Equilibrium beyond Unilateral Deviation

0

9.0/10

Mingyang Liu, Gabriele Farina, Asuman Ozdaglar 4/30/2026 arxiv

machine learning

Most familiar equilibrium concepts, such as Nash and correlated equilibrium, guarantee only that no single player can improve their utility by deviating unilaterally. They offer no guarantees against profitable coordinated deviations by coalitions. Although the literature proposes solution concepts ...

Keywords: coalitional deviation, equilibrium, average-gain, maximum-gain, algorithmic game theory, exploitability welfare frontier, strong Nash, coalition-proof


Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

0

9.0/10

Keming Wu, Zuhao Yang, Kaichen Zhang, Shizun Wang, Haowei Zhu, Sicong Leng, Zhongyu Yang, Qijie Wang, Sudong Wang, Ziting Wang, Zili Wang, Hui Zhang, Haonan Wang, Hang Zhou, Yifan Pu, Xingxuan Li, Fangneng Zhan, Bo Li, Lidong Bing, Yuxin Song, Ziwei Liu, Wenhu Chen, Jingdong Wang, Xinchao Wang, Xiaojuan Qi, Shijian Lu, Bin Wang 4/30/2026 arxiv

computer vision

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appea...

Keywords: visual generation, world modeling, agentic generation, five-level taxonomy, flow matching, unified models, reward modeling, synthetic data distillation


Synthetic Computers at Scale for Long-Horizon Productivity Simulation

0

9.0/10

Tao Ge, Baolin Peng, Hao Cheng, Jianfeng Gao 4/30/2026 arxiv

machine learning

Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synt...

Keywords: synthetic data, simulation, agentic systems, long-horizon learning, productivity AI, synthetic environments


An adaptive wavelet-based PINN for problems with localized high-magnitude source

0

9.0/10

Himanshu Pandey, Ratikanta Behera 4/30/2026 arxiv

machine learning

In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer from two fundamental limitations, namely, spectral bias inherent in neural networks and loss imbalance arising from multiscale phenomena. This paper pr...

Keywords: AW-PINN, wavelets, physics-informed neural networks, spectral bias, loss imbalance, NTK, Gaussian process limit, localized source


Stop Holding Your Breath: CT-Informed Gaussian Splatting for Dynamic Bronchoscopy

0

9.0/10

Andrea Dunn Beltran, Daniel Rho, Aarav Mehta, Xinqi Xiong, Raúl San José Estépar, Ron Alterovitz, Marc Niethammer, Roni Sengupta 4/30/2026 arxiv

machine learning

Bronchoscopic navigation relies on registering endoscopic video to a preoperative CT scan, but respiratory motion deforms the airway by 5-20 mm, creating CT-to-body divergence that limits localization accuracy. In practice, this is mitigated through breath-hold protocols, which attempt to match the ...

Keywords: Gaussian splatting, bronchoscopy, CT-to-body divergence, respiratory modeling, RESPIRE, endoscopic navigation, mesh-anchored reconstruction, breathing phase estimation



Papers for May 3, 2026
