Paper Archive

Future Dynamic 3D Reconstruction: A 3D World Model with Disentangled Ego-Motion

9.0/10

Nils Morbitzer, Jonathan Evers, Artem Savkin, Thomas Stauner, Nassir Navab, Federico Tombari, Stefano Gasperini 6/16/2026 arxiv

computer vision

Forecasting the evolution of dynamic environments is crucial for autonomous agents. While generative world models have recently achieved high photorealism in 2D video synthesis by mixing ego-motion and environmental dynamics within the image plane, they exhibit physical inconsistencies, such as morp...

Keywords: 3D reconstruction, dynamic environments, disentangled ego-motion, future prediction, monocular vision, AI, computer vision

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

9.0/10

Wujian Peng, Lingchen Meng, Yuxuan Cai, Xianwei Zhuang, Yuhuan Yang, Rongyao Fang, Chenfei Wu, Junyang Lin, Zuxuan Wu, Shuai Bai 6/16/2026 arxiv

computer vision

Unified Multimodal Modeling aims to integrate visual understanding and generation within a single system. However, existing approaches typically rely on two disparate visual tokenizers, which splits the representation space and hinders truly unified modeling. We propose UniAR, a unified autoregressi...

Keywords: UniAR, multimodal modeling, autoregressive framework, visual tokenizer, image generation, image editing

Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement

9.0/10

Mingtong Zhang, Dhruv Shah 6/16/2026 arxiv

robotics

Robots deployed in the real world should learn from their experience and improve over time. This requires a mechanism of practicing and learning from feedback. In this paper, we propose VERITAS, a generator-verifier framework for generalist robot policies for inference-time policy steering and self-...

Keywords: robotics, machine learning, policy steering, self-improvement, inference-time verification

Variable-Width Transformers

9.0/10

Zhaofeng Wu, Oliver Sieberling, Shawn Tan, Rameswar Panda, Yury Polyanskiy, Yoon Kim 6/16/2026 arxiv

machine learning

Scaling model size, specifically depth and width, has driven significant progress in transformer-based language models. However, most architectures maintain a constant width across all layers, allocating a fixed parameter and computation budget evenly despite different layers potentially playing dis...

Keywords: transformers, language models, nonuniform width allocation, resource optimization, FLOPs reduction

MOCHI: Motion Enhancement of Collaborative Human-object Interactions

9.0/10

Jiye Lee, Yonghun Choi, Jungdam Won 6/16/2026 arxiv

computer vision

Collaborative human-object interaction shows dynamic and complex movements that require mutual anticipation and continuous adjustment between participants and the shared object. Modeling such collaborative multi-human object interaction (MHOI) scenarios requires high-quality data acquisition as a fo...

Keywords: Motion Enhancement, Collaborative Human-object Interactions, MHOI, Optimization, Diffusion Models

EventDrive: Event Cameras for Vision-Language Driving Intelligence

9.0/10

Dongyue Lu, Rong Li, Ao Liang, Lingdong Kong, Wei Yin, Lai Xing Ng, Benoit R. Cottereau, Camille Simon Chane, Wei Tsang Ooi 6/16/2026 arxiv

computer vision

Event cameras sense the world through asynchronous brightness changes with microsecond latency and high dynamic range, offering motion fidelity far beyond frame-based sensors and capturing temporal structure that conventional exposures often miss. These properties make events a powerful complement t...

Keywords: event cameras, vision-language driving intelligence, autonomous vehicles, event-driven models, temporal-horizon mixture-of-experts

EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

9.0/10

Ning Gao, Jinliang Zheng, Xing Gao, Haoxiang Ma, Hanqing Wang, Yukai Wang, Jiantong Chen, Zanxin Chen, Shujie Zhang, Mingda Jia, Xuekun Jiang, Zihou Zhu, Xinyu Li, Shuai Wang, Hao Li, Wenzhe Cai, Yuqiang Yang, Xudong Xu, Zhaoyang Lyu, Yao Mu, Tai Wang, Jiangmiao Pang, Jia Zeng, Weinan Zhang, Chunhua Shen 6/16/2026 arxiv

computer vision

We present EBench, a simulation benchmark that diagnoses generalist mobile manipulation policies beyond a single success-rate scalar. EBench comprises 26 diverse and challenging manipulation tasks annotated along 5 capability dimensions and 4 generalization dimensions. We evaluate state-of-the-art g...

Keywords: EBench, mobile manipulation, simulation benchmark, generalist policies, capability profiling, generalization

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

9.0/10

Shanda Li, Qiuhong Anna Wei, Jingwu Tang, Valerie Chen, Nihar B Shah, Tim Dettmers, Yiming Yang, Ameet Talwalkar 6/16/2026 arxiv

machine learning

Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale due to their reliance on substantial manual effort for data curation...

Keywords: Reproducibility, GitHub, LLM, Machine Learning, ReproRepo, Reproducibility Auditing

EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation

9.0/10

Qi Chai, Wenhao Shen, Nanjie Yao, Yue Xia, Kaiyong Zhao, Jie Ma, Guosheng Lin, Hao Wang 6/16/2026 arxiv

computer vision

Zero-Shot Object-Goal Navigation (ZS-OGN) requires embodied agents to explore and locate target objects without any prior training. To this end, recent methods leverage foundation models, but they typically rely on static priors and lack adaptation, which leads to repeated errors and costly trial an...

Keywords: Zero-Shot Object Goal Navigation, self-evolving memory, preflection module, upper confidence bound, agentic rule memory

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

9.0/10

Rishit Dagli, Donglai Xiang, Vismay Modi, Xuning Yang, Gavriel State, David I. W. Levin, Maria Shugrina 6/16/2026 arxiv

computer vision

Accurate mechanical properties (or materials), namely Young's modulus ($E$), Poisson's ratio ($ν$), and density ($ρ$), are essential for reliable physics simulation of digital worlds, but most 3D assets lack this information. We propose AdaVoMP, a method for predicting accurate dense spatially-varying ($E$, $...

Keywords: AdaVoMP, volumetric properties, 3D objects, material prediction, sparse transformer, resolution, accuracy

Papers for June 17, 2026
