Paper Archive

Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video

0

5.0/10

Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi 1/8/2026 arxiv

machine learning

We propose Mesh4D, a feed-forward model for monocular 4D mesh reconstruction. Given a monocular video of a dynamic object, our model reconstructs the object's complete 3D shape and motion, represented as a deformation field. Our key contribution is a compact latent space that encodes the entire anim...

Keywords: attention, diffusion model

RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

0

5.0/10

Yuan-Kang Lee, Kuan-Lin Chen, Chia-Che Chang, Yu-Lun Liu 1/8/2026 arxiv

computer vision

Nighttime color constancy remains a challenging problem in computational photography due to low-light noise and complex illumination conditions. We present RL-AWB, a novel framework combining statistical methods with deep reinforcement learning for nighttime white balance. Our method begins with a s...

Keywords: reinforcement learning, detection

QNeRF: Neural Radiance Fields on a Simulated Gate-Based Quantum Computer

0

5.0/10

Daniele Lizzio Bosco, Shuteng Wang, Giuseppe Serra, Vladislav Golyanik 1/8/2026 arxiv

computer vision

Recently, Quantum Visual Fields (QVFs) have shown promising improvements in model compactness and convergence speed for learning the provided 2D or 3D signals. Meanwhile, novel-view synthesis has seen major advances with Neural Radiance Fields (NeRFs), where models learn a compact representation fro...

Keywords: machine learning, computer vision

LaST$_{0}$: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model

0

5.0/10

Zhuoyang Liu, Jiaming Liu, Hao Chen, Ziyu Guo, Chengkai Hou, Chenyang Gu, Jiale Yu, Xiangju Mi, Renrui Zhang, Zhengping Che, Jian Tang, Pheng-Ann Heng, Shanghang Zhang 1/8/2026 arxiv

computer vision

Vision-Language-Action (VLA) models have recently demonstrated strong generalization capabilities in robotic manipulation. Some existing VLA approaches attempt to improve action accuracy by explicitly generating linguistic reasoning traces or future visual observations before action execution. Howev...

Keywords: transformer

Pixel-Perfect Visual Geometry Estimation

0

5.0/10

Gangwei Xu, Haotong Lin, Hongcheng Luo, Haiyang Sun, Bing Wang, Guang Chen, Sida Peng, Hangjun Ye, Xin Yang 1/8/2026 arxiv

computer vision

Recovering clean and accurate geometry from images is essential for robotics and augmented reality. However, existing geometry foundation models still suffer severely from flying pixels and the loss of fine details. In this paper, we present pixel-perfect visual geometry models that can predict high...

Keywords: transformer

Optimal Lower Bounds for Online Multicalibration

0

5.0/10

Natalie Collina, Jiuyao Lu, Georgy Noarov, Aaron Roth 1/8/2026 arxiv

natural language processing

We prove tight lower bounds for online multicalibration, establishing an information-theoretic separation from marginal calibration. In the general setting where group functions can depend on both context and the learner's predictions, we prove an $\Omega(T^{2/3})$ lower bound on expected multicalibrat...

GREx: Generalized Referring Expression Segmentation, Comprehension, and Generation

0

5.0/10

Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Yu-Gang Jiang 1/8/2026 arxiv

computer vision

Referring Expression Segmentation (RES) and Comprehension (REC) respectively segment and detect the object described by an expression, while Referring Expression Generation (REG) generates an expression for the selected object. Existing datasets and methods commonly support single-target expressions...

Keywords: segmentation

Generate, Transfer, Adapt: Learning Functional Dexterous Grasping from a Single Human Demonstration

0

5.0/10

Xingyi He, Adhitya Polavaram, Yunhao Cao, Om Deshmukh, Tianrui Wang, Xiaowei Zhou, Kuan Fang 1/8/2026 arxiv

computer vision

Functional grasping with dexterous robotic hands is a key capability for enabling tool use and complex manipulation, yet progress has been constrained by two persistent bottlenecks: the scarcity of large-scale datasets and the absence of integrated semantic and geometric reasoning in learned models....

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

0

5.0/10

Shih-Yang Liu, Xin Dong, Ximing Lu, Shizhe Diao, Peter Belcak, Mingjie Liu, Min-Hung Chen, Hongxu Yin, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Yejin Choi, Jan Kautz, Pavlo Molchanov 1/8/2026 arxiv

natural language processing

As language models become increasingly capable, users expect them to provide not only accurate responses but also behaviors aligned with diverse human preferences across a variety of scenarios. To achieve this, reinforcement learning (RL) pipelines have begun incorporating multiple rewards, each cap...

Keywords: reinforcement learning

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

0

5.0/10

Boyang Wang, Haoran Zhang, Shujie Zhang, Jinkun Hao, Mingda Jia, Qi Lv, Yucheng Mao, Zhaoyang Lyu, Jia Zeng, Xudong Xu, Jiangmiao Pang 1/8/2026 arxiv

computer vision

The diversity, quantity, and quality of manipulation data are critical for training effective robot policies. However, due to hardware and physical setup constraints, collecting large-scale real-world manipulation data remains difficult to scale across diverse environments. Recent work uses text-pro...

Keywords: diffusion model

Papers for January 11, 2026
