Paper Archive

Human3R: Everyone Everywhere All at Once

9.0/10

Yue Chen, Xingyu Chen, Yuxuan Xue, Anpei Chen, Yuliang Xiu, Gerard Pons-Moll 10/7/2025 arxiv

computer vision

We present Human3R, a unified, feed-forward framework for online 4D human-scene reconstruction, in the world frame, from casually captured monocular videos. Unlike previous approaches that rely on multi-stage pipelines, iterative contact-aware refinement between humans and scenes, and heavy dependen...

Keywords: Human3R, 4D reconstruction, SMPL-X, CUT3R, visual prompt tuning, BEDLAM, monocular video, real-time

EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark

9.0/10

Deheng Zhang, Yuqian Fu, Runyi Yang, Yang Miao, Tianwen Qian, Xu Zheng, Guolei Sun, Ajad Chhatkuli, Xuanjing Huang, Yu-Gang Jiang, Luc Van Gool, Danda Pani Paudel 10/7/2025 arxiv

computer vision

Most existing benchmarks for egocentric vision understanding focus primarily on daytime scenarios, overlooking the low-light conditions that are inevitable in real-world applications. To investigate this gap, we present EgoNight, the first comprehensive benchmark for nighttime egocentric vision, wit...

Keywords: egocentric vision, nighttime vision, visual question answering, dataset, day-night alignment, domain shift, multimodal LLMs, depth estimation

TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

9.0/10

Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He 10/7/2025 arxiv

machine learning

Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplor...

Keywords: TaTToo, Process Reward Model, PRM, Test-Time Scaling, tabular reasoning, tool-grounded, reinforcement learning, supervised fine-tuning

Dropping the D: RGB-D SLAM Without the Depth Sensor

9.0/10

Mert Kiray, Alican Karaomer, Benjamin Busam 10/7/2025 arxiv

computer vision

We present DropD-SLAM, a real-time monocular SLAM system that achieves RGB-D-level accuracy without relying on depth sensors. The system replaces active depth input with three pretrained vision modules: a monocular metric depth estimator, a learned keypoint detector, and an instance segmentation net...

Keywords: RGB-D SLAM, monocular depth estimation, instance segmentation, keypoint detection, metric scale, TUM RGB-D, real-time SLAM

Fine-grained Defocus Blur Control for Generative Image Models

9.0/10

Ayush Shrivastava, Connelly Barnes, Xuaner Zhang, Lingzhi Zhang, Andrew Owens, Sohrab Amirghodsi, Eli Shechtman 10/7/2025 arxiv

machine learning

Current text-to-image diffusion models excel at generating diverse, high-quality images, yet they struggle to incorporate fine-grained camera metadata such as precise aperture settings. In this work, we introduce a novel text-to-image diffusion framework that leverages camera metadata, or EXIF data,...

Keywords: EXIF, defocus blur, lens blur, diffusion models, text-to-image, focus distance transformer, monocular depth, differentiable rendering

Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents

9.0/10

Mingkang Zhu, Xi Chen, Bei Yu, Hengshuang Zhao, Jiaya Jia 10/7/2025 arxiv

machine learning

Large language model (LLM) agents increasingly rely on external tools such as search engines to solve complex, multi-step problems, and reinforcement learning (RL) has become a key paradigm for training them. However, the trajectories of search agents are structurally heterogeneous, where variations...

Keywords: Stratified GRPO, Stratified Advantage Normalization, SAN, reinforcement learning, LLM agents, search agents, policy gradient, structural heterogeneity

Training Dynamics Impact Post-Training Quantization Robustness

9.0/10

Albert Catalan-Tatjer, Niccolò Ajroldi, Jonas Geiping 10/7/2025 arxiv

machine learning

While post-training quantization is widely adopted for efficient deployment of large language models, the mechanisms underlying quantization robustness remain unclear. We conduct a comprehensive analysis of quantization degradation across open-source language model training trajectories up to 32B pa...

Keywords: post-training quantization, quantization robustness, learning rate decay, training dynamics, language models, model scale, hyperparameters, validation loss

ShapeGen4D: Towards High Quality 4D Shape Generation from Videos

9.0/10

Jiraphon Yenphraphai, Ashkan Mirzaei, Jianqi Chen, Jiaxu Zou, Sergey Tulyakov, Raymond A. Yeh, Peter Wonka, Chaoyang Wang 10/7/2025 arxiv

computer vision

Video-conditioned 4D shape generation aims to recover time-varying 3D geometry and view-consistent appearance directly from an input video. In this work, we introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. Our fr...

Keywords: 4D shape generation, video-to-3D, temporal attention, time-aware sampling, latent anchoring, non-rigid motion, topology changes, temporal consistency

EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model

9.0/10

Zefu Lin, Rongxu Cui, Chen Hanning, Xiangyu Wang, Junjia Xu, Xiaojuan Jin, Chen Wenbo, Hui Zhou, Lue Fan, Wenling Li, Zhaoxiang Zhang 10/7/2025 arxiv

machine learning

Recent advances in robot control methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have improved robots' ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they ofte...

Keywords: EmbodiedCoder, coding model, mobile manipulation, training-free, interpretable control, parameterized geometry, robot trajectories, generalization

Modulation Discovery with Differentiable Digital Signal Processing

9.0/10

Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss 10/7/2025 arxiv

machine learning

Modulations are a critical part of sound design and music production, enabling the creation of complex and evolving audio. Modern synthesizers provide envelopes, low frequency oscillators (LFOs), and more parameter automation tools that allow users to modulate the output with ease. However, determin...

Keywords: modulation discovery, DDSP, neural sound-matching, control signal parameterization, LFO, envelope, interpretability, audio synthesis

Papers for October 8, 2025