Paper Archive

Browse and export your curated research paper collection

Archived Days: 33
Total Papers: 330
Average Score: 7.8
Categories: 7

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
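As an illustration of what the CSV and BibTeX exports might contain, here is a minimal sketch. It assumes a hypothetical `PaperRecord` schema that mirrors only the fields visible in this view: authors, date, source, category, abstract and keywords. Titles are not shown here, so the sketch omits them. The archive's real export schema and its citation-key scheme are not documented on this page. The `to_csv` and `to_bibtex` helpers and the `lastname + year` key format are assumptions made for illustration.

```python
import csv
import io
import re
from dataclasses import dataclass, field


@dataclass
class PaperRecord:
    # Hypothetical schema: only the fields displayed in this archive view.
    authors: list[str]
    date: str          # as displayed, e.g. "10/7/2025" (month/day/year)
    source: str        # e.g. "arxiv"
    category: str
    abstract: str
    keywords: list[str] = field(default_factory=list)


def to_csv(records: list[PaperRecord]) -> str:
    """Tabular export: one row per paper, list-valued fields joined with '; '."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["authors", "date", "source", "category", "keywords", "abstract"])
    for r in records:
        writer.writerow([
            "; ".join(r.authors), r.date, r.source, r.category,
            "; ".join(r.keywords), r.abstract,
        ])
    return buf.getvalue()


def to_bibtex(r: PaperRecord) -> str:
    """Citation export as a @misc entry; the key scheme (lastname + year) is assumed."""
    _, _, year = r.date.split("/")
    last_name = r.authors[0].split()[-1]
    # Keep ASCII letters only so the key is safe for BibTeX (drops accented letters).
    key = re.sub(r"[^A-Za-z]", "", last_name).lower() + year
    fields = {
        "author": " and ".join(r.authors),
        "year": year,
        "note": r.source,
        "keywords": ", ".join(r.keywords),
    }
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items())
    return f"@misc{{{key},\n{body}\n}}"
```

For example, take the record for Mitcheltree, Tan and Reiss (10/7/2025). `to_bibtex` would produce an entry keyed `mitcheltree2025` whose authors are joined with `and`, as BibTeX expects. `to_csv` relies on the standard `csv` module to quote any abstract that contains commas.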
Browse by Date

Papers for October 8, 2025

10 papers found

Yue Chen, Xingyu Chen, Yuxuan Xue, Anpei Chen, Yuliang Xiu, Gerard Pons-Moll 10/7/2025 arxiv

computer vision

We present Human3R, a unified, feed-forward framework for online 4D human-scene reconstruction, in the world frame, from casually captured monocular videos. Unlike previous approaches that rely on multi-stage pipelines, iterative contact-aware refinement between humans and scenes, and heavy dependen...

Keywords: Human3R, 4D reconstruction, SMPL-X, CUT3R, visual prompt tuning, BEDLAM, monocular video, real-time

Deheng Zhang, Yuqian Fu, Runyi Yang, Yang Miao, Tianwen Qian, Xu Zheng, Guolei Sun, Ajad Chhatkuli, Xuanjing Huang, Yu-Gang Jiang, Luc Van Gool, Danda Pani Paudel 10/7/2025 arxiv

computer vision

Most existing benchmarks for egocentric vision understanding focus primarily on daytime scenarios, overlooking the low-light conditions that are inevitable in real-world applications. To investigate this gap, we present EgoNight, the first comprehensive benchmark for nighttime egocentric vision, wit...

Keywords: egocentric vision, nighttime vision, visual question answering, dataset, day-night alignment, domain shift, multimodal LLMs, depth estimation

Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He 10/7/2025 arxiv

machine learning

Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplor...

Keywords: TaTToo, Process Reward Model, PRM, Test-Time Scaling, tabular reasoning, tool-grounded, reinforcement learning, supervised fine-tuning

Mert Kiray, Alican Karaomer, Benjamin Busam 10/7/2025 arxiv

computer vision

We present DropD-SLAM, a real-time monocular SLAM system that achieves RGB-D-level accuracy without relying on depth sensors. The system replaces active depth input with three pretrained vision modules: a monocular metric depth estimator, a learned keypoint detector, and an instance segmentation net...

Keywords: RGB-D SLAM, monocular depth estimation, instance segmentation, keypoint detection, metric scale, TUM RGB-D, real-time SLAM

Ayush Shrivastava, Connelly Barnes, Xuaner Zhang, Lingzhi Zhang, Andrew Owens, Sohrab Amirghodsi, Eli Shechtman 10/7/2025 arxiv

machine learning

Current text-to-image diffusion models excel at generating diverse, high-quality images, yet they struggle to incorporate fine-grained camera metadata such as precise aperture settings. In this work, we introduce a novel text-to-image diffusion framework that leverages camera metadata, or EXIF data,...

Keywords: EXIF, defocus blur, lens blur, diffusion models, text-to-image, focus distance transformer, monocular depth, differentiable rendering

Mingkang Zhu, Xi Chen, Bei Yu, Hengshuang Zhao, Jiaya Jia 10/7/2025 arxiv

machine learning

Large language model (LLM) agents increasingly rely on external tools such as search engines to solve complex, multi-step problems, and reinforcement learning (RL) has become a key paradigm for training them. However, the trajectories of search agents are structurally heterogeneous, where variations...

Keywords: Stratified GRPO, Stratified Advantage Normalization, SAN, reinforcement learning, LLM agents, search agents, policy gradient, structural heterogeneity

Albert Catalan-Tatjer, Niccolò Ajroldi, Jonas Geiping 10/7/2025 arxiv

machine learning

While post-training quantization is widely adopted for efficient deployment of large language models, the mechanisms underlying quantization robustness remain unclear. We conduct a comprehensive analysis of quantization degradation across open-source language model training trajectories up to 32B pa...

Keywords: post-training quantization, quantization robustness, learning rate decay, training dynamics, language models, model scale, hyperparameters, validation loss

Jiraphon Yenphraphai, Ashkan Mirzaei, Jianqi Chen, Jiaxu Zou, Sergey Tulyakov, Raymond A. Yeh, Peter Wonka, Chaoyang Wang 10/7/2025 arxiv

computer vision

Video-conditioned 4D shape generation aims to recover time-varying 3D geometry and view-consistent appearance directly from an input video. In this work, we introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. Our fr...

Keywords: 4D shape generation, video-to-3D, temporal attention, time-aware sampling, latent anchoring, non-rigid motion, topology changes, temporal consistency

Zefu Lin, Rongxu Cui, Chen Hanning, Xiangyu Wang, Junjia Xu, Xiaojuan Jin, Chen Wenbo, Hui Zhou, Lue Fan, Wenling Li, Zhaoxiang Zhang 10/7/2025 arxiv

machine learning

Recent advances in robot control methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have advanced robots' ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they ofte...

Keywords: EmbodiedCoder, coding model, mobile manipulation, training-free, interpretable control, parameterized geometry, robot trajectories, generalization

Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss 10/7/2025 arxiv

machine learning

Modulations are a critical part of sound design and music production, enabling the creation of complex and evolving audio. Modern synthesizers provide envelopes, low frequency oscillators (LFOs), and other parameter automation tools that allow users to modulate the output with ease. However, determin...

Keywords: modulation discovery, DDSP, neural sound-matching, control signal parameterization, LFO, envelope, interpretability, audio synthesis