Paper Archive

Browse and export your curated research paper collection

197 Archived Days • 1958 Total Papers • 7.9 Avg Score • 9 Categories

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
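The JSON export can be flattened into the CSV form described above. The sketch below is a minimal, hypothetical converter: the field names (`authors`, `date`, `category`, `score`, `keywords`) are assumptions about the export schema, not a confirmed spec.

```python
import csv
import io
import json


def archive_json_to_csv(json_text: str) -> str:
    """Flatten an exported archive (a JSON list of paper records) into CSV.

    NOTE: the field names used here are assumed, not taken from the
    actual export schema of this archive.
    """
    papers = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["authors", "date", "category", "score", "keywords"]
    )
    writer.writeheader()
    for paper in papers:
        writer.writerow({
            # Join list-valued fields so each record stays on one CSV row.
            "authors": "; ".join(paper.get("authors", [])),
            "date": paper.get("date", ""),
            "category": paper.get("category", ""),
            "score": paper.get("score", ""),
            "keywords": ", ".join(paper.get("keywords", [])),
        })
    return buf.getvalue()
```

Keyword and author lists are joined with delimiters distinct from the CSV comma so the rows stay unambiguous when re-parsed.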
Browse by Date

Papers for April 3, 2026

10 papers found

Luca Bartolomei, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, Guillermo Gallego 4/2/2026 arXiv

computer vision

We propose EventHub, a novel framework for training deep-event stereo networks without ground truth annotations from costly active sensors, relying instead on standard color images. From these images, we derive either proxy annotations or proxy events through state-of-the-art novel view synthesis t...

Keywords: event-based vision, stereo, novel view synthesis, proxy events, data distillation, domain generalization, RGB-to-event

Alexander Pondaven, Ziyi Wu, Igor Gilitschenski, Philip Torr, Sergey Tulyakov, Fabio Pizzati, Aliaksandr Siarohin 4/2/2026 arXiv

generative models

Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we tackle a fundamental...

Keywords: video diffusion, world models, action binding, subject state tokens, spatial biasing, multi-agent, Melting Pot, generative video games

Zheng-Hui Huang, Zhixiang Wang, Jiaming Tan, Ruihan Yu, Yidan Zhang, Bo Zheng, Yu-Lun Liu, Yung-Yu Chuang, Kaipeng Zhang 4/2/2026 arXiv

computer vision

Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic dataset curated from visually complex AAA games. Using a no...

Keywords: generative rendering, inverse rendering, G-buffer, dataset, AAA games, dual-screen capture, VLM evaluation, temporal coherence

Jona Ruthardt, Manu Gaur, Deva Ramanan, Makarand Tapaswi, Yuki M. Asano 4/2/2026 arXiv

computer vision

Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to focus on the most salient visual cues in the image, with no way ...

Keywords: steerable representations, vision transformers, early fusion, cross-attention, DINOv2, MAE, CLIP, zero-shot

Daiwei Chen, Zhoutong Fu, Chengming Jiang, Haichao Zhang, Ran Zhou, Tan Wang, Chunnan Yao, Guoyao Li, Rui Cai, Yihan Cao, Ruijie Jiang, Fedor Borisyuk, Jianqiang Shen, Jingwei Wu, Ramya Korlakai Vinayak 4/2/2026 arXiv

machine learning

Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies on supervised fine-tu...

Keywords: Grounded Token Initialization, GTI, token initialization, vocabulary extension, language models, generative recommendation, embedding grounding, semantic-ID tokens

Ruozhen He, Nisarg A. Shah, Qihua Dong, Zilin Xiao, Jaywon Koo, Vicente Ordonez 4/2/2026 arXiv

machine learning

Existing visual grounding benchmarks primarily evaluate alignment between image regions and literal referring expressions, where models can often succeed by matching a prominent named category. We explore a complementary and more challenging setting of scenario-based visual grounding, where the targ...

Keywords: scenario-based grounding, Referring Scenario Comprehension, visual grounding, curriculum learning, reinforcement learning, vision-and-language, dataset

Bangji Yang, Hongbo Ma, Jiajun Fan, Ge Liu 4/2/2026 arXiv

machine learning

Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning qua...

Keywords: Batched Contextual Reinforcement, Chain-of-Thought, token efficiency, task-scaling law, implicit budget, inference cost, emergent efficiency, length control

Junxuan Li, Rawal Khirodkar, Chengan He, Zhongshi Jiang, Giljoo Nam, Lingchen Yang, Jihyun Lee, Egor Zakharov, Zhaoen Su, Rinat Abdrashitov, Yuan Dong, Julieta Martinez, Kai Li, Qingyang Tan, Takaaki Shiratori, Matthew Hu, Peihong Guo, Xuhua Huang, Ariyan Zarei, Marco Pesavento, Yichen Xu, He Wen, Teng Deng, Wyatt Borsos, Anjali Thakrar, Jean-Charles Bazin, Carsten Stoll, Ginés Hidalgo, James Booth, Lucy Wang, Xiaowen Ma, Yu Rong, Sairanjith Thalanki, Chen Cao, Christian Häne, Abhishek Kar, Sofien Bouaziz, Jason Saragih, Yaser Sheikh, Shunsuke Saito 4/2/2026 arXiv

computer vision

High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data due to limited scale and...

Keywords: 3D avatars, pretraining, in-the-wild, multi-view, avatar modeling, feedforward inference, relightability, codec avatars

Xueying Li, Feng Lyu, Hao Wu, Mingliu Liu, Jia-Nan Liu, Guozi Liu 4/2/2026 arXiv

machine learning

Training-free Vision-Language Navigation (VLN) agents powered by foundation models can follow instructions and explore 3D environments. However, existing approaches rely on greedy frontier selection and passive spatial memory, leading to inefficient behaviors such as local oscillation and redundant ...

Keywords: MetaNav, metacognition, vision-language navigation, 3D semantic map, LLM-guided correction, history-aware planning, frontier selection, GOAT-Bench

Yujiao Shen, Shulin Tian, Jingkang Yang, Ziwei Liu 4/2/2026 arXiv

computer vision

Recent streaming video understanding methods increasingly rely on complex memory mechanisms to handle long video streams. We challenge this trend with a simple finding: a sliding-window baseline that feeds only the most recent N frames to an off-the-shelf VLM already matches or surpasses published s...

Keywords: SimpleStream, sliding-window, VLM, streaming video understanding, OVO-Bench, StreamingBench, perception-memory trade-off, backbone-dependent context