Curated research paper collection
Ruozhen He, Meng Wei, Ziyan Yang, Vicente Ordonez (arXiv, 5/14/2026)
computer vision: Multi-shot video generation extends single-shot generation to coherent visual narratives, yet maintaining consistent characters, objects, and locations across shots remains a challenge over long sequences. Existing evaluations typically use independently generated prompt sets with limited entity cov...
Ziyu Guo, Rain Liu, Xinyan Chen, Pheng-Ann Heng (arXiv, 5/14/2026)
computer vision: Visual reasoning, often interleaved with intermediate visual states, has emerged as a promising direction in the field. A straightforward approach is to directly generate images via unified models during reasoning, but this is computationally expensive and architecturally non-trivial. Recent alterna...
Xiang Fan, Yuheng Wang, Bohan Fang, Zhongzheng Ren, Ranjay Krishna (arXiv, 5/14/2026)
computer vision: Video generation powers a vast array of downstream applications. However, while the de facto standard, i.e., latent diffusion models, typically employs heavily conditioned denoising networks, their decoders often remain unconditional. We observe that this architectural asymmetry leads to significant ...
Jianyuan Wang, Minghao Chen, Shangzhan Zhang, Nikita Karaev, Johannes Schönberger, Patrick Labatut, Piotr Bojanowski, David Novotny, Andrea Vedaldi, Christian Rupprecht (arXiv, 5/14/2026)
computer vision: Recent feed-forward reconstruction models, such as VGGT, have proven competitive with traditional optimization-based reconstructors while also providing geometry-aware features useful for other tasks. Here, we show that the quality of these models scales predictably with model and data size. We do s...
Tuna Han Salih Meral, Kaan Oktay, Hidir Yesiltepe, Adil Kaan Akan, Pinar Yanardag (arXiv, 5/14/2026)
computer vision: Latent flow matching for image generation usually transports Gaussian noise to variational autoencoder latents along linear paths. Both endpoints, however, concentrate in thin spherical shells, and a Euclidean chord leaves those shells even when preprocessing aligns their radii. By decomposing each ...
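The shell-concentration claim in this abstract is a standard fact about high-dimensional Gaussians and can be checked with a short simulation. The sketch below (plain Python, illustrative only, not the paper's code) shows that samples concentrate near radius sqrt(d) and that the midpoint of a straight Euclidean chord between two such samples falls well inside that shell:

```python
import math
import random

random.seed(0)
d = 1024  # latent dimensionality (illustrative choice)

def gauss_vec(d):
    """Sample a d-dimensional standard Gaussian vector."""
    return [random.gauss(0.0, 1.0) for _ in range(d)]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

x = gauss_vec(d)  # stand-in for the noise endpoint
y = gauss_vec(d)  # stand-in for a (radius-aligned) latent endpoint

# Norms of both endpoints concentrate tightly around sqrt(d) ≈ 32.
print(norm(x), norm(y), math.sqrt(d))

# Midpoint of the linear (Euclidean chord) interpolation path:
# for independent Gaussians its norm is ≈ sqrt(d/2) ≈ 22.6,
# i.e. the chord dips off the shell that both endpoints live on.
mid = [(a + b) / 2.0 for a, b in zip(x, y)]
print(norm(mid))
```

This dip is what motivates replacing straight chords with paths that stay near the shell, as the entry's decomposition approach appears to target.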
Yanzuo Lu, Ronglai Zuo, Jiankang Deng (arXiv, 5/14/2026)
reinforcement learning: Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history dist...
Shashwat Goel, Nikhil Chandak, Arvindh Arun, Ameya Prabhu, Steffen Staab, Moritz Hardt, Maksym Andriushchenko, Jonas Geiping (arXiv, 5/14/2026)
reinforcement learning: AI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We ...
Matt Zhou, Ruining Li, Xiaoyang Lyu, Zhaomou Song, Zhening Huang, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi, Shangzhe Wu (arXiv, 5/14/2026)
natural language processing: A bottleneck in learning to understand articulated 3D objects is the lack of large and diverse datasets. In this paper, we propose to leverage large language models (LLMs) to close this gap and generate articulated assets at scale. We reduce the problem of generating an articulated 3D asset to that ...
Kaixin Zhu, Yiwen Tang, Yifan Yang, Renrui Zhang, Bohan Zeng, Ziyu Guo, Ruichuan An, Zhou Liu, Qizhi Chen, Delin Qu, Jaehong Yoon, Wentao Zhang (arXiv, 5/14/2026)
natural language processing: High-quality 3D scene reconstruction has recently advanced toward generalizable feed-forward architectures, enabling the generation of complex environments in a single forward pass. However, despite their strong performance in static scene perception, these models remain limited in responding to dyn...
Jiaxin Wu, Yihao Pi, Yinling Zhang, Yuheng Li, Xueyan Zou (arXiv, 5/14/2026)
reinforcement learning: Generative video models are increasingly studied as implicit world models, yet evaluating whether they produce physically plausible 3D structure and motion remains challenging. Most existing video evaluation pipelines rely heavily on human judgment or learned graders, which can be subjective and wea...