Paper Archive

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

5.0/10

5/14/2026 huggingface

computer vision

Visual reasoning, often interleaved with intermediate visual states, has emerged as a promising direction in the field. A straightforward approach is to directly generate images via unified models during reasoning, but this is computationally expensive and architecturally non-trivial. Recent alterna...

RefDecoder: Enhancing Visual Generation with Conditional Video Decoding

5.0/10

5/14/2026 huggingface

computer vision

Video generation powers a vast array of downstream applications. However, while the de facto standard, i.e., latent diffusion models, typically employ heavily conditioned denoising networks, their decoders often remain unconditional. We observe that this architectural asymmetry leads to significant ...

Keywords: attention, diffusion model, fine-tuning

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

5.0/10

5/14/2026 huggingface

reinforcement learning

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history dist...

Keywords: attention, diffusion model, reinforcement learning

FutureSim: Replaying World Events to Evaluate Adaptive Agents

5.0/10

5/14/2026 huggingface

reinforcement learning

AI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We ...

VGGT-Edit: Feed-forward Native 3D Scene Editing with Residual Field Prediction

5.0/10

5/14/2026 huggingface

computer vision

High-quality 3D scene reconstruction has recently advanced toward generalizable feed-forward architectures, enabling the generation of complex environments in a single forward pass. However, despite their strong performance in static scene perception, these models remain limited in responding to dyn...

Quantitative Video World Model Evaluation for Geometric-Consistency

5.0/10

5/14/2026 huggingface

reinforcement learning

Generative video models are increasingly studied as implicit world models, yet evaluating whether they produce physically plausible 3D structure and motion remains challenging. Most existing video evaluation pipelines rely heavily on human judgment or learned graders, which can be subjective and wea...

Keywords: segmentation

Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

5.0/10

5/14/2026 huggingface

computer vision

Camera-controlled video generation has made substantial progress, enabling generated videos to follow prescribed viewpoint trajectories. However, existing methods usually learn camera-specific conditioning through camera encoders, control branches, or attention and positional-encoding modifications,...

Keywords: attention

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

5.0/10

5/14/2026 huggingface

computer vision

We introduce SANA-WM, an efficient 2.6B-parameter open-source world model natively trained for one-minute generation, synthesizing high-fidelity, 720p, minute-scale videos with precise camera control. SANA-WM achieves visual quality comparable to large-scale industrial baselines such as LingBot-Worl...

Keywords: transformer, attention

Does Synthetic Layered Design Data Benefit Layered Design Decomposition?

5.0/10

5/14/2026 huggingface

computer vision

Recent advances in image generation have made it easy to produce high-quality images. However, these outputs are inherently flattened, entangling foreground elements, background, and text within a fixed canvas. As a result, flexible post-generation editing remains challenging, revealing a clear last...

Self-Distilled Agentic Reinforcement Learning

5.0/10

5/14/2026 huggingface

reinforcement learning

Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse supervision for long-horizon interaction. On-Policy Self-Distillation (OPSD) complements RL by introducing dense token-level guidance from a teacher...

Keywords: reinforcement learning

Papers for May 15, 2026