Paper Archive

Cambrian-P: Pose-Grounded Video Understanding

5.0/10

5/21/2026 huggingface

reinforcement learning

Camera pose matters. The position and orientation of each viewpoint define a shared spatial coordinate frame that relates observations across video frames. Yet this signal is largely absent from multimodal LLMs (MLLMs) for video understanding, which process frames as isolated 2D snapshots, instead o...

Keywords: regression

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

5.0/10

5/21/2026 huggingface

computer vision

Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially by missing secondary causal consequences. To address this, we intro...

Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

5.0/10

5/21/2026 huggingface

computer vision

Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. ...

Keywords: multi-modal

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

5.0/10

5/21/2026 huggingface

natural language processing

Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. De...

Keywords: attention

DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders

5.0/10

5/21/2026 huggingface

computer vision

Representation Autoencoders (RAEs) leverage frozen vision foundation models (VFMs) as tokenizer encoders, providing robust high-level representations that facilitate fast convergence and high-quality generation in latent diffusion models. However, freezing the VFM inherently constrains its spatial r...

Keywords: diffusion model, fine-tuning

Diversed Model Discovery via Structured Table Discovery

5.0/10

5/21/2026 huggingface

natural language processing

Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exp...

WorldKV: Efficient World Memory with World Retrieval and Compression

5.0/10

5/21/2026 huggingface

computer vision

Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks re...

Keywords: attention, diffusion model, fine-tuning

Forecasting Scientific Progress with Artificial Intelligence

5.0/10

5/21/2026 huggingface

machine learning

Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints...

Swift Sampling: Selecting Temporal Surprises via Taylor Series

5.0/10

5/21/2026 huggingface

computer vision

While most frames in long-form video are redundant, the critical information resides in temporal surprises: moments where the actual visual features deviate from their predicted evolution. Inspired by the human brain's predictive coding, we introduce Swift Sampling, an elegant, training-free frame s...

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

5.0/10

5/21/2026 huggingface

natural language processing

Spreadsheet systems (e.g., Microsoft Excel, Google Sheets) play a central role in modern data-centric workflows. As AI agents grow increasingly capable of automating complex tasks, such as controlling computers and generating presentations, building an AI-driven spreadsheet agent has emerged as a pr...

Keywords: reinforcement learning, fine-tuning

Papers for May 22, 2026

Cambrian-P: Pose-Grounded Video Understanding

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders

Diversed Model Discovery via Structured Table Discovery

WorldKV: Efficient World Memory with World Retrieval and Compression

Forecasting Scientific Progress with Artificial Intelligence

Swift Sampling: Selecting Temporal Surprises via Taylor Series

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning