Paper Archive

FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning

9.0/10

Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu | 3/5/2026 | arXiv

computer vision

We introduce FaceCam, a system that generates video under customizable camera trajectories for monocular human portrait video input. Recent camera control approaches based on large video-generation models have shown promising progress but often exhibit geometric distortions and visual artifacts on p...

Keywords: FaceCam, scale-aware representation, camera control, portrait video, video generation, conditioning, multi-view, in-the-wild

RoboPocket: Improve Robot Policies Instantly with Your Phone

9.0/10

Junjie Fang, Wendi Chen, Han Xue, Fangyuan Zhou, Tian Le, Yi Wang, Yuting Zhang, Jun Lv, Chuan Wen, Cewu Lu | 3/5/2026 | arXiv

robotics

Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing th...

Keywords: RoboPocket, Remote Inference, AR Visual Foresight, imitation learning, online finetuning, data efficiency, robot-free, mobile data collection

Accelerating Text-to-Video Generation with Calibrated Sparse Attention

9.0/10

Shai Yehezkel, Shahar Yadin, Noam Elata, Yaron Ostrovsky-Berman, Bahjat Kawar | 3/5/2026 | arXiv

machine learning

Recent diffusion models enable high-quality video generation, but suffer from slow runtimes. The large transformer-based backbones used in these models are bottlenecked by spatiotemporal attention. In this paper, we identify that a significant fraction of token-to-token connections consistently yiel...

Keywords: CalibAtt, sparse attention, text-to-video, diffusion models, transformer, spatiotemporal attention, attention calibration

Cheap Thrills: Effective Amortized Optimization Using Inexpensive Labels

9.0/10

Khai Nguyen, Petros Ellinas, Anvita Bhagavathula, Priya Donti | 3/5/2026 | arXiv

optimization

To scale the solution of optimization and simulation problems, prior work has explored machine-learning surrogates that inexpensively map problem parameters to corresponding solutions. Commonly used approaches, including supervised and self-supervised learning with either soft or hard feasibility en...

Keywords: amortized optimization, cheap labels, self-supervised learning, supervised pretraining, inexact labels, surrogate models, nonconvex constrained optimization, power-grid operation

cuRoboV2: Dynamics-Aware Motion Generation with Depth-Fused Distance Fields for High-DoF Robots

9.0/10

Balakumar Sundaralingam, Adithyavairavan Murali, Stan Birchfield | 3/5/2026 | arXiv

robotics

Effective robot autonomy requires motion generation that is safe, feasible, and reactive. Current methods are fragmented: fast planners output physically unexecutable trajectories, reactive controllers struggle with high-fidelity perception, and existing solvers fail on high-DoF systems. We present ...

Keywords: cuRoboV2, TSDF, ESDF, depth-fused distance fields, B-spline trajectory optimization, GPU-native, high-DoF, differentiable inverse dynamics

Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

9.0/10

Siddharth Boppana, Annabel Ma, Max Loeffler, Raphael Sarfati, Eric Bigelow, Atticus Geiger, Owen Lewis, Jack Merullo | 3/5/2026 | arXiv

machine learning

We provide evidence of performative chain-of-thought (CoT) in reasoning models, where a model becomes strongly confident in its final answer, but continues generating tokens without revealing its internal belief. Our analysis compares activation probing, early forced answering, and a CoT monitor acr...

Keywords: chain-of-thought, performative reasoning, activation probing, early exit, adaptive computation, interpretability, LLM, MMLU

Observing and Controlling Features in Vision-Language-Action Models

9.0/10

Hugo Buurmeijer, Carmen Amo Alonso, Aiden Swann, Marco Pavone | 3/5/2026 | arXiv

machine learning

Vision-Language-Action Models (VLAs) have shown remarkable progress towards embodied intelligence. While their architecture partially resembles that of Large Language Models (LLMs), VLAs exhibit higher complexity due to their multi-modal inputs/outputs and often hybrid nature of transformer and diff...

Keywords: vision-language-action, interpretability, feature-observability, feature-controllability, linear intervention, optimal control, closed-loop, π_{0.5}

Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

9.0/10

Benjamin Feuer, Lucas Rosenblatt, Oussama Elachqar | 3/5/2026 | arXiv

machine learning

As AI models progress beyond simple chatbots into more complex workflows, we draw ever closer to the event horizon beyond which AI systems will be utilized in autonomous, self-maintaining feedback loops. Any autonomous AI system will depend on automated, verifiable rewards and feedback; in settings ...

Keywords: average bias-boundedness, A-BB, LLM judge, bias guarantees, provable evaluation, Arena-Hard-Auto, automated feedback, adversarial bias

Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

9.0/10

Guo Chen, Lidong Lu, Yicheng Liu, Liangrui Dong, Lidong Zou, Jixin Lv, Zhenquan Li, Xinyi Mao, Baoqi Pei, Shihao Wang, Zhiqi Li, Karan Sapra, Fuxiao Liu, Yin-Dong Zheng, Yifei Huang, Limin Wang, Zhiding Yu, Andrew Tao, Guilin Liu, Tong Lu | 3/5/2026 | arXiv

machine learning

While datasets for video understanding have scaled to hour-long durations, they typically consist of densely concatenated clips that differ from natural, unscripted daily life. To bridge this gap, we introduce MM-Lifelong, a dataset designed for Multimodal Lifelong Understanding. Comprising 181.1 ho...

Keywords: MM-Lifelong, ReMA, multimodal, lifelong learning, working memory bottleneck, global localization collapse, dynamic memory, recursive belief state

Transformer-Based Inpainting for Real-Time 3D Streaming in Sparse Multi-Camera Setups

8.0/10

Leif Van Holland, Domenic Zingsheim, Mana Takhsha, Hannah Dröge, Patrick Stotko, Markus Plack, Reinhard Klein | 3/5/2026 | arXiv

computer vision

High-quality 3D streaming from multiple cameras is crucial for immersive experiences in many AR/VR applications. The limited number of views - often due to real-time constraints - leads to missing information and incomplete surfaces in the rendered images. Existing approaches typically rely on simpl...

Keywords: transformer, inpainting, real-time 3D streaming, multi-camera, spatio-temporal embeddings, adaptive patch selection, resolution-independent, AR/VR

Papers for March 6, 2026
