Paper Archive

EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI

9.0/10

Jianlei Chang, Ruofeng Mei, Wei Ke, Xiangyu Xu 12/1/2025 arxiv

machine learning

Generative modeling has recently shown remarkable promise for visuomotor policy learning, enabling flexible and expressive control across diverse embodied AI tasks. However, existing generative policies often struggle with data inefficiency, requiring large-scale demonstrations, and sampling ineffic...

Keywords: EfficientFlow, flow-based policy, flow matching, equivariance, acceleration regularization, sampling efficiency, data efficiency, embodied AI

A Diffusion Model Framework for Maximum Entropy Reinforcement Learning

9.0/10

Sebastian Sanokowski, Kaustubh Patil, Alois Knoll 12/1/2025 arxiv

reinforcement learning

Diffusion models have achieved remarkable success in data-driven learning and in sampling from complex, unnormalized target distributions. Building on this progress, we reinterpret Maximum Entropy Reinforcement Learning (MaxEntRL) as a diffusion model-based sampling problem. We tackle this problem b...

Keywords: diffusion models, maximum entropy reinforcement learning, reverse KL, policy gradient, Soft Actor-Critic, PPO, Wasserstein Policy Optimization, DiffSAC

Data-Centric Visual Development for Self-Driving Labs

9.0/10

Anbang Liu, Guanzhong Hu, Jiayi Wang, Ping Guo, Han Liu 12/1/2025 arxiv

computer vision

Self-driving laboratories offer a promising path toward reducing the labor-intensive, time-consuming, and often irreproducible workflows in the biological sciences. Yet their stringent precision requirements demand highly robust models whose training relies on large amounts of annotated data. Howeve...

Keywords: self-driving labs, pipetting, bubble detection, data-centric AI, synthetic data, human-in-the-loop, prompt-guided generation, reference-conditioned

Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion

9.0/10

Shaowei Liu, David Yifan Yao, Saurabh Gupta, Shenlong Wang 12/1/2025 arxiv

computer vision

Today, people can easily record memorable moments, ranging from concerts, sports events, and lectures to family gatherings and birthday parties, with multiple consumer cameras. However, synchronizing these cross-camera streams remains challenging. Existing methods assume controlled settings, specific targ...

Keywords: multi-camera synchronization, epipolar constraints, multi-view dynamics, 3D reconstruction, feature matching, dense tracking, time offset estimation

Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now

9.0/10

Varun Varma Thozhiyoor, Shivam Tripathi, Venkatesh Babu Radhakrishnan, Anand Bhattad 12/1/2025 arxiv

machine learning

Video generators are increasingly evaluated as potential world models, which requires them to encode and understand physical laws. We investigate their representation of a fundamental law: gravity. Out-of-the-box video generators consistently generate objects falling at an effectively slower acceler...

Keywords: video generation, gravity, Galileo's equivalence principle, unit-free protocol, low-rank adaptor, world models, physical reasoning, video synthesis

Generative Video Motion Editing with 3D Point Tracks

9.0/10

Yao-Chih Lee, Zhoutong Zhang, Jiahui Huang, Jui-Hsien Wang, Joon-Young Lee, Jia-Bin Huang, Eli Shechtman, Zhengqi Li 12/1/2025 arxiv

computer vision

Camera and object motions are central to a video's narrative. However, precisely editing these captured motions remains a significant challenge, especially under complex object movements. Current motion-controlled image-to-video (I2V) approaches often lack full-scene context for consistent video edi...

Keywords: video editing, 3D point tracks, video-to-video, track-conditioned, motion transfer, occlusion handling, spatiotemporal coherence, camera motion

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

9.0/10

Zhiheng Liu, Weiming Ren, Haozhe Liu, Zijian Zhou, Shoufa Chen, Haonan Qiu, Xiaoke Huang, Zhaochong An, Fanny Yang, Aditya Patel, Viktar Atliha, Tony Ng, Xiao Han, Chuyan Zhu, Chenyang Zhang, Ding Liu, Juan-Manuel Perez-Rua, Sen He, Jürgen Schmidhuber, Wenhu Chen, Ping Luo, Wei Liu, Tao Xiang, Jonas Schult, Yuren Cong 12/1/2025 arxiv

machine learning

Unified multimodal models (UMMs) aim to jointly perform multimodal understanding and generation within a single framework. We present TUNA, a native UMM that builds a unified continuous visual representation by cascading a VAE encoder with a representation encoder. This unified representation space ...

Keywords: unified multimodal models, unified visual representation, VAE encoder, representation encoder, end-to-end, image generation, video understanding, multimodal

ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation

9.0/10

Chenyang Gu, Jiaming Liu, Hao Chen, Runzhong Huang, Qingpo Wuwu, Zhuoyang Liu, Xiaoqi Li, Ying Li, Renrui Zhang, Peng Jia, Pheng-Ann Heng, Shanghang Zhang 12/1/2025 arxiv

robotics

Vision-Language-Action (VLA) models have recently emerged, demonstrating strong generalization in robotic scene understanding and manipulation. However, when confronted with long-horizon tasks that require defined goal states, such as LEGO assembly or object rearrangement, existing VLA models still ...

Keywords: Vision-Language-Action, ManualVLA, Mixture-of-Transformers, ManualCoT, 3D Gaussian Splatting, digital-twin, LEGO assembly, object rearrangement

Improved Mean Flows: On the Challenges of Fastforward Generative Models

9.0/10

Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, Kaiming He 12/1/2025 arxiv

generative models

MeanFlow (MF) has recently been established as a framework for one-step generative modeling. However, its "fastforward" nature introduces key challenges in both the training objective and the guidance mechanism. First, the original MF's training target depends not only on the underlying ground-tru...

Keywords: MeanFlow, iMF, fastforward, one-step generative model, instantaneous velocity, classifier-free guidance, in-context conditioning, ImageNet

Learning Dexterous Manipulation Skills from Imperfect Simulations

9.0/10

Elvis Hsieh, Wen-Han Hsieh, Yen-Jen Wang, Toru Lin, Jitendra Malik, Koushil Sreenath, Haozhi Qi 12/1/2025 arxiv

robotics

Reinforcement learning and sim-to-real transfer have made significant progress in dexterous manipulation. However, progress remains limited by the difficulty of simulating complex contact dynamics and multisensory signals, especially tactile feedback. In this work, we propose \ours, a sim-to-real fr...

Keywords: sim-to-real, dexterous manipulation, tactile sensing, reinforcement learning, behavior cloning, teleoperation, robotics, contact dynamics

Browse by Date

Papers for December 2, 2025

EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI

A Diffusion Model Framework for Maximum Entropy Reinforcement Learning

Data-Centric Visual Development for Self-Driving Labs

Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion

Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now

Generative Video Motion Editing with 3D Point Tracks

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation

Improved Mean Flows: On the Challenges of Fastforward Generative Models

Learning Dexterous Manipulation Skills from Imperfect Simulations