Paper Archive

HarmoWAM: Harmonizing Generalizable and Precise Manipulation via Adaptive World Action Models

0

5.0/10

Qiuxuan Feng, Jiale Yu, Jiaming Liu, Yueru Jia, Zhuangzhe Wu, Hao Chen, Zezhong Qian, Shuo Gu, Peng Jia, Siwei Ma, Shanghang Zhang 5/11/2026 arxiv

computer vision

World Action Models (WAMs) have emerged as a promising paradigm for robot control by modeling physical dynamics. Current WAMs generally follow two paradigms: the "Imagine-then-Execute" approach, which uses video prediction to infer actions via inverse dynamics, and the "Joint Modeling" approach, whi...


ELF: Embedded Language Flows

0

5.0/10

Keya Hu, Linlu Qiu, Yiyang Lu, Hanhong Zhao, Tianhong Li, Yoon Kim, Jacob Andreas, Kaiming He 5/11/2026 arxiv

computer vision

Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success has attracted growing interest in applying them to language modeling. Unlike their image-domain counterparts, today's leading diffusion langua...

Keywords: diffusion model


Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping

0

5.0/10

Haoyuan Sun, Jing Wang, Yuxin Song, Yu Lu, Bo Fang, Yifu Luo, Jun Yin, Pengyu Zeng, Miao Zhang, Tiantian Zhang, Xueqian Wang, Shijian Lu 5/11/2026 arxiv

computer vision

Recently, post-training methods based on reinforcement learning, with a particular focus on Group Relative Policy Optimization (GRPO), have emerged as the robust paradigm for further advancement of text-to-image (T2I) models. However, these methods are often prone to reward hacking, wherein models e...

Keywords: reinforcement learning


Personal Visual Context Learning in Large Multimodal Models

0

5.0/10

Zihui Xue, Ami Baid, Sangho Kim, Mi Luo, Kristen Grauman 5/11/2026 arxiv

computer vision

As wearable devices like smart glasses integrate Large Multimodal Models (LMMs) into the continuous first-person visual streams of individual users, the evolution of these models into true personal assistants hinges on visual personalization: the ability to reason over visual information unique to t...


Variational Inference for Lévy Process-Driven SDEs via Neural Tilting

0

5.0/10

Yaman Kindap, Manfred Opper, Benjamin Dupuis, Umut Simsekli, Tolga Birdal 5/11/2026 arxiv

reinforcement learning

Modelling extreme events and heavy-tailed phenomena is central to building reliable predictive systems in domains such as finance, climate science, and safety-critical AI. While Lévy processes provide a natural mathematical framework for capturing jumps and heavy tails, Bayesian inference for Lévy-d...

Keywords: neural network


DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

0

5.0/10

Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Zhiyuan Liu 5/11/2026 arxiv

machine learning

While Mixture-of-Experts (MoE) scales model capacity without proportionally increasing computation, its massive total parameter footprint creates significant storage and memory-access bottlenecks, which hinder efficient end-side deployment that simultaneously requires high performance, low computati...

Keywords: transformer


Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime

0

5.0/10

Albert Alcalde, Leon Bungert, Konstantin Riedl, Tim Roith 5/11/2026 arxiv

natural language processing

Transformers with self-attention modules as their core components have become an integral architecture in modern large language and foundation models. In this paper, we study the evolution of tokens in deep encoder-only transformers at inference time which is described in the large-token limit by a ...

Keywords: transformer, attention


PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models

0

5.0/10

Xinyu Guo, Bin Xie, Wei Chai, Xianchi Deng, Tiancai Wang, Zhengxing Wu, Xingyu Chen 5/11/2026 arxiv

computer vision

Large-scale pretraining has made Vision-Language-Action (VLA) models promising foundations for generalist robot manipulation, yet adapting them to downstream tasks remains necessary. However, the common practice of full fine-tuning treats pretraining as initialization and can shift broad priors towa...

Keywords: fine-tuning, pretraining


Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

0

5.0/10

Junhao Shen, Teng Zhang, Xiaoyan Zhao, Hong Cheng 5/11/2026 arxiv

natural language processing

Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond what parametric memory alone supports. Existing methods assume external skills either accumulate as persistent guidance or internalized int...

Keywords: reinforcement learning


Pixal3D: Pixel-Aligned 3D Generation from Images

0

5.0/10

Dong-Yang Li, Wang Zhao, Yuxin Chen, Wenbo Hu, Meng-Hao Guo, Fang-Lue Zhang, Ying Shan, Shi-Min Hu 5/11/2026 arxiv

computer vision

Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, still remains a central bottleneck. We a...

Keywords: attention




Papers for May 12, 2026
