Paper Archive

Utonia: Toward One Encoder for All Point Clouds

9.0/10

Yujia Zhang, Xiaoyang Wu, Yunhan Yang, Xianzhe Fan, Han Li, Yuechen Zhang, Zehao Huang, Naiyan Wang, Hengshuang Zhao 3/3/2026 arxiv

computer vision

We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we present Utonia, a first step toward training a single self-supervised point transformer encoder across diverse domains, spanning remote sensing, outdoor LiD...

Keywords: point cloud, self-supervised learning, point transformer, cross-domain, LiDAR, RGB-D, CAD, foundation model

MIBURI: Towards Expressive Interactive Gesture Synthesis

9.0/10

M. Hamza Mughal, Rishabh Dabral, Vera Demberg, Christian Theobalt 3/3/2026 arxiv

machine learning

Embodied Conversational Agents (ECAs) aim to emulate human face-to-face interaction through speech, gestures, and facial expressions. Current large language model (LLM)-based conversational agents lack embodiment and the expressive gestures essential for natural interaction. Existing solutions for E...

Keywords: co-speech gesture synthesis, embodied conversational agents, causal autoregressive, body-part codecs, discrete tokens, LLM conditioning, real-time, expressivity

CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance

9.0/10

Hanyang Wang, Yiyang Liu, Jiawei Chi, Fangfu Liu, Ran Xue, Yueqi Duan 3/3/2026 arxiv

computer vision

Classifier-Free Guidance (CFG) has emerged as a central approach for enhancing semantic alignment in flow-based diffusion models. In this paper, we explore a unified framework called CFG-Ctrl, which reinterprets CFG as a control applied to the first-order continuous-time generative flow, using the c...

Keywords: Classifier-Free Guidance, CFG-Ctrl, SMC-CFG, Sliding Mode Control, diffusion models, semantic alignment, Lyapunov stability, Stable Diffusion 3.5

How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference

9.0/10

Toru Lin, Shuying Deng, Zhao-Heng Yin, Pieter Abbeel, Jitendra Malik 3/3/2026 arxiv

robotics

Many essential manipulation tasks - such as food preparation, surgery, and craftsmanship - remain intractable for autonomous robots. These tasks are characterized not only by contact-rich, force-sensitive dynamics, but also by their "implicit" success criteria: unlike pick-and-place, task quality in...

Keywords: robotic manipulation, preference-based finetuning, imitation learning, force-aware data collection, learned reward model, fine-grained manipulation, peeling, human preference

ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation

9.0/10

Xialin He, Sirui Xu, Xinyao Li, Runpei Dong, Liuyu Bian, Yu-Xiong Wang, Liang-Yan Gui 3/3/2026 arxiv

robotics

Achieving autonomous and versatile whole-body loco-manipulation remains a central barrier to making humanoids practically useful. Yet existing approaches are fundamentally constrained: retargeted data are often scarce or low-quality; methods struggle to scale to large skill repertoires; and, most im...

Keywords: ULTRA, neural retargeting, multimodal controller, loco-manipulation, humanoid, egocentric perception, reinforcement learning, skill latent space

Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping

9.0/10

William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Yecheng Jason Ma, Dinesh Jayaraman 3/3/2026 arxiv

robotics

The ability to conduct and learn from interaction and experience is a central challenge in robotics, offering a scalable alternative to labor-intensive human demonstrations. However, realizing such "play" requires (1) a policy robust to diverse, potentially out-of-distribution environment states, an...

Keywords: autonomous play, trajectory warping, semantic correspondences, vision-language models, data-efficient imitation, open-loop policy, robot learning

Beyond Language Modeling: An Exploration of Multimodal Pretraining

9.0/10

Shengbang Tong, David Fan, John Nguyen, Ellis Brown, Gaoyue Zhou, Shengyi Qian, Boyang Zheng, Théophane Vallaeys, Junlin Han, Rob Fergus, Naila Murray, Marjan Ghazvininejad, Mike Lewis, Nicolas Ballas, Amir Bar, Michael Rabbat, Jakob Verbeek, Luke Zettlemoyer, Koustuv Sinha, Yann LeCun, Saining Xie 3/3/2026 arxiv

machine learning

The visual world offers a critical axis for advancing foundation models beyond language. Despite growing interest in this direction, the design space for native multimodal models remains opaque. We provide empirical clarity through controlled, from-scratch pretraining experiments, isolating the fact...

Keywords: multimodal, Transfusion, Representation Autoencoder, RAE, Mixture-of-Experts, MoE, IsoFLOP, scaling laws

Learning Demographic-Conditioned Mobility Trajectories with Aggregate Supervision

9.0/10

Jessie Z. Li, Zhiqing Hong, Toru Shirakawa, Serina Chang 3/3/2026 arxiv

machine learning

Human mobility trajectories are widely studied in public health and social science, where different demographic groups exhibit significantly different mobility patterns. However, existing trajectory generation models rarely capture this heterogeneity because most trajectory datasets lack demographic...

Keywords: ATLAS, weak supervision, mobility trajectories, demographic-conditioned generation, aggregate supervision, census data, JSD, trajectory generation

Gravity Falls: A Comparative Analysis of Domain-Generation Algorithm (DGA) Detection Methods for Mobile Device Spearphishing

9.0/10

Adam Dorian Wong, John D. Hastings 3/3/2026 arxiv

machine learning

Mobile devices are frequent targets of eCrime threat actors through SMS spearphishing (smishing) links that leverage Domain Generation Algorithms (DGA) to rotate hostile infrastructure. Despite this, DGA research and evaluation largely emphasize malware C2 and email phishing datasets, leaving limite...

Keywords: DGA, smishing, mobile security, domain generation algorithm, Gravity Falls dataset, entropy, LSTM, COSSAS DGAD

LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

9.0/10

Junyi Zhang, Charles Herrmann, Junhwa Hur, Chen Sun, Ming-Hsuan Yang, Forrester Cole, Trevor Darrell, Deqing Sun 3/3/2026 arxiv

computer vision

Feedforward geometric foundation models achieve strong short-window reconstruction, yet scaling them to minutes-long videos is bottlenecked by quadratic attention complexity or limited effective memory in recurrent designs. We present LoGeR (Long-context Geometric Reconstruction), a novel architectu...

Keywords: 3D reconstruction, long-context, hybrid memory, test-time training, sliding window attention, feedforward geometric models, KITTI, VBR dataset

Papers for March 4, 2026
