Paper Archive

Sapiens2

10.0/10

4/23/2026 huggingface

computer vision

We present Sapiens2, a model family of high-resolution transformers for human-centric vision focused on generalization, versatility, and high-fidelity outputs. Our model sizes range from 0.4 to 5 billion parameters, with native 1K resolution and hierarchical variants that support 4K. Sapiens2 substa...

Keywords: human-centric vision, high-resolution transformers, masked image reconstruction, self-distilled contrastive learning, windowed attention, 4K, pose estimation, body-part segmentation

Seeing Fast and Slow: Learning the Flow of Time in Videos

9.0/10

Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma 4/23/2026 arxiv

computer vision

How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a l...

Keywords: temporal reasoning, self-supervised learning, video speed estimation, slow-motion dataset, speed-conditioned generation, temporal super-resolution, video forensics, video generation

Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

9.0/10

Nicolae Filat, Ahmed Hussain, Konstantinos Kalogiannis, Elena Burceanu 4/23/2026 arxiv

machine learning

Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same...

Keywords: streaming continual learning, temporal taskification, Boundary-Profile Sensitivity, plasticity-stability profiles, profile distance, CESNET-Timeseries24, Experience Replay, Elastic Weight Consolidation

Fine-Tuning Regimes Define Distinct Continual Learning Problems

9.0/10

Paul-Tiberiu Iordache, Elena Burceanu 4/23/2026 arxiv

machine learning

Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defin...

Keywords: continual learning, fine-tuning regime, trainable depth, projected optimization, catastrophic forgetting, online EWC, LwF, SI

Long-Horizon Manipulation via Trace-Conditioned VLA Planning

9.0/10

Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu 4/23/2026 arxiv

robotics

Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction followin...

Keywords: LoHo-Manip, vision-language-action, VLM, trace-conditioned planning, receding-horizon, task management, robot manipulation, replanning

The Sample Complexity of Multicalibration

9.0/10

Natalie Collina, Jiuyao Lu, Georgy Noarov, Aaron Roth 4/23/2026 arxiv

machine learning

We study the minimax sample complexity of multicalibration in the batch setting. A learner observes $n$ i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most $\va...

Keywords: multicalibration, sample complexity, expected calibration error, online-to-batch, L_p multicalibration, elicitable properties, expectiles, quantiles

Context Unrolling in Omni Models

9.0/10

Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Chaorui Deng, Kunchang Li, Zihan Ding, Yuwei Guo, Fuyun Wang, Fangqi Zhu, Xiaonan Nie, Shenhan Zhu, Shanchuan Lin, Hongsheng Li, Weilin Huang, Guang Shi, Haoqi Fan 4/23/2026 arxiv

multimodal learning

We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing...

Keywords: Omni, Context Unrolling, multimodal, unified model, 3D geometry, in-context generation, hidden representations, multimodal reasoning

MathDuels: Evaluating LLMs as Problem Posers and Solvers

9.0/10

Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik 4/23/2026 arxiv

machine learning

As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We introduce MathDuels, a self-play benchmark in ...

Keywords: LLMs, benchmark, self-play, math problems, adversarial prompting, problem generation, difficulty amplification, Rasch model

Vista4D: Video Reshooting with 4D Point Clouds

9.0/10

Kuan Heng Lin, Zhizheng Liu, Pablo Salamanca, Yash Kant, Ryan Burgert, Yuancheng Xu, Koichi Namekata, Yiwei Zhao, Bolei Zhou, Micah Goldblum, Paul Debevec, Ning Yu 4/23/2026 arxiv

computer vision

We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically, given an input video, our method re-synthesizes the scene with the same dynamics from a different camera trajectory and viewpoint. Existing video re...

Keywords: 4D point cloud, video reshooting, dynamic scene reconstruction, multiview training, camera control, visual synthesis, static pixel segmentation

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

9.0/10

Songen Gu, Yuhang Zheng, Weize Li, Yupeng Zheng, Yating Feng, Xiang Li, Yilun Chen, Pengfei Li, Wenchao Ding 4/23/2026 arxiv

robotics

Recently, end-to-end robotic manipulation models have gained significant attention for their generalizability and scalability. However, they often suffer from limited robustness to camera viewpoint changes when training with a fixed camera. In this paper, we propose VistaBot, a novel framework that ...

Keywords: VistaBot, view-robust, robot manipulation, spatiotemporal-aware view synthesis, 4D geometry estimation, video diffusion models, latent action learning, View Generalization Score (VGS)

Browse by Date

Papers for April 26, 2026

Sapiens2

Seeing Fast and Slow: Learning the Flow of Time in Videos

Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

Fine-Tuning Regimes Define Distinct Continual Learning Problems

Long-Horizon Manipulation via Trace-Conditioned VLA Planning

The Sample Complexity of Multicalibration

Context Unrolling in Omni Models

MathDuels: Evaluating LLMs as Problem Posers and Solvers

Vista4D: Video Reshooting with 4D Point Clouds

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis