Paper Archive

Browse and export your curated research paper collection

221 Archived Days • 2198 Total Papers • 8.0 Avg Score • 9 Categories

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
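As a rough illustration of how the JSON export could be flattened into the other formats, here is a minimal Python sketch. The field names (`title`, `authors`, `date`, `category`, `score`) are assumptions about the export schema, not the archive's actual format:

```python
import csv
import io
import json

# Hypothetical shape of one entry in the JSON export; real field names may differ.
papers_json = json.dumps([
    {"title": "Example Paper", "authors": ["A. Author", "B. Author"],
     "date": "2026-04-23", "category": "machine learning", "score": 8.0},
])

def json_to_csv(raw: str) -> str:
    """Flatten the assumed JSON export into tabular CSV for analysis."""
    rows = json.loads(raw)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["title", "authors", "date", "category", "score"])
    writer.writeheader()
    for row in rows:
        # Join the author list into a single cell.
        writer.writerow(dict(row, authors="; ".join(row["authors"])))
    return buf.getvalue()

def json_to_bibtex(raw: str) -> str:
    """Emit minimal @misc BibTeX entries from the assumed JSON export."""
    entries = []
    for i, p in enumerate(json.loads(raw)):
        entries.append(
            f"@misc{{paper{i},\n"
            f"  title = {{{p['title']}}},\n"
            f"  author = {{{' and '.join(p['authors'])}}},\n"
            f"  year = {{{p['date'][:4]}}},\n"
            f"}}"
        )
    return "\n\n".join(entries)
```

A converter like this would let the same JSON download serve both spreadsheet analysis and citation managers, without re-exporting from the archive.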
Browse by Date

Papers for April 25, 2026

10 papers found

Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma 4/23/2026 arxiv

computer vision

How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a l...

Keywords: time perception, self-supervised learning, playback speed estimation, speed-conditioned generation, temporal super-resolution, slow-motion dataset, video forensics, temporal control

Nicolae Filat, Ahmed Hussain, Konstantinos Kalogiannis, Elena Burceanu 4/23/2026 arxiv

machine learning

Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same...

Keywords: streaming continual learning, temporal taskification, plasticity, stability, Boundary-Profile Sensitivity, benchmarking, CESNET-Timeseries24, experience replay

Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil, Sergio Burdisso, Petr Motlicek, Shiran Liu, Mickael Rouvier, Jane Wottawa, Richard Dufour 4/23/2026 arxiv

machine learning

Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This pa...

Keywords: ASR, WER, large language models, generative embeddings, semantic evaluation, HATS dataset

Paul-Tiberiu Iordache, Elena Burceanu 4/23/2026 arxiv

machine learning

Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defin...

Keywords: continual learning, fine-tuning regime, trainable depth, projected optimization, forgetting, online EWC, LwF, SI

Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu 4/23/2026 arxiv

robotics

Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction followin...

Keywords: long-horizon manipulation, vision-language-action (VLA), vision-language model (VLM), trace-conditioned planning, receding-horizon, visual trace, keypoint trajectory, closed-loop replanning

Natalie Collina, Jiuyao Lu, Georgy Noarov, Aaron Roth 4/23/2026 arxiv

machine learning

We study the minimax sample complexity of multicalibration in the batch setting. A learner observes $n$ i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most $\va...

Keywords: multicalibration, expected calibration error, sample complexity, minimax, online-to-batch, L_p, elicitable properties, calibration

Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Chaorui Deng, Kunchang Li, Zihan Ding, Yuwei Guo, Fuyun Wang, Fangqi Zhu, Xiaonan Nie, Shenhan Zhu, Shanchuan Lin, Hongsheng Li, Weilin Huang, Guang Shi, Haoqi Fan 4/23/2026 arxiv

machine learning

We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing...

Keywords: Omni, Context Unrolling, multimodal, multimodal reasoning, in-context generation, 3D geometry, video, hidden representations

Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik 4/23/2026 arxiv

machine learning

As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We introduce MathDuels, a self-play benchmark in ...

Keywords: MathDuels, self-play, LLMs, benchmarking, Rasch model, meta-prompting, difficulty amplification, adversarial evaluation

Kuan Heng Lin, Zhizheng Liu, Pablo Salamanca, Yash Kant, Ryan Burgert, Yuancheng Xu, Koichi Namekata, Yiwei Zhao, Bolei Zhou, Micah Goldblum, Paul Debevec, Ning Yu 4/23/2026 arxiv

computer vision

We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically, given an input video, our method re-synthesizes the scene with the same dynamics from a different camera trajectory and viewpoint. Existing video re...

Keywords: 4D point cloud, video reshooting, view synthesis, dynamic scenes, multiview training, camera control, static-pixel segmentation, reconstruction

Songen Gu, Yuhang Zheng, Weize Li, Yupeng Zheng, Yating Feng, Xiang Li, Yilun Chen, Pengfei Li, Wenchao Ding 4/23/2026 arxiv

robotics

Recently, end-to-end robotic manipulation models have gained significant attention for their generalizability and scalability. However, they often suffer from limited robustness to camera viewpoint changes when trained with a fixed camera. In this paper, we propose VistaBot, a novel framework that ...

Keywords: view synthesis, video diffusion, robot manipulation, 4D geometry, latent action learning, view generalization score, closed-loop control, action-chunking