Paper Archive

Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models

9.0/10

Muhammad Maaz, Hanoona Rasheed, Fahad Shahbaz Khan, Salman Khan 11/28/2025 arxiv

machine learning

Reasoning over dynamic visual content remains a central challenge for multimodal large language models. Recent thinking models generate explicit reasoning traces for interpretability; however, their reasoning often appears convincing while being logically inconsistent or weakly grounded in visual ev...

Keywords: Video-R2, TAC, VAS, GRPO, TAR, multimodal LLM, temporal alignment, video reasoning

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

9.0/10

Hanoona Rasheed, Mohammed Zumri, Muhammad Maaz, Ming-Hsuan Yang, Fahad Shahbaz Khan, Salman Khan 11/28/2025 arxiv

computer vision

Recent multimodal large language models (MLLMs) have advanced video understanding, yet most still "think about videos", i.e., once a video is encoded, reasoning unfolds entirely in text, treating visual input as a static context. This passive paradigm creates a semantic bottleneck: models cannot rewatch...

Keywords: Video-CoM, Chain of Manipulations, interactive video reasoning, Video CoM Instruct, Group Relative Policy Optimization, GRPO, reasoning-aware rewards, multimodal LLMs

AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement

9.0/10

Zhizhou Zhong, Yicheng Ji, Zhe Kong, Yiying Liu, Jiarui Wang, Jiasun Feng, Lupeng Liu, Xiangyi Wang, Yanjia Li, Yuqing She, Ying Qin, Huan Li, Shuiyang Mao, Wei Liu, Wenhan Luo 11/28/2025 arxiv

computer vision

Recently, multi-person video generation has started to gain prominence. While a few preliminary works have explored audio-driven multi-person talking video generation, they often face challenges due to the high costs of diverse multi-person data collection and the difficulty of driving multiple iden...

Keywords: multi-person talking video, Diffusion Transformer, identity-aware attention, multi-stream architecture, data-efficient training, lip synchronization, interactivity metric

ThetaEvolve: Test-time Learning on Open Problems

9.0/10

Yiping Wang, Shao-Rong Su, Zhiyuan Zeng, Eva Xu, Liliang Ren, Xinyu Yang, Zeyi Huang, Xuehai He, Luyao Ma, Baolin Peng, Hao Cheng, Pengcheng He, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen 11/28/2025 arxiv

machine learning

Recent advances in large language models (LLMs) have enabled breakthroughs in mathematical discovery, exemplified by AlphaEvolve, a closed-source system that evolves programs to improve bounds on open problems. However, it relies on ensembles of frontier LLMs to achieve new bounds and is a pure infe...

Keywords: ThetaEvolve, test-time learning, in-context learning, reinforcement learning, program database, lazy penalties, reward shaping, open-source

Visual Generation Tuning

9.0/10

Jiahao Guo, Sinan Du, Jingfeng Yao, Wenyu Liu, Bo Li, Haoxiang Cao, Kun Gai, Chun Yuan, Kai Wu, Xinggang Wang 11/28/2025 arxiv

machine learning

Large Vision Language Models (VLMs) effectively bridge the modality gap through extensive pretraining, acquiring sophisticated visual representations aligned with language. However, it remains underexplored whether these representations, optimized for multimodal understanding tasks, harbor an inhere...

Keywords: Visual Generation Tuning, VGT, VGT-AE, vision-language models, VLM, autoregressive modeling, latent compression, PSNR

The Price of Progress: Algorithmic Efficiency and the Falling Cost of AI Inference

9.0/10

Hans Gundlach, Jayson Lynch, Matthias Mertens, Neil Thompson 11/28/2025 arxiv

machine learning

Language models have seen enormous progress on advanced benchmarks in recent years, but much of this progress has only been possible by using more costly models. Benchmarks may therefore present a warped picture of progress in practical capabilities per dollar. To remedy this, we use data from Artif...

Keywords: algorithmic efficiency, inference cost, benchmarks, AI economics, hardware efficiency, price-per-benchmark, dataset, open models

Provable Benefits of Sinusoidal Activation for Modular Addition

9.0/10

Tianlong Huang, Zhiyuan Li 11/28/2025 arxiv

machine learning

This paper studies the role of activation functions in learning modular addition with two-layer neural networks. We first establish a sharp expressivity gap: sine MLPs admit width-$2$ exact realizations for any fixed length $m$ and, with bias, width-$2$ exact realizations uniformly over all lengths....

Keywords: sinusoidal activation, modular addition, expressivity, Natarajan dimension, generalization, two-layer MLP, ReLU, sample complexity

ASTRO: Adaptive Stitching via Dynamics-Guided Trajectory Rollouts

9.0/10

Hang Yu, Di Zhang, Qiwei Du, Yanping Zhao, Hai Zhang, Guang Chen, Eduardo E. Veas, Junqiao Zhao 11/28/2025 arxiv

reinforcement learning

Offline reinforcement learning (RL) enables agents to learn optimal policies from pre-collected datasets. However, datasets containing suboptimal and fragmented trajectories present challenges for reward propagation, resulting in inaccurate value estimation and degraded policy performance. While tra...

Keywords: ASTRO, trajectory stitching, offline reinforcement learning, temporal-distance representation, Rollout Deviation Feedback, dynamics-guided planning, OGBench, D4RL

Accelerated Execution of Bayesian Neural Networks using a Single Probabilistic Forward Pass and Code Generation

9.0/10

Bernhard Klein, Falk Selker, Hendrik Borras, Sophie Steger, Franz Pernkopf, Holger Fröning 11/28/2025 arxiv

machine learning

Machine learning models perform well across domains such as diagnostics, weather forecasting, NLP, and autonomous driving, but their limited uncertainty handling restricts use in safety-critical settings. Traditional neural networks often fail to detect out-of-domain (OOD) data and may output confid...

Keywords: Bayesian neural networks, Probabilistic Forward Pass, PFP, Stochastic Variational Inference, TVM, embedded, ARM, uncertainty

Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model

9.0/10

Junshu Tang, Jiacheng Liu, Jiaqi Li, Longhuang Wu, Haoyu Yang, Penghao Zhao, Siruis Gong, Xiang Yuan, Shuai Shao, Qinglin Lu 11/28/2025 arxiv

computer vision

Recent advances in generative world models have enabled remarkable progress in creating open-ended game environments, evolving from static scene synthesis toward dynamic, interactive simulation. However, current approaches remain limited by rigid action schemas and high annotation costs, restricting...

Keywords: Hunyuan-GameCraft-2, instruction-driven interaction, interactive video, generative world models, image-to-video, Mixture-of-Experts, MoE, causal alignment

Papers for December 1, 2025
