Paper Archive

Browse and export your curated research paper collection

Archived Days: 79
Total Papers: 788
Avg Score: 8.4
Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis
CSV: Tabular data for analysis
Markdown: Human-readable reports
BibTeX: Academic citations

Browse by Date

Papers for December 1, 2025

10 papers found

Muhammad Maaz, Hanoona Rasheed, Fahad Shahbaz Khan, Salman Khan 11/28/2025 arxiv

machine learning

Reasoning over dynamic visual content remains a central challenge for multimodal large language models. Recent thinking models generate explicit reasoning traces for interpretability; however, their reasoning often appears convincing while being logically inconsistent or weakly grounded in visual ev...

Keywords: Video-R2, TAC, VAS, GRPO, TAR, multimodal LLM, temporal alignment, video reasoning

Hanoona Rasheed, Mohammed Zumri, Muhammad Maaz, Ming-Hsuan Yang, Fahad Shahbaz Khan, Salman Khan 11/28/2025 arxiv

computer vision

Recent multimodal large language models (MLLMs) have advanced video understanding, yet most still "think about videos," i.e., once a video is encoded, reasoning unfolds entirely in text, treating visual input as a static context. This passive paradigm creates a semantic bottleneck: models cannot rewatch...

Keywords: Video-CoM, Chain of Manipulations, interactive video reasoning, Video CoM Instruct, Group Relative Policy Optimization, GRPO, reasoning-aware rewards, multimodal LLMs

Zhizhou Zhong, Yicheng Ji, Zhe Kong, Yiying Liu, Jiarui Wang, Jiasun Feng, Lupeng Liu, Xiangyi Wang, Yanjia Li, Yuqing She, Ying Qin, Huan Li, Shuiyang Mao, Wei Liu, Wenhan Luo 11/28/2025 arxiv

computer vision

Recently, multi-person video generation has started to gain prominence. While a few preliminary works have explored audio-driven multi-person talking video generation, they often face challenges due to the high costs of diverse multi-person data collection and the difficulty of driving multiple iden...

Keywords: multi-person talking video, Diffusion Transformer, identity-aware attention, multi-stream architecture, data-efficient training, lip synchronization, interactivity metric

Yiping Wang, Shao-Rong Su, Zhiyuan Zeng, Eva Xu, Liliang Ren, Xinyu Yang, Zeyi Huang, Xuehai He, Luyao Ma, Baolin Peng, Hao Cheng, Pengcheng He, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen 11/28/2025 arxiv

machine learning

Recent advances in large language models (LLMs) have enabled breakthroughs in mathematical discovery, exemplified by AlphaEvolve, a closed-source system that evolves programs to improve bounds on open problems. However, it relies on ensembles of frontier LLMs to achieve new bounds and is a pure infe...

Keywords: ThetaEvolve, test-time learning, in-context learning, reinforcement learning, program database, lazy penalties, reward shaping, open-source

Jiahao Guo, Sinan Du, Jingfeng Yao, Wenyu Liu, Bo Li, Haoxiang Cao, Kun Gai, Chun Yuan, Kai Wu, Xinggang Wang 11/28/2025 arxiv

machine learning

Large Vision Language Models (VLMs) effectively bridge the modality gap through extensive pretraining, acquiring sophisticated visual representations aligned with language. However, it remains underexplored whether these representations, optimized for multimodal understanding tasks, harbor an inhere...

Keywords: Visual Generation Tuning, VGT, VGT-AE, vision-language models, VLM, autoregressive modeling, latent compression, PSNR

Hans Gundlach, Jayson Lynch, Matthias Mertens, Neil Thompson 11/28/2025 arxiv

machine learning

Language models have seen enormous progress on advanced benchmarks in recent years, but much of this progress has only been possible by using more costly models. Benchmarks may therefore present a warped picture of progress in practical capabilities per dollar. To remedy this, we use data from Artif...

Keywords: algorithmic efficiency, inference cost, benchmarks, AI economics, hardware efficiency, price-per-benchmark, dataset, open models

Tianlong Huang, Zhiyuan Li 11/28/2025 arxiv

machine learning

This paper studies the role of activation functions in learning modular addition with two-layer neural networks. We first establish a sharp expressivity gap: sine MLPs admit width-$2$ exact realizations for any fixed length $m$ and, with bias, width-$2$ exact realizations uniformly over all lengths....

Keywords: sinusoidal activation, modular addition, expressivity, Natarajan dimension, generalization, two-layer MLP, ReLU, sample complexity

Hang Yu, Di Zhang, Qiwei Du, Yanping Zhao, Hai Zhang, Guang Chen, Eduardo E. Veas, Junqiao Zhao 11/28/2025 arxiv

reinforcement learning

Offline reinforcement learning (RL) enables agents to learn optimal policies from pre-collected datasets. However, datasets containing suboptimal and fragmented trajectories present challenges for reward propagation, resulting in inaccurate value estimation and degraded policy performance. While tra...

Keywords: ASTRO, trajectory stitching, offline reinforcement learning, temporal-distance representation, Rollout Deviation Feedback, dynamics-guided planning, OGBench, D4RL

Bernhard Klein, Falk Selker, Hendrik Borras, Sophie Steger, Franz Pernkopf, Holger Fröning 11/28/2025 arxiv

machine learning

Machine learning models perform well across domains such as diagnostics, weather forecasting, NLP, and autonomous driving, but their limited uncertainty handling restricts use in safety-critical settings. Traditional neural networks often fail to detect out-of-domain (OOD) data and may output confid...

Keywords: Bayesian neural networks, Probabilistic Forward Pass, PFP, Stochastic Variational Inference, TVM, embedded, ARM, uncertainty

Junshu Tang, Jiacheng Liu, Jiaqi Li, Longhuang Wu, Haoyu Yang, Penghao Zhao, Siruis Gong, Xiang Yuan, Shuai Shao, Qinglin Lu 11/28/2025 arxiv

computer vision

Recent advances in generative world models have enabled remarkable progress in creating open-ended game environments, evolving from static scene synthesis toward dynamic, interactive simulation. However, current approaches remain limited by rigid action schemas and high annotation costs, restricting...

Keywords: Hunyuan-GameCraft-2, instruction-driven interaction, interactive video, generative world models, image-to-video, Mixture-of-Experts, MoE, causal alignment