Paper Archive

Browse and export your curated research paper collection

197 Archived Days • 1958 Total Papers • 7.9 Avg Score • 9 Categories

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
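As a minimal sketch of how the JSON export could be turned into BibTeX entries: the snippet below renders one archived-paper record as an `@misc` citation. The field names (`authors`, `date`, `keywords`) are assumptions about the export schema, not a documented format.

```python
import json


def to_bibtex(paper: dict) -> str:
    """Render one archived-paper record as a BibTeX @misc entry.

    Assumes the record carries "authors" (list of full names),
    "date" (MM/DD/YYYY string), and "keywords" (list of strings).
    """
    # Citation key: last name of the first author plus the year.
    first_author = paper["authors"][0].split()[-1].lower()
    year = paper["date"].split("/")[-1]
    key = f"{first_author}{year}"
    lines = [
        f"@misc{{{key},",
        f"  author = {{{' and '.join(paper['authors'])}}},",
        f"  year = {{{year}}},",
        "  howpublished = {arXiv},",
        f"  keywords = {{{', '.join(paper['keywords'])}}},",
        "}",
    ]
    return "\n".join(lines)


# Example record shaped like one entry from the (assumed) JSON export.
record = json.loads("""{
  "authors": ["Dhruv Anand", "Ehsan Shareghi"],
  "date": "12/23/2025",
  "keywords": ["Cube Bench", "spatial reasoning"]
}""")
print(to_bibtex(record))
```

The same record dictionary could feed the CSV and Markdown exporters; only the rendering step differs per format.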
Browse by Date

Papers for December 24, 2025

10 papers found

Jianhong Bai, Xiaoshi Wu, Xintao Wang, Fu Xiao, Yuanxing Zhang, Qinghe Wang, Xiaoyu Shi, Menghan Xia, Zuozhu Liu, Haoji Hu, Pengfei Wan, Kun Gai 12/23/2025 arxiv

generative models

State-of-the-art video generative models typically learn the distribution of video latents in the VAE space and map them to pixels using a VAE decoder. While this approach can generate high-quality videos, it suffers from slow convergence and is computationally expensive when generating long videos....

Keywords: SemanticGen, video generation, semantic space, diffusion model, VAE latents, two-stage generation, long video generation, global planning

Runtao Liu, Ziyi Liu, Jiaqi Tang, Yue Ma, Renjie Pi, Jipeng Zhang, Qifeng Chen 12/23/2025 arxiv

machine learning

Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into lossy summaries or rely on limited toolsets, weakening temporal grounding and missing fine-grained cues. We pro...

Keywords: long-video, multi-agent, reinforcement learning, grounding, vision-language, LLM, LongTVQA, TVQA

Yuxi Xiao, Longfei Li, Shen Yan, Xinhang Liu, Sida Peng, Yunchao Wei, Xiaowei Zhou, Bingyi Kang 12/23/2025 arxiv

machine learning

Cognitive science suggests that spatial ability develops progressively, from perception to reasoning and interaction. Yet in multimodal LLMs (MLLMs), this hierarchy remains poorly understood, as most studies focus on a narrow set of tasks. We introduce SpatialTree, a cognitive-science-inspired hierar...

Keywords: SpatialTree, spatial abilities, MLLM, multimodal, hierarchical benchmark, transfer learning, negative transfer, auto-think

Xuanhua He, Tianyu Yang, Ke Cao, Ruiqi Wu, Cheng Meng, Yong Zhang, Zhuoliang Kang, Xiaoming Wei, Qifeng Chen 12/23/2025 arxiv

computer vision

Current video avatar generation methods excel at identity preservation and motion alignment but lack genuine agency: they cannot autonomously pursue long-term goals through adaptive environmental interaction. We address this by introducing L-IVA (Long-horizon Interactive Visual Avatar), a task and b...

Keywords: L-IVA, ORCA, Internal World Model, OTAR, closed-loop, POMDP, video avatars, long-horizon planning

Yibin Lei, Shwai He, Ang Li, Andrew Yates 12/23/2025 arxiv

machine learning

Recent work has shown that directly fine-tuning large language models (LLMs) for dense retrieval yields strong performance, but their substantial parameter counts make them computationally inefficient. While prior studies have revealed significant layer redundancy in LLMs for generative tasks, it re...

Keywords: dense retrieval, LLM compression, MLP pruning, EffiR, coarse-to-fine, BEIR, efficient retrieval

Yedi Zhang, Andrew Saxe, Peter E. Latham 12/23/2025 arxiv

machine learning

Neural networks trained with gradient descent often learn solutions of increasing complexity over time, a phenomenon known as simplicity bias. Despite being widely observed across architectures, existing theoretical treatments lack a unifying framework. We present a theoretical framework that explai...

Keywords: simplicity bias, saddle-to-saddle dynamics, gradient descent, invariant manifolds, fixed points, ReLU kinks, rank growth, convolutional kernels

Soowon Son, Honggyu An, Chaehyun Kim, Hyunah Ko, Jisu Nam, Dahyun Chung, Siyoon Jin, Jung Yi, Jaewon Min, Junhwa Hur, Seungryong Kim 12/23/2025 arxiv

computer vision

Point tracking aims to localize corresponding points across video frames, serving as a fundamental task for 4D reconstruction, robotics, and video editing. Existing methods commonly rely on shallow convolutional backbones such as ResNet that process frames independently, lacking temporal coherence a...

Keywords: Video Diffusion Transformer, DiT, point tracking, DiTracker, LoRA, query-key attention, cost fusion, ResNet

Seijin Kobayashi, Yanick Schimpf, Maximilian Schlegel, Angelika Steger, Maciej Wolczyk, Johannes von Oswald, Nino Scherre, Kaitlin Maile, Guillaume Lajoie, Blake A. Richards, Rif A. Saurous, James Manyika, Blaise Agüera y Arcas, Alexander Meulemans, João Sacramento 12/23/2025 arxiv

reinforcement learning

Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token c...

Keywords: autoregressive models, temporal abstraction, hierarchical reinforcement learning, internal RL, residual stream control, non-causal sequence model, sparse rewards, MuJoCo

Dhruv Anand, Ehsan Shareghi 12/23/2025 arxiv

machine learning

We introduce Cube Bench, a Rubik's-cube benchmark for evaluating spatial and sequential reasoning in multimodal large language models (MLLMs). The benchmark decomposes performance into five skills: (i) reconstructing cube faces from images and text, (ii) choosing the optimal next move, (iii) predict...

Keywords: Cube Bench, Rubik's cube, spatial reasoning, MLLMs, multimodal benchmark, sequential planning, self-correction, closed-vs-open-source gap

Changyi Lin, Boda Huo, Mingyang Yu, Emily Ruppel, Bingqing Chen, Jonathan Francis, Ding Zhao 12/23/2025 arxiv

machine learning

Contact often occurs without macroscopic surface deformation, such as during interaction with liquids, semi-liquids, or ultra-soft materials. Most existing tactile sensors rely on deformation to infer contact, making such light-contact interactions difficult to perceive robustly. To address this, we...

Keywords: visual-tactile, tactile sensing, optical sensor, light contact, robotic manipulation, contact segmentation, vision-language models, soft materials