Paper Archive

Browse and export your curated research paper collection

Archived Days: 33
Total Papers: 330
Avg Score: 7.8
Categories: 7

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations

Browse by Date

Papers for October 15, 2025

10 papers found

Kartik Narayan, Yang Xu, Tian Cao, Kavya Nerella, Vishal M. Patel, Navid Shiee, Peter Grasch, Chao Jia, Yinfei Yang, Zhe Gan 10/14/2025 arxiv

machine learning

Multimodal Large Language Models (MLLMs) in real-world applications require access to external knowledge sources and must remain responsive to the dynamic and ever-changing real-world information in order to address information-seeking and knowledge-intensive user queries. Existing approaches, such ...

Keywords: Multimodal LLM, web search, query crafting, image crop search, DeepMMSearch-R1, DeepMMSearchVQA, reinforcement learning, multimodal VQA

Qing Jiang, Junan Huo, Xingyu Chen, Yuda Xiong, Zhaoyang Zeng, Yihao Chen, Tianhe Ren, Junzhi Yu, Lei Zhang 10/14/2025 arxiv

computer vision

Object detection has long been dominated by traditional coordinate regression-based models, such as YOLO, DETR, and Grounding DINO. Although recent efforts have attempted to leverage MLLMs to tackle this task, they face challenges like low recall rate, duplicate predictions, coordinate misalignment,...

Keywords: next point prediction, Rex-Omni, MLLM, object detection, quantized coordinates, GRPO, reinforcement learning, zero-shot

Yingyan Li, Shuyao Shang, Weisong Liu, Bing Zhan, Haochen Wang, Yuqi Wang, Yuntao Chen, Xiaoman Wang, Yasong An, Chufeng Tang, Lu Hou, Lue Fan, Zhaoxiang Zhang 10/14/2025 arxiv

machine learning

Scaling Vision-Language-Action (VLA) models on large-scale data offers a promising path to achieving a more generalized driving intelligence. However, VLA models are limited by a "supervision deficit": the vast model capacity is supervised by sparse, low-dimensional actions, leaving much of their ...

Keywords: DriveVLA-W0, world modeling, vision-language-action, data scaling law, autonomous driving, autoregressive model, diffusion model, self-supervised learning

Caner Korkmaz, Brighton Nuwagira, Barış Coşkunuzer, Tolga Birdal 10/14/2025 arxiv

machine learning

We present CuMPerLay, a novel differentiable vectorization layer that enables the integration of Cubical Multiparameter Persistence (CMP) into deep learning pipelines. While CMP presents a natural and powerful way to topologically work with images, its use is hindered by the complexity of multifiltr...

Keywords: Cubical Multiparameter Persistence, CuMPerLay, topological data analysis, differentiable vectorization, bifiltration, Wasserstein stability, Swin Transformer, medical imaging

Long Cui, Weiyun Wang, Jie Shao, Zichen Wen, Gen Luo, Linfeng Zhang, Yanting Zhang, Yu Qiao, Wenhai Wang 10/14/2025 arxiv

machine learning

Existing Multimodal Large Language Models (MLLMs) suffer from increased inference costs due to the additional vision tokens introduced by image inputs. In this work, we propose Visual Consistency Learning (ViCO), a novel training algorithm that enables the model to represent images of varying semant...

Keywords: ViCO, Visual Consistency Learning, ViR, Visual Resolution Router, dynamic tokens, semantic complexity, MLP connector, KL divergence

Kevin Li, Manuel Brack, Sudeep Katakol, Hareesh Ravi, Ajinkya Kale 10/14/2025 arxiv

machine learning

Although recent advances in visual generation have been remarkable, most existing architectures still depend on distinct encoders for images and text. This separation constrains diffusion models' ability to perform cross-modal reasoning and knowledge transfer. Prior attempts to bridge this gap often...

Keywords: UniFusion, Layerwise Attention Pooling, LAP, VERIFI, frozen VLM, vision-language model, diffusion, DiT

Marco Del Tredici, Jacob McCarran, Benjamin Breen, Javier Aspuru Mijares, Weichen Winston Yin, Jacob M. Taylor, Frank Koppens, Dirk Englund 10/14/2025 arxiv

machine learning

We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof gene...

Keywords: Ax-Prover, theorem proving, Lean, Model Context Protocol, LLMs, multi-agent, formal verification, automated theorem proving

Felix Taubner, Ruihang Zhang, Mathieu Tuli, Sherwin Bahmani, David B. Lindell 10/14/2025 arxiv

computer vision

Digital human avatars aim to simulate the dynamic appearance of humans in virtual environments, enabling immersive experiences across gaming, film, virtual reality, and more. However, the conventional process for creating and animating photorealistic human avatars is expensive and time-consuming, re...

Keywords: multi-view, video diffusion, 4D avatar, animatable avatars, single-image capture, neural rendering, distillation, temporal consistency

Weiyang Jin, Yuwei Niu, Jiaqi Liao, Chengqi Duan, Aoxue Li, Shenghua Gao, Xihui Liu 10/14/2025 arxiv

multimodal learning

Recently, remarkable progress has been made in Unified Multimodal Models (UMMs), which integrate vision-language generation and understanding capabilities within a single framework. However, a significant gap exists where a model's strong visual understanding often fails to transfer to its visual ge...

Keywords: SRUM, self-rewarding, unified multimodal models, global-local dual reward, post-training, understanding-as-evaluator, T2I-CompBench, T2I-ReasonBench

Ahmed Heakl, Martin Gubri, Salman Khan, Sangdoo Yun, Seong Joon Oh 10/14/2025 arxiv

natural language processing

Large Language Models (LLMs) process every token through all layers of a transformer stack, causing wasted computation on simple queries and insufficient flexibility for harder ones that need deeper reasoning. Adaptive-depth methods can improve efficiency, but prior approaches rely on costly inferen...

Keywords: dynamic routing, layer skipping, LLM efficiency, Monte Carlo Tree Search, adaptive-depth, windowed pooling, focal loss, bottleneck MLP