Paper Archive

ConsistEdit: Highly Consistent and Precise Training-free Visual Editing

9.0/10

Zixin Yin, Ling-Hao Chen, Lionel Ni, Xili Dai 10/20/2025 arxiv

computer vision

Recent advances in training-free attention control methods have enabled flexible and efficient text-guided editing capabilities for existing generation models. However, current approaches struggle to simultaneously deliver strong editing strength while preserving consistency with the source. This li...

Keywords: ConsistEdit, MM-DiT, attention control, vision-only attention, mask-guided fusion, query-key-value manipulation, training-free, image editing

Unbiased Gradient Low-Rank Projection

9.0/10

Rui Pan, Yang Luo, Yuxing Liu, Yang You, Tong Zhang 10/20/2025 arxiv

machine learning

Memory-efficient optimization is critical for training increasingly large language models (LLMs). A popular strategy involves gradient low-rank projection, storing only the projected optimizer states, with GaLore being a representative example. However, a significant drawback of many such methods is...

Keywords: low-rank projection, GaLore, Muon, GUM, layerwise sampling, unbiased optimizer, convergence guarantees, LLM fine-tuning

RoboBench: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models as Embodied Brain

9.0/10

Yulin Luo, Chun-Kai Fan, Menghang Dong, Jiayu Shi, Mengdi Zhao, Bo-Wen Zhang, Cheng Chi, Jiaming Liu, Gaole Dai, Rongyu Zhang, Ruichuan An, Kun Wu, Zhengping Che, Shaoxuan Xie, Guocai Yao, Zhongxia Zhao, Pengwei Wang, Guang Liu, Zhongyuan Wang, Tiejun Huang, Shanghang Zhang 10/20/2025 arxiv

robotics

Building robots that can perceive, reason, and act in dynamic, unstructured environments remains a core challenge. Recent embodied systems often adopt a dual-system paradigm, where System 2 handles high-level reasoning while System 1 executes low-level control. In this work, we refer to System 2 as ...

Keywords: RoboBench, multimodal LLM, embodied AI, benchmark, robotics, affordance, planning, perception reasoning

Glyph: Scaling Context Windows via Visual-Text Compression

9.0/10

Jiale Cheng, Yusen Liu, Xinyu Zhang, Yulin Fei, Wenyi Hong, Ruiliang Lyu, Weihan Wang, Zhe Su, Xiaotao Gu, Xiao Liu, Yushi Bai, Jie Tang, Hongning Wang, Minlie Huang 10/20/2025 arxiv

machine learning

Large language models (LLMs) increasingly rely on long-context modeling for tasks such as document understanding, code analysis, and multi-step reasoning. However, scaling context windows to the million-token level brings prohibitive computational and memory costs, limiting the practicality of long-...

Keywords: Glyph, visual-text compression, vision-language models, long-context LLMs, LLM-driven genetic search, token compression, document understanding

Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics

9.0/10

Akshara Prabhakar, Roshan Ram, Zixiang Chen, Silvio Savarese, Frank Wang, Caiming Xiong, Huan Wang, Weiran Yao 10/20/2025 arxiv

machine learning

As information grows exponentially, enterprises face increasing pressure to transform unstructured data into coherent, actionable insights. While autonomous agents show promise, they often struggle with domain-specific nuances, intent alignment, and enterprise integration. We present Enterprise Deep...

Keywords: EDR, multi-agent system, enterprise analytics, Master Planning Agent, reflection mechanism, NL2SQL, visualization agent, DeepResearch Bench

Executable Knowledge Graphs for Replicating AI Research

9.0/10

Yujie Luo, Zhuoyun Yu, Xuehai Wang, Yuqi Zhu, Ningyu Zhang, Lanning Wei, Lun Du, Da Zheng, Huajun Chen 10/20/2025 arxiv

machine learning

Replicating AI research is a crucial yet challenging task for large language model (LLM) agents. Existing approaches often struggle to generate executable code, primarily due to insufficient background knowledge and the limitations of retrieval-augmented generation (RAG) methods, which fail to captu...

Keywords: Executable Knowledge Graphs, xKG, reproducibility, PaperBench, RAG, knowledge graph, code extraction, LLM agents

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains

9.0/10

Austin Xu, Xuan-Phi Nguyen, Yilun Zhou, Chien-Sheng Wu, Caiming Xiong, Shafiq Joty 10/20/2025 arxiv

machine learning

Finetuning specialized generative evaluators has emerged as a popular paradigm to meet the increasing demand for scalable evaluation during both training and test-time. However, recent work has largely focused on applying new methodology, such as reinforcement learning (RL), to training evaluators, ...

Keywords: foundational evaluators, FARE, evaluation, reasoning, supervised finetuning, rejection sampling, MATH, RL verification

UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

9.0/10

Yuhao Yang, Zhen Yang, Zi-Yi Dou, Anh Nguyen, Keen You, Omar Attia, Andrew Szot, Michael Feng, Ram Ramrakhya, Alexander Toshev, Chao Huang, Yinfei Yang, Zhe Gan 10/20/2025 arxiv

machine learning

Multimodal agents for computer use rely exclusively on primitive actions (click, type, scroll) that require accurate visual grounding and lengthy execution chains, leading to cascading failures and performance bottlenecks. While other agents leverage rich programmatic interfaces (APIs, MCP servers, ...

Keywords: Hybrid Action, Computer Use Agents, GUI primitives, Programmatic Tools, Synthetic Data, Online Reinforcement Learning, Supervised Fine-tuning, OSWorld

Mapping Post-Training Forgetting in Language Models at Scale

9.0/10

Jackson Harmon, Andreas Hochlehnert, Matthias Bethge, Ameya Prabhu 10/20/2025 arxiv

natural language processing

Scaled post-training now drives many of the largest capability gains in language models (LMs), yet its effect on pretrained knowledge remains poorly understood. Not all forgetting is equal: Forgetting one fact (e.g., a U.S. president or an API call) does not "average out" by recalling another. Hence...

Keywords: post-training, forgetting, backward transfer, sample-wise metric, chance-adjusted accuracy, domain-continual pretraining, RL/SFT, instruction tuning

SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference

9.0/10

Samir Khaki, Junxian Guo, Jiaming Tang, Shang Yang, Yukang Chen, Konstantinos N. Plataniotis, Yao Lu, Song Han, Zhijian Liu 10/20/2025 arxiv

machine learning

Vision Language Models (VLMs) have rapidly advanced in integrating visual and textual reasoning, powering applications across high-resolution image understanding, long-video analysis, and multi-turn conversation. However, their scalability remains limited by the growing number of visual tokens that ...

Keywords: SparseVILA, visual sparsity, vision-language models, prefill pruning, query-aware retrieval, AWQ, inference acceleration, multimodal

Papers for October 21, 2025
