Paper Archive

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

0

9.0/10

Haochen Wang, Yuhao Wang, Tao Zhang, Yikang Zhou, Yanwei Li, Jiacong Wang, Ye Tian, Jiahao Meng, Zilong Huang, Guangcan Mai, Anran Wang, Yunhai Tong, Zhuochen Wang, Xiangtai Li, Zhaoxiang Zhang 10/21/2025 arxiv

machine learning

While Multimodal Large Language Models (MLLMs) excel at holistic understanding, they struggle in capturing the dense world with complex scenes, requiring fine-grained analysis of intricate details and object inter-relationships. Region-level MLLMs have been a promising step. However, previous attemp...

Keywords: Grasp Any Region, GAR, RoI-aligned feature replay, region-level MLLM, multimodal LLM, compositional reasoning, GAR-Bench, DLC-Bench

View Paper

Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting

0

9.0/10

Howard Chen, Noam Razin, Karthik Narasimhan, Danqi Chen 10/21/2025 arxiv

machine learning

Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities -- a phenomenon classically known as catastrophic forgetting. In this paper, toward identifying guidelines for mitigating this phenomenon, we systematically compare the forgetting patter...

Keywords: catastrophic_forgetting, on-policy, reinforcement_learning, supervised_fine_tuning, Llama, Qwen, continual_learning, mode-seeking

View Paper

How Do LLMs Use Their Depth?

0

9.0/10

Akshat Gupta, Jay Yeung, Gopala Anumanchipalli, Anna Ivanova 10/21/2025 arxiv

machine learning

Growing evidence suggests that large language models do not use their depth uniformly, yet we still lack a fine-grained understanding of their layer-wise prediction dynamics. In this paper, we trace the intermediate representations of several open-weight models during inference and reveal a structur...

Keywords: Guess-then-Refine, layer-wise analysis, LLMs, frequency bias, transformer, interpretability, early exit, open-weight models

View Paper

LightMem: Lightweight and Efficient Memory-Augmented Generation

0

9.0/10

Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Huajun Chen, Ningyu Zhang 10/21/2025 arxiv

machine learning

Despite their remarkable capabilities, Large Language Models (LLMs) struggle to effectively leverage historical interaction information in dynamic and complex environments. Memory systems enable LLMs to move beyond stateless interactions by introducing persistent information storage, retrieval, and ...

Keywords: LightMem, memory-augmented generation, Atkinson-Shiffrin, sensory memory, short-term memory, long-term memory, sleep-time update, LongMemEval

View Paper

Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

0

9.0/10

Ling Team, Anqi Shen, Baihui Li, Bin Hu, Bin Jing, Cai Chen, Chao Huang, Chao Zhang, Chaokun Yang, Cheng Lin, Chengyao Wen, Congqi Li, Deng Zhao, Dingbo Yuan, Donghai You, Fagui Mao, Fanzhuang Meng, Feng Xu, Guojie Li, Guowei Wang, Hao Dai, Haonan Zheng, Hong Liu, Jia Guo, Jiaming Liu, Jian Liu, Jianhao Fu, Jiannan Shi, Jianwen Wang, Jianxin Lai, Jin Yang, Jun Mei, Jun Zhou, Junbo Zhao, Junping Zhao, Kuan Xu, Le Su, Lei Chen, Li Tang, Liang Jiang, Liangcheng Fu, Lianhao Xu, Linfeng Shi, Lisha Liao, Longfei Zheng, Meng Li, Mingchun Chen, Qi Zuo, Qiang Cheng, Qianggang Cao, Qitao Shi, Quanrui Guo, Senlin Zhu, Shaofei Wang, Shaomian Zheng, Shuaicheng Li, Shuwei Gu, Siba Chen, Tao Wu, Tao Zhang, Tianyu Zhang, Tianyu Zhou, Tiwei Bie, Tongkai Yang, Wang Hong, Wang Ren, Weihua Chen, Wenbo Yu, Wengang Zheng, Xiangchun Wang, Xiaodong Yan, Xiaopei Wan, Xin Zhao, Xinyu Kong, Xinyu Tang, Xudong Han, Xudong Wang, Xuemin Yang, Xueyu Hu, Yalin Zhang, Yan Sun, Yicheng Shan, Yilong Wang, Yingying Xu, Yongkang Liu, Yongzhen Guo, Yuanyuan Wang, Yuchen Yan, Yuefan Wang, Yuhong Guo, Zehuan Li, Zhankai Xu, Zhe Li, Zhenduo Zhang, Zhengke Gui, Zhenxuan Pan, Zhenyu Huang, Zhenzhong Lan, Zhiqiang Ding, Zhiqiang Zhang, Zhixun Li, Zhizhen Liu, Zihao Wang, Zujie Wen 10/21/2025 arxiv

reinforcement learning

We present Ring-1T, the first open-source, state-of-the-art thinking model with a trillion-scale parameter. It features 1 trillion total parameters and activates approximately 50 billion per token. Training such models at a trillion-parameter scale introduces unprecedented challenges, including trai...

Keywords: Ring-1T, trillion-parameter, Mixture-of-Experts, IcePop, C3PO++, ASystem, reinforcement learning, rollouts

View Paper

DP$^2$O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution

0

9.0/10

Rongyuan Wu, Lingchen Sun, Zhengqiang Zhang, Shihao Wang, Tianhe Wu, Qiaosi Yi, Shuai Li, Lei Zhang 10/21/2025 arxiv

computer vision

Benefiting from pre-trained text-to-image (T2I) diffusion models, real-world image super-resolution (Real-ISR) methods can synthesize rich and realistic details. However, due to the inherent stochasticity of T2I models, different noise inputs often lead to outputs with varying perceptual quality. Al...

Keywords: Real-world image super-resolution, DP2O-SR, perceptual preference optimization, text-to-image diffusion, flow-based T2I, IQA, no-reference IQA, full-reference IQA

View Paper

Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning

0

9.0/10

Chenghao Zhu, Meiling Tao, Tiannan Wang, Dongyi Ding, Yuchen Eleanor Jiang, Wangchunshu Zhou 10/21/2025 arxiv

natural language processing

Faithfully personalizing large language models (LLMs) to align with individual user preferences is a critical but challenging task. While supervised fine-tuning (SFT) quickly reaches a performance plateau, standard reinforcement learning from human feedback (RLHF) also struggles with the nuances of ...

Keywords: personalization, RLHF, generative reward model, critique-post-edit, reward hacking, Qwen2.5, PPO, GPT-4.1

View Paper

MADR: MPC-guided Adversarial DeepReach

0

9.0/10

Ryan Teoh, Sander Tonkens, William Sharpless, Aijia Yang, Zeyuan Feng, Somil Bansal, Sylvia Herbert 10/21/2025 arxiv

robotics

Hamilton-Jacobi (HJ) Reachability offers a framework for generating safe value functions and policies in the face of adversarial disturbance, but is limited by the curse of dimensionality. Physics-informed deep learning is able to overcome this infeasibility, but itself suffers from slow and inaccur...

Keywords: MADR, Hamilton-Jacobi Reachability, MPC-guided learning, adversarial deep learning, differential games, safe control, robotics

View Paper

See the Text: From Tokenization to Visual Reading

0

9.0/10

Ling Xing, Alex Jinpeng Wang, Rui Yan, Hongyu Qu, Zechao Li, Jinhui Tang 10/21/2025 arxiv

machine learning

People see text. Humans read by recognizing words as visual objects, including their shapes, layouts, and patterns, before connecting them to meaning, which enables us to handle typos, distorted fonts, and various scripts effectively. Modern large language models (LLMs), however, rely on subword tok...

Keywords: SeeTok, visual-text, multimodal LLMs, tokenization, OCR, text-vision alignment, cross-lingual, efficiency

View Paper

FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning

0

9.0/10

Yubin Zheng, Pak-Hei Yeung, Jing Xia, Tianjie Ju, Peng Tang, Weidong Qiu, Jagath C. Rajapakse 10/21/2025 arxiv

machine learning

Federated learning (FL) enables multiple clients to collaboratively train machine learning models without exposing local data, balancing performance and privacy. However, domain shift and label heterogeneity across clients often hinder the generalization of the aggregated global model. Recently, lar...

Keywords: FedDEAP, federated learning, CLIP, prompt tuning, dual-prompt, domain adaptation, semantic-disentanglement, personalization

View Paper

Export Archive Data

Browse by Date

Papers for October 22, 2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting

How Do LLMs Use Their Depth?

LightMem: Lightweight and Efficient Memory-Augmented Generation

Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

DP$^2$O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution

Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning

MADR: MPC-guided Adversarial DeepReach

See the Text: From Tokenization to Visual Reading

FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning