Curated Research Paper Collection
Unknown authors 10/18/2025 huggingface
machine learning: Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a core paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). To address the lack of verification signals at test time, prior studies incorporate the training of the model's self-verification capabi...
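The "verifiable" part of RLVR is typically a programmatic checker rather than a learned judge. Below is a minimal, hedged sketch of what such a reward function can look like; the `\boxed{...}` answer convention and the exact-match rule are common choices in RLVR setups, not this paper's specific design.

```python
import re

def verifiable_reward(response: str, gold_answer: str) -> float:
    """Binary reward from a programmatic verifier: 1.0 if the final answer
    extracted from the response matches the ground truth, else 0.0.
    The \\boxed{...} convention is one common way to mark final answers."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return 0.0
    prediction = match.group(1).strip()
    return 1.0 if prediction == gold_answer.strip() else 0.0

# The verifier needs no learned judge, only a deterministic rule.
print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
print(verifiable_reward(r"... therefore \boxed{41}", "42"))         # 0.0
```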
Unknown authors 10/18/2025 huggingface
reinforcement learning: Large language model (LLM)-based agents are increasingly trained with reinforcement learning (RL) to enhance their ability to interact with external environments through tool use, particularly in search-based settings that require multi-turn reasoning and knowledge acquisition. However, existing app...
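For readers unfamiliar with the setting, a sketch of the multi-turn search loop such agents are trained on follows. The tag names (`<search>`, `<answer>`, `<information>`) and the `llm_generate`/`search_tool` callables are illustrative assumptions, not any paper's API.

```python
def rollout(llm_generate, search_tool, question: str, max_turns: int = 4) -> str:
    """Minimal multi-turn search loop: the model may emit
    <search>query</search> to call a retrieval tool, and
    <answer>...</answer> to terminate with a final answer."""
    context = question
    for _ in range(max_turns):
        output = llm_generate(context)
        if "<answer>" in output:
            return output.split("<answer>")[1].split("</answer>")[0]
        if "<search>" in output:
            query = output.split("<search>")[1].split("</search>")[0]
            results = search_tool(query)  # retrieved documents as text
            context += output + f"\n<information>{results}</information>\n"
        else:
            context += output
    return ""  # no answer produced within the turn budget
```

RL then optimizes the policy over whole rollouts like this one, with the reward attached to the final answer.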
Unknown authors 10/18/2025 huggingface
computer vision: In this report, we propose PaddleOCR-VL, a state-of-the-art (SOTA) and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language mod...
Unknown authors 10/18/2025 huggingface
machine learning: In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computat...
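The 1.58-bit weight format itself can be made concrete. The sketch below shows absmean ternarization in the style popularized by BitNet b1.58; it illustrates the weight representation only, not BitDistill's distillation pipeline.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Absmean ternarization: scale by the mean absolute weight,
    then round each entry to {-1, 0, +1}.
    Returns the ternary tensor and the per-tensor scale."""
    scale = w.abs().mean().clamp(min=eps)
    w_ternary = (w / scale).round().clamp(-1, 1)
    return w_ternary, scale

w = torch.randn(4, 4)
w_q, s = ternary_quantize(w)
print(w_q)                          # entries in {-1., 0., 1.}
print((w_q * s - w).abs().mean())   # per-tensor quantization error
```

With log2(3) ≈ 1.58 bits per weight, matrix multiplies reduce to additions and subtractions, which is where the compute savings come from.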
Hadi Alzayer, Yunzhi Zhang, Chen Geng, Jia-Bin Huang, Jiajun Wu 10/16/2025 arxiv
computer vision: We present an inference-time diffusion sampling method to perform multi-view consistent image editing using pre-trained 2D image editing models. These models can independently produce high-quality edits for each image in a set of multi-view images of a 3D scene or object, but they do not maintain co...
Quan Nguyen-Tri, Mukul Ranjan, Zhiqiang Shen 10/16/2025 arxiv
natural language processing: This work studies how to adaptively recompute key-value (KV) caches for diffusion large language models (DLMs) to maximize prediction accuracy while minimizing decoding latency. Prior methods' decoders recompute QKV for all tokens at every denoising step and layer, despite KV states changing little ...
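To make the idea tangible, here is a hedged heuristic sketch of adaptive KV refreshing: recompute K/V projections only for tokens whose hidden states drifted beyond a threshold since the last refresh. The drift criterion, threshold, and names are illustrative assumptions, not the paper's actual policy.

```python
import torch

def refresh_kv(hidden, cached_hidden, cached_k, cached_v, w_k, w_v, tau=0.02):
    """Recompute K/V only for tokens whose hidden state moved more than a
    relative threshold tau since the cached step; reuse the rest.
    hidden: (seq, d); w_k, w_v: (d, d_head) projection matrices."""
    drift = (hidden - cached_hidden).norm(dim=-1)
    drift = drift / cached_hidden.norm(dim=-1).clamp(min=1e-6)
    stale = drift > tau                      # (seq,) boolean mask
    k, v = cached_k.clone(), cached_v.clone()
    k[stale] = hidden[stale] @ w_k           # refresh only drifted tokens
    v[stale] = hidden[stale] @ w_v
    return k, v, int(stale.sum())            # count of recomputed tokens
```

The latency win comes from `stale` being sparse across most denoising steps, so the full QKV recomputation of prior decoders is avoided.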
Haiwen Diao, Mingxuan Li, Silei Wu, Linjun Dai, Xiaohua Wang, Hanming Deng, Lewei Lu, Dahua Lin, Ziwei Liu 10/16/2025 arxiv
machine learning: Native Vision-Language Models (VLMs) have emerged as a rising contender to typical modular VLMs, shaped by evolving model architectures and training paradigms. Yet, two lingering clouds cast shadows over their widespread exploration and promotion: (1) What fundamental constraints set nat...
Yuanhui Huang, Weiliang Chen, Wenzhao Zheng, Xin Tao, Pengfei Wan, Jie Zhou, Jiwen Lu 10/16/2025 arxiv
computer vision: World models have garnered increasing attention for comprehensive modeling of the real world. However, most existing methods still rely on pixel-aligned representations as the basis for world evolution, neglecting the inherent 3D nature of the physical world. This could undermine the 3D consistency ...
Hansheng Chen, Kai Zhang, Hao Tan, Leonidas Guibas, Gordon Wetzstein, Sai Bi 10/16/2025 arxiv
computer vision: Few-step diffusion or flow-based generative models typically distill a velocity-predicting teacher into a student that predicts a shortcut towards denoised data. This format mismatch has led to complex distillation procedures that often suffer from a quality-diversity trade-off. To address this, we ...
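The format mismatch is easy to state under one common convention. With the rectified-flow parameterization x_t = (1 - t) x_0 + t * eps, the velocity is v = eps - x_0, so a velocity teacher implicitly defines a denoised-data predictor via x_0 = x_t - t * v. The sketch below verifies this conversion; other noise schedules use different formulas.

```python
import torch

def velocity_to_x0(x_t: torch.Tensor, v: torch.Tensor, t: float) -> torch.Tensor:
    """Rectified-flow convention: x_t = (1 - t) * x0 + t * eps and
    v = eps - x0, hence x0 = x_t - t * v."""
    return x_t - t * v

# Sanity check: recover x0 exactly from a synthetic (x0, eps) pair.
x0, eps, t = torch.randn(8), torch.randn(8), 0.7
x_t = (1 - t) * x0 + t * eps
v = eps - x0
assert torch.allclose(velocity_to_x0(x_t, v, t), x0, atol=1e-6)
```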
Unknown authors 10/18/2025 huggingface
machine learning: Hallucination detection remains a fundamental challenge for the safe and reliable deployment of large language models (LLMs), especially in applications requiring factual accuracy. Existing hallucination benchmarks often operate at the sequence level and are limited to English, lacking the fine-grai...