Paper Archive

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

0

5.0/10

5/26/2026 huggingface

natural language processing

While spatial foundation models have demonstrated impressive performance on standard datasets, a critical question remains: are they truly all-round players capable of generalizing robustly across diverse downstream tasks, arbitrary viewpoints, shifting scene domains, varying input densities, and sp...

Keywords: attention

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

0

5.0/10

5/26/2026 huggingface

natural language processing

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evo...

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

0

5.0/10

5/26/2026 huggingface

computer vision

Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry ...

Keywords: detection

MobileMoE: Scaling On-Device Mixture of Experts

0

5.0/10

5/26/2026 huggingface

natural language processing

Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billio...

Keywords: fine-tuning

MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale

0

5.0/10

5/26/2026 huggingface

computer vision

Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language. Despite its importance, this remains an underexplored area at scale. To address this gap, we pres...

Keywords: transformer, diffusion model

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

0

5.0/10

5/26/2026 huggingface

natural language processing

Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragmented daily interacti...

QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents

0

5.0/10

5/26/2026 huggingface

natural language processing

Social deduction games have become a popular testbed for probing reasoning, deception, coordination, and belief modeling in Large Language Model (LLM) agents. However, most environments are scored only by game outcomes such as win rates and remain largely limited to text-only interaction, making it difficul...

Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling

0

5.0/10

5/26/2026 huggingface

natural language processing

Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and can...

Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement

0

5.0/10

5/26/2026 huggingface

reinforcement learning

Agentic reinforcement learning (RL) has proven effective for training LLM-based agents with external tool-use capabilities. However, we identify that agentic RL training induces increasing redundant tool calls and blurs the model's intrinsic knowledge boundary, where the model fails to distinguish w...

Keywords: reinforcement learning

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

0

5.0/10

5/26/2026 huggingface

natural language processing

Normalization layers in modern large language models (LLMs) consist of a deterministic normalization operation and a learnable scale vector. While the normalization operation has been extensively studied, the scale vector remains poorly understood despite its ubiquitous use. In this work, we present...

Papers for May 27, 2026

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

MobileMoE: Scaling On-Device Mixture of Experts

MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents

Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling

Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models