Paper Archive

From Pixels to Words -- Towards Native One-Vision Models at Scale

5.0/10

5/27/2026 huggingface

computer vision

Current vision-language models (VLMs) typically stitch together separate image encoders and language decoders via multi-stage alignment, a modular framework that inevitably fragments pixel-level signals across frames and scatters early pixel-word interactions. In parallel, native VLMs, despite impre...

PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective

5.0/10

5/27/2026 huggingface

natural language processing

Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy while overlooking the retention of pretrained capabilities. We argue that PEFT should be assessed through the stability-plasticity dilemma:...

Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

5.0/10

5/27/2026 huggingface

natural language processing

World models for interactive video generation have largely focused on single-agent settings, where future observations are generated from a single control signal. However, many generated environments require multi-agent interaction: multiple players, robots, or embodied agents act simultaneously wit...

Keywords: attention

Self-Improving Language Models with Bidirectional Evolutionary Search

5.0/10

5/27/2026 huggingface

natural language processing

Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse veri...

OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

5.0/10

5/27/2026 huggingface

computer vision

Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist foundation models. In this work, we investigate multimodal meta-verification, which leverages verifier-generated rationales rather than decisio...

Keywords: reinforcement learning

Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

5.0/10

5/27/2026 huggingface

computer vision

Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven domain-specific fail...

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

5.0/10

5/27/2026 huggingface

computer vision

Vision-language models with extended reasoning succeed on complex problems, but many real-world problems require external tools that internal reasoning alone often cannot resolve. Agentic reasoning therefore interleaves two behaviors with a structural asymmetry: thinking (the self-contained default)...

Rethinking Memory as Continuously Evolving Connectivity

5.0/10

5/27/2026 huggingface

natural language processing

Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines, which is brittle in dynamic agentic environments where feedback, task variation, and heterogeneous signals continuously reshape what should be remembered and...

CubePart: An Open-Vocabulary Part-Controllable 3D Generator

5.0/10

5/27/2026 huggingface

natural language processing

Interactive 3D assets used in games and simulation are typically decomposed into specific semantic parts to support animation, physics, and scripted behaviors, yet most generative 3D models produce either monolithic meshes or arbitrary part decompositions that cannot be aligned with application-spec...

MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

5.0/10

5/27/2026 huggingface

natural language processing

Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dynamic evolution is crucial to understand how information is synthesized, propagated, or corrupted over time. In this work...

Browse by Date

Papers for May 28, 2026

From Pixels to Words -- Towards Native One-Vision Models at Scale

PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective

Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

Self-Improving Language Models with Bidirectional Evolutionary Search

OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

Rethinking Memory as Continuously Evolving Connectivity

CubePart: An Open-Vocabulary Part-Controllable 3D Generator

MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems