Paper Archive

STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

5.0/10

6/3/2026 huggingface

natural language processing

Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated retraining is computationally challenging for Large Language Models (LLM...

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

5.0/10

6/3/2026 huggingface

computer vision

Scaling humanoid loco-manipulation requires robot-compatible demonstrations across diverse objects, whole-body motions, and scene geometries, but teleoperation and motion capture are difficult to scale because each collection depends on physical setups, instrumented actors, and robot operation. We p...

Streaming Communication in Multi-Agent Reasoning

5.0/10

6/3/2026 huggingface

reinforcement learning

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent ag...

Keywords: gpt
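
The latency claim in this abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not StreamMA's implementation: it assumes a linear chain of agents, a fixed number of reasoning steps per agent, a constant per-step time for each agent, and that a downstream agent can begin work on step i as soon as its upstream neighbor has emitted step i. The function names and parameters are hypothetical.

```python
from typing import List


def generate_then_transfer_latency(step_times: List[float], steps: int) -> float:
    """Each agent finishes all of its steps before the next agent starts,
    so total latency is the sum over agents (linear in pipeline depth)."""
    return sum(t * steps for t in step_times)


def streamed_latency(step_times: List[float], steps: int) -> float:
    """Idealized streaming: an agent handles step i once the upstream agent
    has emitted step i and the agent itself has finished step i - 1."""
    # finish[i] holds the time at which the most recent agent emitted step i.
    finish = [0.0] * steps
    for t in step_times:          # walk the chain from first agent to last
        prev_done = 0.0           # when this agent finished its previous step
        for i in range(steps):
            prev_done = max(finish[i], prev_done) + t
            finish[i] = prev_done
    return finish[-1]


if __name__ == "__main__":
    agents = [1.0, 1.0, 1.0, 1.0]   # assumed per-step time of each agent
    print(generate_then_transfer_latency(agents, steps=10))
    print(streamed_latency(agents, steps=10))
```

Under these simplifying assumptions, with uniform step time t, S steps, and depth D, the first function gives D·S·t while the second gives (S + D − 1)·t, which is the usual pipelining argument. Real systems will deviate from this whenever a downstream agent needs more than one upstream step before it can proceed.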

Audio Interaction Model

5.0/10

6/3/2026 huggingface

natural language processing

Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-deci...

Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

5.0/10

6/3/2026 huggingface

natural language processing

Large language models (LLMs) are increasingly proposed as clinical agents, yet static, single-turn benchmarks cannot capture how a model dynamically delivers care across an encounter: gathering information, planning treatment, and adapting longitudinal management across successive patient states. Me...

Keywords: gpt

Arithmetic Pedagogy for Language Models

5.0/10

6/3/2026 huggingface

computer vision

We investigate whether methods of human mathematics pedagogy can guide the training of language models toward arithmetic reasoning. Building on the GASING method -- an Indonesian pedagogy that solves basic arithmetic through a left-to-right procedure aligned with the causal order of token generation...

Keywords: attention, gpt, reinforcement learning
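
To make "left-to-right" concrete, here is a generic sketch of multi-digit addition that consumes digits from the most significant end, which is the order in which a causal language model would emit them. It is an illustration only and is not claimed to be the GASING procedure or the paper's training setup; the function name and the carry-correction strategy are assumptions.

```python
def add_left_to_right(a: int, b: int) -> int:
    """Add two non-negative integers, processing digits from most to least
    significant and fixing up earlier output digits when a carry appears."""
    xs, ys = str(a), str(b)
    width = max(len(xs), len(ys))
    xs, ys = xs.zfill(width), ys.zfill(width)

    out = [0]  # leading slot reserved for a possible final carry
    for dx, dy in zip(xs, ys):
        s = int(dx) + int(dy)
        if s >= 10:
            # Carry back into digits already written: any trailing 9s
            # roll over to 0, and the first non-9 digit is incremented.
            j = len(out) - 1
            while out[j] == 9:
                out[j] = 0
                j -= 1
            out[j] += 1
            s -= 10
        out.append(s)
    return int("".join(map(str, out)))


if __name__ == "__main__":
    print(add_left_to_right(58, 67))
    print(add_left_to_right(199, 1))
```

The point of the sketch is that a left-to-right procedure must occasionally revise digits it has already produced, whereas the schoolbook right-to-left method never does; how GASING handles this is not stated in the truncated abstract above.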

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

5.0/10

6/3/2026 huggingface

reinforcement learning

Scientific and engineering progress is fundamentally a long-horizon iterative process: proposing changes, running experiments, measuring outcomes, and continuously refining artifacts. Yet existing benchmarks for frontier models primarily evaluate either single-turn responses or short-horizon agent t...

M^3Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks

5.0/10

6/3/2026 huggingface

machine learning

As multi-modal models advance towards long-form video understanding, memory emerges as a critical capability. Despite substantial efforts in developing video datasets and benchmarks, existing works primarily focus on perception and reasoning, without systematically evaluating memory: what models ret...

Keywords: multi-modal

UAT: Unified Audio-Text Diffusion for Audio Generation, Editing, and Captioning

5.0/10

6/3/2026 huggingface

natural language processing

Audio generation and audio-to-text understanding remain largely separate, with diffusion models dominating high-fidelity synthesis and autoregressive (AR) language models driving captioning and semantic prediction. Existing unified approaches typically rely on either heterogeneous modules or AR-cent...

Keywords: diffusion model

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

5.0/10

6/3/2026 huggingface

reinforcement learning

Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hac...

Keywords: reinforcement learning

Papers for June 4, 2026

STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

Streaming Communication in Multi-Agent Reasoning

Audio Interaction Model

Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

Arithmetic Pedagogy for Language Models

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

M^3Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks

UAT: Unified Audio-Text Diffusion for Audio Generation, Editing, and Captioning

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning