Paper Archive

Perceptual Flow Network for Visually Grounded Reasoning

9.0/10

5/4/2026 huggingface

machine learning

Despite the success of Large Vision-Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods introduce geometric priors from visual experts as additional supervis...

Keywords: PFlowNet, Perceptual Flow Network, variational reinforcement learning, visual reasoning, LVLMs, geometric priors, self-conditioned generation, V* Bench

AcademiClaw: When Students Set Challenges for AI Agents

9.0/10

5/4/2026 huggingface

machine learning

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' rea...

Keywords: AcademiClaw, OpenClaw, benchmark, academic tasks, long-horizon, Docker sandbox, bilingual, CUDA

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments

9.0/10

5/4/2026 huggingface

machine learning

We introduce PhysicianBench, a benchmark for evaluating LLM agents on physician tasks grounded in real clinical settings within electronic health record (EHR) environments. Existing medical agent benchmarks primarily focus on static knowledge recall, single-step atomic actions, or action intent witho...

Keywords: PhysicianBench, EHR, LLM agents, clinical workflows, execution-grounded benchmark, long-horizon tasks, healthcare AI

Generative Modeling with Orbit-Space Particle Flow Matching

9.0/10

5/4/2026 huggingface

machine learning

We present Orbit-Space Geometric Probability Paths (OGPP), a particle-native flow-matching framework for generative modeling of particle systems. OGPP is motivated by two insights: (i) particles are defined up to permutation symmetries, so anonymous indexing inflates per-index target variance and yi...

Keywords: OGPP, orbit-space, flow matching, particle systems, geometric probability paths, arc-length-aware terminal velocity, surface normals, ShapeNet

T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

9.0/10

5/4/2026 huggingface

reinforcement learning

Recent progress in multi-turn reinforcement learning (RL) has significantly improved the performance of reasoning LLMs on complex interactive tasks. Despite advances in stabilization techniques such as fine-grained credit assignment and trajectory filtering, instability remains pervasive and often leads ...

Keywords: T^2PO, uncertainty-guided exploration, multi-turn reinforcement learning, token-level intervention, turn-level resampling, training stability, WebShop, ALFWorld

Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition

9.0/10

5/3/2026 huggingface

machine learning

We present Coopetition-Gym v1, a benchmark platform for mixed-motive multi-agent reinforcement learning under strategic coopetition. The platform comprises twenty environments organized into four mechanism classes that correspond to four foundational technical reports: interdependence and complement...

Keywords: coopetition, multi-agent reinforcement learning, mixed-motive, benchmark, reward mutuality, interdependence matrix, Gymnasium, PettingZoo

On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length

8.0/10

5/4/2026 huggingface

reinforcement learning

Large language models (LLMs) have shown promise as interactive agents that solve tasks through extended sequences of environment interactions. While prior work has primarily focused on system-level optimizations or algorithmic improvements, the role of task horizon length in shaping training dynamic...

Keywords: large language models, horizon length, long-horizon tasks, horizon reduction, horizon generalization, training instability, exploration, credit assignment

MolmoAct2: Action Reasoning Models for Real-world Deployment

5.0/10

5/4/2026 huggingface

machine learning

Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay pr...

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition

5.0/10

5/4/2026 huggingface

machine learning

Videos are unique in their ability to capture actions which transcend multiple frames. Accordingly, for many years action recognition was the quintessential task for video understanding. Unfortunately, due to a lack of sufficiently diverse and challenging data, modern vision-language models (VLMs) a...

Counting as a minimal probe of language model reliability

5.0/10

5/3/2026 huggingface

natural language processing

Large language models perform strongly on benchmarks in mathematical reasoning, coding and document analysis, suggesting a broad ability to follow instructions. However, it remains unclear whether such success reflects general logical competence, repeated application of learned procedures, or patter...

Browse by Date

Papers for May 5, 2026

Perceptual Flow Network for Visually Grounded Reasoning

AcademiClaw: When Students Set Challenges for AI Agents

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments

Generative Modeling with Orbit-Space Particle Flow Matching

T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition

On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length

MolmoAct2: Action Reasoning Models for Real-world Deployment

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition

Counting as a minimal probe of language model reliability