Paper Archive

Zero-Shot Long-Horizon Dexterous Manipulation via Multi-View 3D-Grounded VLM Reasoning

9.0/10

Jisoo Kim, Sangwon Baik, Taeksoo Kim, Sungjoo Kim, Junyoung Lee, Mingi Choi, Hanbyul Joo 6/17/2026 arxiv

computer vision

We present a zero-shot framework for long-horizon dexterous manipulation that grounds language instructions into executable 3D task plans from calibrated multi-view RGB images. Rather than training an end-to-end policy, our system uses a vision-language model (VLM) to produce reference-frame task gr...

Keywords: zero-shot learning, long-horizon manipulation, vision-language model, 3D grounding, dexterous manipulation

Native Active Perception as Reasoning for Omni-Modal Understanding

9.0/10

Zhenghao Xing, Ruiyang Xu, Yuxuan Wang, Jinzheng He, Ziyang Ma, Qize Yang, Yunfei Chu, Jin Xu, Junyang Lin, Chi-Wing Fu, Pheng-Ann Heng 6/17/2026 arxiv

computer vision

Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre-scanning, and their...

Keywords: video understanding, active perception, POMDP, Agentic Supervised Fine-Tuning, Agentic Reinforcement Learning, TAURA

Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

9.0/10

Shengyuan Ding, Xilin Wei, Xinyu Fang, Haodong Duan, Dahua Lin, Jiaqi Wang, Yuhang Zang 6/17/2026 arxiv

computer vision

Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. However, existing benchmarks either expose the full state, conflate hidden-state reconstruction with other agent skills, or test recall only after an ...

Keywords: multimodal large language models, non-Markov games, memory evaluation, RNG-Bench, Qwen3.5-9B

Learning User Simulators with Turing Rewards

9.0/10

Yingshan Susan Wang, Cedegao E. Zhang, Linlu Qiu, Zexue He, Pengyuan Li, Alex Pentland, Roger P. Levy, Yoon Kim 6/17/2026 arxiv

machine learning

Learning to simulate human users in interactive settings could advance the training of agent assistants, evaluation of personalization systems, research in the social sciences, and more. Existing approaches generally do so by training a large language model (LLM) to match a single ground truth respo...

Keywords: Turing-RL, user simulator, reinforcement learning, Turing Test, language model, indistinguishability, conversational chat, Reddit forum discussion

Do as I Do: Dexterous Manipulation Data from Everyday Human Videos

9.0/10

Bhawna Paliwal, Haritheja Etukuru, William Liang, Pieter Abbeel, Nur Muhammad Mahi Shafiullah, Jitendra Malik 6/17/2026 arxiv

computer vision

How can we scalably generate data for robotic manipulation, especially on human-like platforms such as dexterous multi-fingered hands? Learning from human videos has recently emerged as a likely answer to this question. However, difficulties in estimating hand-object interaction and crossing the hum...

Keywords: robotic manipulation, dexterous hands, human-robot interaction, video-based learning, hand-object interaction

The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning

9.0/10

V. Samuel Pérez-Díaz, Vinay L. Kashyap, Joshua D. Ingram, David Fouhey, Juan Rafael Martínez-Galarza, Pavlos Protopapas, Jeremy J. Drake, Dong-Woo Kim, Cecilia Garraffo 6/17/2026 arxiv

machine learning

We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such as magnitudes, colors, and distances to identify true counterparts, detect chance coincidences, and ...

Keywords: cross-matching, machine learning, X-ray sources, optical sources, Chandra, Gaia, LightGBM, NWAY

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

9.0/10

Siyi Gu, Jialin Chen, Sophia Zhou, Arman Cohan, Rex Ying 6/17/2026 arxiv

machine learning

Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. Distillation often relies on chain-of-thought annotations that are expensive to obtain and may themselves be noisy, incomplete, or partially incorrect; even wh...

Keywords: Rubric-Conditioned Self-Distillation, AI Reasoning, Machine Learning, Distillation, Rubrics, Self-Distillation

Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents

9.0/10

Anoushka Vyas, Aarushi Dhanuka, Sina Khoshfetrat Pakazad, Henrik Ohlsson 6/17/2026 arxiv

machine learning

Production data integration is bottlenecked by repeated, lossy handoffs between data owners, engineers, and analysts who must collaboratively discover, structure, and query enterprise data. We present Data Intelligence Agents (DIA), a system of three agents (Data Interpreter, Schema Creator, and Que...

Keywords: Data Intelligence Agents, Autonomous Coding Agents, Data Integration, SQL Benchmarks, Natural Language Instructions

Explaining Attention with Program Synthesis

9.0/10

Amiri Hayes, Belinda Li, Jacob Andreas 6/17/2026 arxiv

machine learning

A longstanding goal of research on interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. In this paper, we propose an approach for approximating the behavior of components of deep networks with executable programs. We focus on attention hea...

Keywords: attention mechanisms, program synthesis, transformer models, neural networks, interpretable AI

NeuMesh++: Towards Versatile and Efficient Volumetric Editing with Disentangled Neural Mesh-based Implicit Field

9.0/10

Chong Bao, Yuan Li, Bangbang Yang, Yujun Shen, Hujun Bao, Zhaopeng Cui, Yinda Zhang, Guofeng Zhang 6/17/2026 arxiv

computer vision

Recently, neural implicit rendering techniques have evolved rapidly and demonstrated significant advantages in novel view synthesis and 3D scene reconstruction. However, existing neural rendering methods for editing purposes offer limited functionalities, e.g., rigid transformation and category-speci...

Keywords: Neural Rendering, Volumetric Editing, Neural Mesh, Implicit Field, Disentangled Representation, Texture Editing, Semantic Editing

Papers for June 18, 2026
