Paper Archive

DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo

5.0/10

5/15/2026 huggingface

computer vision

Achieving human-level manipulation requires dexterous robotic hands capable of complex object interactions. Advancing such capabilities further demands standardized benchmarks for systematic evaluation. However, existing dexterous benchmarks lack tasks that reflect the unique manipulation capabiliti...

Look Before You Leap: Autonomous Exploration for LLM Agents

5.0/10

5/15/2026 huggingface

natural language processing

Large language model based agents often fail in unfamiliar environments due to premature exploitation: a tendency to act on prior knowledge before acquiring sufficient environment-specific information. We identify autonomous exploration as a critical yet underexplored capability for building adaptiv...

Keywords: reinforcement learning

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

5.0/10

5/15/2026 huggingface

reinforcement learning

Group Relative Policy Optimization (GRPO) has emerged as essential for aligning video diffusion models with human preferences, but it faces a critical computational bottleneck: training a 14B-parameter model typically demands hundreds of GPU-days per experiment. Existing efficiency methods reduce costs thro...

Keywords: diffusion model

WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation

5.0/10

5/15/2026 huggingface

computer vision

Aerial vision-language navigation (VLN) requires agents to follow natural-language instructions through closed-loop perception and action in 3D environments. We argue that aerial VLN can be formulated as a prediction-driven world-action problem: the agent should anticipate latent world evolution and...

Keywords: reinforcement learning

PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control

5.0/10

5/15/2026 huggingface

computer vision

Large vision-language models have significantly advanced GUI agents, enabling executable interaction across web, mobile, and desktop interfaces. Yet these gains largely rely on a forgiving region-tolerant paradigm, where many nearby pixels inside the same component remain valid. Precise geometric co...

Keywords: reinforcement learning

Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

5.0/10

5/15/2026 huggingface

natural language processing

Block attention, which processes the input as separate blocks that cannot attend to one another, offers significant potential to improve KV cache reuse in long-context scenarios such as Retrieval-Augmented Generation (RAG). However, its broader application is hindered by two key challenges: the diff...

Keywords: attention, fine-tuning, segmentation

Unlocking Dense Metric Depth Estimation in VLMs

5.0/10

5/15/2026 huggingface

computer vision

Vision-Language Models (VLMs) excel at 2D tasks such as grounding and captioning, yet remain limited in 3D understanding. A key limitation is their text-only supervision paradigm, which under-constrains fine-grained visual perception and prevents the recovery of dense geometry. Prior methods either ...

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

5.0/10

5/15/2026 huggingface

natural language processing

Toward recursive self-improvement, we investigate LLM agents autonomously designing foundation models beyond standard Transformers. We introduce a dual-framework approach: AIRA-Compose for high-level architecture search, and AIRA-Design for low-level mechanistic implementation. AIRA-Compose uses 11 ...

Keywords: transformer, attention, classification

WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes

5.0/10

5/15/2026 huggingface

reinforcement learning

Recent 3D world modeling systems based on generative scene synthesis, such as Marble, can create coherent and explorable 3D environments, yet their outputs are typically static monolithic assets with limited editability and physical interaction. This restricts their use in immersive content creation...

FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

5.0/10

5/15/2026 huggingface

computer vision

Human-centric video customization, particularly at the garment level, has shown significant commercial value. However, existing approaches cannot support low-latency and interactive garment control, which is crucial for applications such as e-commerce and content creation. This paper studies how to ...

Papers for May 18, 2026

DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo

Look Before You Leap: Autonomous Exploration for LLM Agents

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation

PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control

Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

Unlocking Dense Metric Depth Estimation in VLMs

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes

FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization