Paper Archive

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

9.0/10

4/16/2026 huggingface

computer vision

This paper focuses on the alignment of flow matching models with human preferences. A promising way is fine-tuning by directly backpropagating reward gradients through the differentiable generation process of flow matching. However, backpropagating through long trajectories results in prohibitive me...

Keywords: flow matching, LeapAlign, post-training fine-tuning, direct-gradient, ODE sampling, trajectory shortening, gradient stabilization, image-text alignment

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

9.0/10

4/16/2026 huggingface

machine learning

High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities...

Keywords: RAD-2, diffusion models, generator-discriminator, reinforcement learning, BEV-Warp, trajectory planning, autonomous driving, closed-loop evaluation

GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

9.0/10

4/16/2026 huggingface

machine learning

The efficient spatial allocation of primitives serves as the foundation of 3D Gaussian Splatting, as it directly dictates the synergy between representation compactness, reconstruction speed, and rendering fidelity. Previous solutions, whether based on iterative optimization or feed-forward inferenc...

Keywords: 3D Gaussian Splatting, global scene tokens, novel-view synthesis, RealEstate10K, ACID, coarse-to-fine training, compact 3D representation, real-time rendering

R3D: Revisiting 3D Policy Learning

9.0/10

4/16/2026 huggingface

robotics

3D policy learning promises superior generalization and cross-embodiment transfer, but progress has been hindered by training instabilities and severe overfitting, precluding the adoption of powerful 3D perception models. In this work, we systematically diagnose these failures, identifying the omiss...

Keywords: 3D policy learning, transformer, diffusion decoder, imitation learning, 3D data augmentation, batch normalization, robotics, manipulation

SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation

9.0/10

4/16/2026 huggingface

machine learning

Reliable uncertainty estimation is critical for medical image segmentation, where automated contours feed downstream quantification and clinical decision support. Many strong uncertainty methods require repeated inference, while efficient single-forward-pass alternatives often provide weaker failure...

Keywords: medical image segmentation, uncertainty estimation, perturbation energy, single-forward-pass, rank-1 posterior probes, calibration, error ranking, AUROC

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

9.0/10

4/16/2026 huggingface

machine learning

Vision-language models (VLMs) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to insp...

Keywords: RadAgent, vision-language model, tool-using agent, chest CT, interpretability, faithfulness, robustness, medical AI

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

9.0/10

4/16/2026 huggingface

robotics

Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque abo...

Keywords: OpenMobile, task synthesis, trajectory synthesis, policy-switching, vision-language models, AndroidWorld, Qwen3-VL, open-source dataset

ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling

9.0/10

4/16/2026 huggingface

machine learning

Recent advances in video-to-audio (V2A) generation enable high-quality audio synthesis from visual content, yet achieving robust and fine-grained controllability remains challenging. Existing methods suffer from weak textual controllability under visual-text conflict and imprecise stylistic control ...

Keywords: video-to-audio, multimodal, controllability, CLIP, temporal-timbre decoupling, REPA, VGGSound-TVC, audio generation

Autogenesis: A Self-Evolving Agent Protocol

9.0/10

4/16/2026 huggingface

machine learning

Recent advances in LLM-based agent systems have shown promise in tackling complex, long-horizon tasks. However, existing agent protocols (e.g., A2A and MCP) underspecify cross-entity lifecycle and context management, version tracking, and evolution-safe update interfaces, which encourages monolithi...

Keywords: Autogenesis, AGP, RSPL, SEPL, AGS, self-evolving agents, multi-agent system, resource protocol

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

9.0/10

4/16/2026 huggingface

machine learning

Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG systems typically rely on generic retrieval signals that overlook the fine-grained visual semantics essential for complex reasoning. To address this limitati...

Keywords: UniDoc-RL, Retrieval-Augmented Generation, LVLM, reinforcement learning, hierarchical actions, dense rewards, Group Relative Policy Optimization, active visual perception

Papers for April 19, 2026

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

R3D: Revisiting 3D Policy Learning

SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling

Autogenesis: A Self-Evolving Agent Protocol

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards