Paper Archive

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

9.0/10

4/16/2026 huggingface

computer vision

This paper focuses on the alignment of flow matching models with human preferences. A promising way is fine-tuning by directly backpropagating reward gradients through the differentiable generation process of flow matching. However, backpropagating through long trajectories results in prohibitive me...

Keywords: flow matching, LeapAlign, direct-gradient, GRPO, fine-tuning, trajectory shortening, ODE sampling, image-text alignment

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

9.0/10

4/16/2026 huggingface

machine learning

High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities...

Keywords: diffusion-based planning, generator-discriminator, reinforcement learning, Temporally Consistent Group Relative Policy Optimization, On-policy Generator Optimization, BEV-Warp, closed-loop planning, motion planning

ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling

9.0/10

4/16/2026 huggingface

machine learning

Recent advances in video-to-audio (V2A) generation enable high-quality audio synthesis from visual content, yet achieving robust and fine-grained controllability remains challenging. Existing methods suffer from weak textual controllability under visual-text conflict and imprecise stylistic control ...

Keywords: video-to-audio, V2A, multimodal, CLIP, temporal-timbre decoupling, REPA, modality dropout, VGGSound-TVC

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

9.0/10

4/16/2026 huggingface

machine learning

Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG systems typically rely on generic retrieval signals that overlook the fine-grained visual semantics essential for complex reasoning. To address this limitati...

Keywords: UniDoc-RL, RAG, LVLM, hierarchical actions, dense rewards, GRPO, visual retrieval, active perception

Learning to Concatenate Quantum Codes

9.0/10

4/16/2026 huggingface

machine learning

Concatenating quantum error correction codes scales error correction capability by driving logical error rates down double-exponentially across levels. However, the noise structure shifts under concatenation, making it hard to choose an optimal code sequence. We automate this choice by estimating th...

Keywords: concatenated quantum codes, quantum error correction, logical error rate, noise estimation, learning-based encoders, non-additive encoders, stabilizer codes, resource reduction

CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors

9.0/10

4/16/2026 huggingface

machine learning

Scientific discovery in digital health requires converting continuous physiological signals from wearable devices into clinically actionable biomarkers. We introduce CoDaS (AI Co-Data-Scientist), a multi-agent system that structures biomarker discovery as an iterative process combining hypothesis ge...

Keywords: CoDaS, digital biomarkers, wearable sensors, circadian instability, depression, insulin resistance, adversarial validation, multi-agent system

Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3

9.0/10

4/16/2026 huggingface

machine learning

Majority voting over multiple LLM attempts improves mathematical reasoning, but correlated errors limit the effective sample size. A natural fix is to assign different reasoning strategies to different voters. The approach, Diverse Prompt Mixer, is tested on the AIMO 3 competition: 3 models, 23+ exp...

Keywords: LLMs, majority voting, prompt engineering, Diverse Prompt Mixer, AIMO 3, IMO-level problems, selection loss, high-temperature sampling

How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data

9.0/10

4/15/2026 huggingface

machine learning

Synthetic data is a standard component in training large language models, yet systematic comparisons across design dimensions, including rephrasing strategy, generator model, and source data, remain absent. We conduct extensive controlled experiments, generating over one trillion tokens, to identify...

Keywords: synthetic data, pretraining, prompt design, rephrasing, FinePhrase, structured outputs, generator size, dataset release

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

9.0/10

4/15/2026 huggingface

computer vision

We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view ...

Keywords: HY-World 2.0, 3D Gaussian Splatting, HY-Pano 2.0, WorldNav, WorldStereo 2.0, WorldMirror 2.0, WorldLens, multi-modal

ROSE: Retrieval-Oriented Segmentation Enhancement

9.0/10

4/15/2026 huggingface

computer vision

Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities due to their inability to incorporate up-to-date knowledge. To address this challenge, we introduce the Novel Emerging Segmentation Task (NEST), which focuses ...

Keywords: ROSE, NEST, MLLM, segmentation, retrieval-augmented, visual prompt, WebSense, gIoU

Browse by Date

Papers for April 17, 2026

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

Learning to Concatenate Quantum Codes

CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors

Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3

How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

ROSE: Retrieval-Oriented Segmentation Enhancement