Paper Archive

WavFlow: Audio Generation in Waveform Space

9.0/10

Feiyan Zhou, Luyuan Wang, Shoufa Chen, Zhe Wang, Zhiheng Liu, Yuren Cong, Xiaohui Zhang, Fanny Yang, Belinda Zeng

5/18/2026 arxiv

machine learning

Modern audio generation predominantly relies on latent-space compression, introducing additional complexity and potential information loss. In this work, we challenge this paradigm with WavFlow, a framework that generates high-fidelity audio directly in raw waveform space without intermediate repres...

Keywords: audio generation, waveform space, flow matching, multimodal, video-to-audio, text-to-audio

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

9.0/10

Yuxiang Huang, Nuno M. T. Gonçalves, Federico Alvetreti, Lei Li, Xu Han, Edoardo M. Ponti, André F. T. Martins, Marcos V. Treviso

5/18/2026 arxiv

natural language processing

Current hierarchical attention methods, such as NSA and InfLLMv2, select the top-k relevant key-value (KV) blocks based on coarse attention scores and subsequently apply fine-grained softmax attention on the selected tokens. However, the top-k operation assumes the number of relevant tokens for any ...

Keywords: hierarchical attention, sparse attention, long-context modeling, differentiable attention, LLM efficiency, α-entmax, Triton implementation

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

9.0/10

Yukang Chen, Luozhou Wang, Wei Huang, Shuai Yang, Bohan Zhang, Yicheng Xiao, Ruihang Chu, Weian Mao, Qixin Hu, Shaoteng Liu, Yuyang Zhao, Huizi Mao, Ying-Cong Chen, Enze Xie, Xiaojuan Qi, Song Han

5/18/2026 arxiv

computer vision

We present LongLive-2.0, an NVFP4-based parallel infrastructure throughout the full training and inference workflow of long video generation, addressing speed and memory bottlenecks. For training, we introduce sequence-parallel autoregressive (AR) training, instantiated as Balanced SP, which co-desi...

Keywords: long video generation, NVFP4, sequence-parallel, autoregressive training, diffusion models, Blackwell GPU, video inference, VAE decoding

Aurora: Unified Video Editing with a Tool-Using Agent

9.0/10

Yongsheng Yu, Ziyun Zeng, Zhiyuan Xiao, Zhenghong Zhou, Hang Hua, Wei Xiong, Jiebo Luo

5/18/2026 arxiv

machine learning

Recent video editing models have converged on a unified conditioning design: a single diffusion transformer jointly consumes text, source video, and reference images, and one set of weights covers replacement, removal, style transfer, and reference-driven insertion. The design is flexible, but it as...

Keywords: video editing, vision-language model, agent framework, diffusion transformer, tool-using agent, underspecification

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

9.0/10

Qianhao Yuan, Jie Lou, Xing Yu, Hongyu Lin, Le Sun, Xianpei Han, Yaojie Lu

5/18/2026 arxiv

computer vision

Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, where answers often depend on small but decisive evidence in the full image. We observe a regional-to-global perception gap: the same MLLM answers fine-grained questions more accurately when conditioned o...

Keywords: multimodal LLMs, fine-grained visual understanding, self-distillation, regional perception, Vision-OPD

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

5.0/10

5/18/2026 huggingface

computer vision

Diffusion models have been widely studied for removing unsafe content learned during pre-training. Existing methods require expensive supervised data, either unsafe-text paired with safe-image groundtruth or negative/positive image pairs, making them impractical to scale. Furthermore, offline reinfo...

Keywords: diffusion model, reinforcement learning, fine-tuning, detection

POST: Prior-Observation Adversarial Learning of Spatio-Temporal Associations for Multivariate Time Series Anomaly Detection

5.0/10

5/18/2026 huggingface

machine learning

Existing Multivariate Time Series Anomaly Detection (MTSAD) frameworks increasingly rely on integrating Graph Neural Networks (GNNs) with sequence models to capture complex spatio-temporal dependencies. However, less attention is paid to the spatial over-generalization problem, where unconstrained s...

Keywords: neural network, attention, detection

PySIFT: GPU-Resident Deterministic SIFT for Deep Learning Vision Pipelines

5.0/10

5/18/2026 huggingface

computer vision

A widespread assumption in local feature research holds that classical handcrafted descriptors are accuracy-limited relics best replaced by learned alternatives. We show this is wrong. Through an 8-configuration ablation spanning four benchmarks (HPatches, ROxford5K, IMC Phototourism, MegaDepth), we...

Keywords: deep learning, detection

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

5.0/10

5/17/2026 huggingface

computer vision

Large Reasoning Models (LRMs) achieve strong performance by generating long chains of thought (CoT), but often overthink, continuing to reason after a solution has already stabilized and thereby wasting tokens and increasing latency. Existing inference-time early-exit methods rely primarily on answe...

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

5.0/10

5/18/2026 huggingface

reinforcement learning

Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly re...

Keywords: reinforcement learning
