Paper Archive

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

5.0/10

5/6/2026 huggingface

computer vision

The landscape of high-performance image generation models is shifting from inefficient multi-step models to efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models pose significant challenges for direct continuous supervised fine-tuning. For ...

Keywords: diffusion model, fine-tuning

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

5.0/10

5/6/2026 huggingface

computer vision

Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the a...

Keywords: reinforcement learning

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

5.0/10

5/6/2026 huggingface

reinforcement learning

Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in fu...

Keywords: diffusion model

Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation

5.0/10

5/6/2026 huggingface

natural language processing

Chord progression generation is practically important but understudied. Most large-scale symbolic music systems target melody, multi-track arrangement, or audio synthesis, and chord-only models tend to be relegated to conditioning components inside larger pipelines. This paper treats chord generatio...

Keywords: transformer, fine-tuning

Beyond Retrieval: A Multitask Benchmark and Model for Code Search

5.0/10

5/6/2026 huggingface

natural language processing

Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce CoR...

Advancing Aesthetic Image Generation via Composition Transfer

5.0/10

5/6/2026 huggingface

computer vision

Composition is a cornerstone of visual aesthetics, influencing the appeal of an image. While its principles operate independently of specific content, in practice, composition is often coupled with semantics. As a result, existing methods often enhance composition either through implicit learning or...

Keywords: diffusion model, fine-tuning

Lightning Unified Video Editing via In-Context Sparse Attention

5.0/10

5/6/2026 huggingface

computer vision

Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing....

Keywords: attention

Stream-T1: Test-Time Scaling for Streaming Video Generation

5.0/10

5/6/2026 huggingface

computer vision

While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structura...

Keywords: diffusion model

StableI2I: Spotting Unintended Changes in Image-to-Image Transition

5.0/10

5/6/2026 huggingface

computer vision

In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction following and the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure...

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

5.0/10

5/5/2026 huggingface

computer vision

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception...

Keywords: transformer

Papers for May 7, 2026