Browse and export your curated research paper collection
Mingwei Xu, Hao Fang 5/7/2026 arxiv
reinforcement learning: Reinforcement learning with verifiable rewards (RLVR), owing to its deterministic verification, has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy...
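The shift the abstract describes replaces PPO's learned value function with a group-relative baseline. A minimal sketch of that normalization, assuming the standard mean/std form and hypothetical reward values (not taken from this paper):

```python
# Minimal sketch of a group-relative advantage, as used in
# GRPO-style methods: sample several responses per prompt and
# normalize each verifiable reward against its group statistics.

def group_relative_advantages(rewards):
    """Normalize each reward against the group's mean and std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Verifiable rewards: 1.0 if the answer checks out, else 0.0.
advs = group_relative_advantages([1.0, 0.0, 1.0, 1.0])
```

Because the baseline comes from the group itself, the advantages always sum to zero, removing the need for a separate critic network.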
Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang 5/7/2026 arxiv
computer vision: Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate in...
Minbin Huang, Han Shi, Chuanyang Zheng, Yimeng Wu, Guoxuan Chen, Xintong Yu, Yichun Yin, Hong Cheng 5/7/2026 arxiv
machine learning: Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert capacity. However, re...
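The "linear expert-parameter growth" point above is simple arithmetic: one expert set per layer multiplies the expert budget by depth, while a single shared pool does not. A back-of-envelope sketch with illustrative numbers (not the paper's configuration):

```python
# Expert-parameter count under the rigid per-layer rule vs. a
# single shared expert pool. All sizes are illustrative.

def expert_params(layers, experts_per_layer, params_per_expert, shared=False):
    """Total expert parameters for an MoE stack."""
    pools = 1 if shared else layers  # one expert set per layer, or one pool
    return pools * experts_per_layer * params_per_expert

per_layer = expert_params(layers=32, experts_per_layer=8,
                          params_per_expert=10_000_000)
pooled = expert_params(layers=32, experts_per_layer=8,
                       params_per_expert=10_000_000, shared=True)
# per_layer scales linearly with depth; pooled is depth-independent
```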
Omar El Khalifi, Thomas Rossi, Oscar Fossey, Thibault Fouque, Ulysse Mizrahi, Philip Torr, Ivan Laptev, Fabio Pizzati, Baptiste Bellot-Gurlet 5/7/2026 arxiv
computer vision: For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new s...
Ryan Wang, Akshita Bhagia, Sewon Min 5/7/2026 arxiv
natural language processing: Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset...
5/7/2026 huggingface
natural language processing: With the rapid evolution of foundation models, Large Language Model (LLM) agents have demonstrated increasingly powerful tool-use capabilities. However, this proficiency introduces significant security risks, as malicious actors can manipulate agents into executing tools to generate harmful content....
5/7/2026 huggingface
reinforcement learning: Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies var...
5/7/2026 huggingface
natural language processing: Linear Attention (LA) offers a promising paradigm for scaling large language models (LLMs) to long sequences by avoiding the quadratic complexity of self-attention. Recent LA models such as Mamba2 and GDN interpret linear recurrences as closed-form online stochastic gradient descent (SGD), but naive...
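The "linear recurrence as online SGD" view the abstract refers to can be illustrated with a delta-rule state update, where each token performs one gradient step on the reconstruction loss ||S k - v||^2. The dimensions and learning rate below are illustrative assumptions, not details from this paper:

```python
# Delta-rule state update read as one online-SGD step:
#   S <- S - beta * grad_S (1/2)||S k - v||^2
# which expands to S <- S - beta * (S k - v) k^T.
import numpy as np

def delta_rule_step(S, k, v, beta=0.5):
    """One gradient step pulling the state's prediction S k toward v."""
    err = S @ k - v                 # prediction error for key k
    return S - beta * np.outer(err, k)

d = 4
S = np.zeros((d, d))
k = np.eye(d)[0]                    # unit-norm key
v = np.arange(d, dtype=float)       # target value
for _ in range(20):                 # repeated steps drive S k toward v
    S = delta_rule_step(S, k, v)
```

With a unit-norm key, each step shrinks the error by a factor of (1 - beta), so the recurrence converges to a state that maps k to v, which is the associative-memory reading of linear attention.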
5/7/2026 huggingface
computer vision: Recent advances in generative video models are increasingly driven by post-training and test-time scaling, both of which critically depend on the quality of video reward models (RMs). An ideal reward model should predict accurate rewards that align with human preferences across diverse scenarios. Ho...
5/7/2026 huggingface
computer vision: Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be optimized simultaneously. Existing practices deal with multiple re...