Paper Archive

Browse and export your curated research paper collection

238
Archived Days
2368
Total Papers
7.9
Avg Score
9
Categories

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
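The JSON export can be post-processed into any of the other formats. As a minimal sketch, the snippet below renders one archived-paper record as a BibTeX `@misc` entry; the field names in `sample_json` (`title`, `authors`, `date`, `source`, `category`, `score`) are assumptions, since the archive's actual export schema is not documented here.

```python
import json

# Hypothetical shape of one archived-paper record; the real JSON export
# schema may differ, so every field name below is an assumption.
sample_json = """
{
  "title": "ActCam: Zero-Shot Camera and Motion Control",
  "authors": ["Omar El Khalifi", "Thomas Rossi"],
  "date": "2026-05-07",
  "source": "arxiv",
  "category": "computer vision",
  "score": 7.9
}
"""

def to_bibtex(paper: dict) -> str:
    """Render one archive record as a BibTeX @misc entry."""
    # Citation key: last name of the first author plus the year.
    key = paper["authors"][0].split()[-1].lower() + paper["date"][:4]
    authors = " and ".join(paper["authors"])
    return (
        f"@misc{{{key},\n"
        f"  title  = {{{paper['title']}}},\n"
        f"  author = {{{authors}}},\n"
        f"  year   = {{{paper['date'][:4]}}},\n"
        f"  note   = {{{paper['source']}}}\n"
        f"}}"
    )

paper = json.loads(sample_json)
entry = to_bibtex(paper)
print(entry)
```

The same record dictionary could be fed to `csv.DictWriter` for the CSV export, or interpolated into a template string for the Markdown report.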
Browse by Date

Papers for May 9, 2026

10 papers found

Mingwei Xu, Hao Fang 5/7/2026 arxiv

reinforcement learning

Reinforcement learning with verifiable rewards (RLVR), owing to its deterministic verification, has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy...

Keywords: reinforcement learning, verifiable rewards, policy optimization, LLM reasoning, positive-only learning, GRPO, PPO, Qwen

Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang 5/7/2026 arxiv

computer vision

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate in...

Keywords: video relighting, neural rendering, environment video, intrinsic decomposition, diffusion models, computer vision, 3D reconstruction

Minbin Huang, Han Shi, Chuanyang Zheng, Yimeng Wu, Guoxuan Chen, Xintong Yu, Yichun Yin, Hong Cheng 5/7/2026 arxiv

machine learning

Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert capacity. However, re...

Keywords: Mixture-of-Experts, MoE, UniPool, expert sharing, efficient architectures, large language models, transformers

Omar El Khalifi, Thomas Rossi, Oscar Fossey, Thibault Fouque, Ulysse Mizrahi, Philip Torr, Ivan Laptev, Fabio Pizzati, Baptiste Bellot-Gurlet 5/7/2026 arxiv

computer vision

For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new s...

Keywords: video generation, zero-shot learning, camera control, motion transfer, diffusion models, pose conditioning, depth estimation, cinematography

Ryan Wang, Akshita Bhagia, Sewon Min 5/7/2026 arxiv

natural language processing

Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset...

Keywords: Mixture-of-Experts, Modularity, Emergent Specialization, Memory Efficiency, Large Language Models, Pretraining

[authors unavailable] 5/7/2026 huggingface

natural language processing

With the rapid evolution of foundation models, Large Language Model (LLM) agents have demonstrated increasingly powerful tool-use capabilities. However, this proficiency introduces significant security risks, as malicious actors can manipulate agents into executing tools to generate harmful content....

Keywords: gpt

[authors unavailable] 5/7/2026 huggingface

reinforcement learning

Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies var...

Keywords: neural network, detection, classification

[authors unavailable] 5/7/2026 huggingface

natural language processing

Linear Attention (LA) offers a promising paradigm for scaling large language models (LLMs) to long sequences by avoiding the quadratic complexity of self-attention. Recent LA models such as Mamba2 and GDN interpret linear recurrences as closed-form online stochastic gradient descent (SGD), but naive...

Keywords: transformer, attention, gradient descent

[authors unavailable] 5/7/2026 huggingface

computer vision

Recent advances in generative video models are increasingly driven by post-training and test-time scaling, both of which critically depend on the quality of video reward models (RMs). An ideal reward model should predict accurate rewards that align with human preferences across diverse scenarios. Ho...

Keywords: reinforcement learning, regression

[authors unavailable] 5/7/2026 huggingface

computer vision

Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be optimized simultaneously. Existing practices deal with multiple re...

Keywords: diffusion model, reinforcement learning, fine-tuning