Paper Archive

Nucleus-Image: Sparse MoE for Image Generation

10.0/10

4/14/2026 huggingface

computer vision

We present Nucleus-Image, a text-to-image generation model that establishes a new Pareto frontier in quality-versus-efficiency by matching or exceeding leading models on GenEval, DPG-Bench, and OneIG-Bench while activating only approximately 2B parameters per forward pass. Nucleus-Image employs a sp...

Keywords: sparse MoE, diffusion transformer, Expert-Choice Routing, decoupled routing, timestep modulation, joint attention, progressive curriculum, Muon optimizer

Lyra 2.0: Explorable Generative 3D Worlds

9.0/10

Tianchang Shen, Sherwin Bahmani, Kai He, Sangeetha Grama Srinivasan, Tianshi Cao, Jiawei Ren, Ruilong Li, Zian Wang, Nicholas Sharp, Zan Gojcic, Sanja Fidler, Jiahui Huang, Huan Ling, Jun Gao, Xuanchi Ren 4/14/2026 arxiv

machine learning

Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, then lifting them to 3D via feed-forward reconstruction techniques. This generative reconstruction approach combines the visual fidelity and creative...

Keywords: Lyra 2.0, generative reconstruction, video-to-3D, spatial forgetting, temporal drifting, information routing, self-augmentation, feed-forward reconstruction

SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

9.0/10

Kathakoli Sengupta, Kai Ao, Paola Cascante-Bonilla 4/14/2026 arxiv

computer vision

Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene graphs, yet evaluation still relies on LLM or VLM judges that score rendered views, making judgments sensitive to viewpoint, prompt phrasing, a...

Keywords: SceneCritic, SceneOnto, symbolic evaluator, 3D indoor scene synthesis, floor-plan, ontology, LLM, VLM

Generative Refinement Networks for Visual Synthesis

9.0/10

Jian Han, Jinlai Liu, Jiahuan Wang, Bingyue Peng, Zehuan Yuan 4/14/2026 arxiv

computer vision

While diffusion models dominate the field of visual generation, they are computationally inefficient, applying uniform computational effort regardless of the complexity of the content. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but a...

Keywords: Generative Refinement Networks, Hierarchical Binary Quantization, entropy-guided sampling, autoregressive models, image generation, text-to-image, text-to-video, ImageNet

Visual Preference Optimization with Rubric Rewards

9.0/10

Ya-Qi Yu, Fangyu Hong, Xiangyang Qu, Hao Wang, Gaojie Wu, Qiaoyu Luo, Nuo Xu, Huixin Wang, Wuheng Xu, Yongxin Liao, Zihao Chen, Haonan Li, Ziming Li, Dezhi Peng, Minghui Liao, Jihao Wu, Haoyu Ren, Dandan Tu 4/14/2026 arxiv

machine learning

The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Existing pipelines often rely on off-policy perturbations or coarse outcome-based signals, which are not well suited to fine-grained visual reaso...

Keywords: rDPO, rubric rewards, preference optimization, visual reasoning, on-policy data, reward modeling, vision-language models, DPO

Conflated Inverse Modeling to Generate Diverse and Temperature-Change Inducing Urban Vegetation Patterns

9.0/10

Baris Sarper Tezcan, Hrishikesh Viswanath, Rubab Saher, Daniel Aliaga 4/14/2026 arxiv

computer vision

Urban areas are increasingly vulnerable to thermal extremes driven by rapid urbanization and climate change. Traditionally, thermal extremes have been monitored using Earth-observing satellites and numerical modeling frameworks. For example, land surface temperature derived from Landsat or Sentinel ...

Keywords: conflated inverse modeling, diffusion generative model, forward predictive model, urban vegetation, land surface temperature, inverse problem, climate adaptation

CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

9.0/10

Benzhao Tang, Shiyu Yang 4/14/2026 arxiv

machine learning

The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byt...

Keywords: CLAD, log anomaly detection, compressed byte streams, dilated convolution, Transformer, mLSTM, four-way pooling, masked pre-training

Classical and Quantum Speedups for Non-Convex Optimization via Energy Conserving Descent

9.0/10

Yihang Sun, Huaijin Wang, Patrick Hayden, Jose Blanchet 4/14/2026 arxiv

optimization

The Energy Conserving Descent (ECD) algorithm was recently proposed (De Luca & Silverstein, 2022) as a global non-convex optimization method. Unlike gradient descent, appropriately configured ECD dynamics escape strict local minima and converge to a global minimum, making it appealing for machin...

Keywords: Energy Conserving Descent, sECD, qECD, non-convex optimization, double-well, Hamiltonian simulation, quantum algorithms, stochastic dynamics

Representation geometry shapes task performance in vision-language modeling for CT enterography

9.0/10

Cristian Minoccheri, Emily Wittrup, Kayvan Najarian, Ryan Stidham 4/14/2026 arxiv

machine learning

Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (IBD), yet the representational choices that best support automated analysis of this modality are unknown. We present the first study of vision-language transfer learning on abdominal CT ente...

Keywords: CT enterography, vision-language, representation geometry, mean pooling, attention pooling, multi-window RGB, multiplanar sampling, retrieval-augmented generation

See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback

9.0/10

Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu 4/14/2026 arxiv

machine learning

Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces, where sub-pixel accuracy is required to interact with dense IDE elements, remains under...

Keywords: GUI grounding, iterative refinement, visual feedback, pixel-precise localization, Computer Use Agents, coding interfaces, GPT-5.4, Claude

Papers for April 15, 2026
