Paper Archive

Browse and export your curated research paper collection

Archived Days: 221
Total Papers: 2198
Avg Score: 8.0
Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
Browse by Date

Papers for April 15, 2026

10 papers found

4/14/2026 huggingface

computer vision

We present Nucleus-Image, a text-to-image generation model that establishes a new Pareto frontier in quality-versus-efficiency by matching or exceeding leading models on GenEval, DPG-Bench, and OneIG-Bench while activating only approximately 2B parameters per forward pass. Nucleus-Image employs a sp...

Keywords: sparse MoE, diffusion transformer, Expert-Choice Routing, decoupled routing, timestep modulation, joint attention, progressive curriculum, Muon optimizer

Tianchang Shen, Sherwin Bahmani, Kai He, Sangeetha Grama Srinivasan, Tianshi Cao, Jiawei Ren, Ruilong Li, Zian Wang, Nicholas Sharp, Zan Gojcic, Sanja Fidler, Jiahui Huang, Huan Ling, Jun Gao, Xuanchi Ren 4/14/2026 arxiv

machine learning

Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, then lifting them to 3D via feed-forward reconstruction techniques. This generative reconstruction approach combines the visual fidelity and creative...

Keywords: Lyra 2.0, generative reconstruction, video-to-3D, spatial forgetting, temporal drifting, information routing, self-augmentation, feed-forward reconstruction

Kathakoli Sengupta, Kai Ao, Paola Cascante-Bonilla 4/14/2026 arxiv

computer vision

Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene graphs, yet evaluation still relies on LLM or VLM judges that score rendered views, making judgments sensitive to viewpoint, prompt phrasing, a...

Keywords: SceneCritic, SceneOnto, symbolic evaluator, 3D indoor scene synthesis, floor-plan, ontology, LLM, VLM

Jian Han, Jinlai Liu, Jiahuan Wang, Bingyue Peng, Zehuan Yuan 4/14/2026 arxiv

computer vision

While diffusion models dominate the field of visual generation, they are computationally inefficient, applying a uniform computational effort regardless of different complexity. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but a...

Keywords: Generative Refinement Networks, Hierarchical Binary Quantization, entropy-guided sampling, autoregressive models, image generation, text-to-image, text-to-video, ImageNet

Ya-Qi Yu, Fangyu Hong, Xiangyang Qu, Hao Wang, Gaojie Wu, Qiaoyu Luo, Nuo Xu, Huixin Wang, Wuheng Xu, Yongxin Liao, Zihao Chen, Haonan Li, Ziming Li, Dezhi Peng, Minghui Liao, Jihao Wu, Haoyu Ren, Dandan Tu 4/14/2026 arxiv

machine learning

The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Existing pipelines often rely on off-policy perturbations or coarse outcome-based signals, which are not well suited to fine-grained visual reaso...

Keywords: rDPO, rubric rewards, preference optimization, visual reasoning, on-policy data, reward modeling, vision-language models, DPO

Baris Sarper Tezcan, Hrishikesh Viswanath, Rubab Saher, Daniel Aliaga 4/14/2026 arxiv

computer vision

Urban areas are increasingly vulnerable to thermal extremes driven by rapid urbanization and climate change. Traditionally, thermal extremes have been monitored using Earth-observing satellites and numerical modeling frameworks. For example, land surface temperature derived from Landsat or Sentinel ...

Keywords: conflated inverse modeling, diffusion generative model, forward predictive model, urban vegetation, land surface temperature, inverse problem, climate adaptation

Benzhao Tang, Shiyu Yang 4/14/2026 arxiv

machine learning

The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byt...

Keywords: CLAD, log anomaly detection, compressed byte streams, dilated convolution, Transformer, mLSTM, four-way pooling, masked pre-training

Yihang Sun, Huaijin Wang, Patrick Hayden, Jose Blanchet 4/14/2026 arxiv

optimization

The Energy Conserving Descent (ECD) algorithm was recently proposed (De Luca & Silverstein, 2022) as a global non-convex optimization method. Unlike gradient descent, appropriately configured ECD dynamics escape strict local minima and converge to a global minimum, making it appealing for machin...

Keywords: Energy Conserving Descent, sECD, qECD, non-convex optimization, double-well, Hamiltonian simulation, quantum algorithms, stochastic dynamics

Cristian Minoccheri, Emily Wittrup, Kayvan Najarian, Ryan Stidham 4/14/2026 arxiv

machine learning

Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (IBD), yet the representational choices that best support automated analysis of this modality are unknown. We present the first study of vision-language transfer learning on abdominal CT ente...

Keywords: CT enterography, vision-language, representation geometry, mean pooling, attention pooling, multi-window RGB, multiplanar sampling, retrieval-augmented generation

Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu 4/14/2026 arxiv

machine learning

Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces, where sub-pixel accuracy is required to interact with dense IDE elements, remains under...

Keywords: GUI grounding, iterative refinement, visual feedback, pixel-precise localization, Computer Use Agents, coding interfaces, GPT-5.4, Claude