Paper Archive

The Universal Weight Subspace Hypothesis

9.0/10

Prakhar Kaushik, Shravan Chaudhari, Ankit Vaidya, Rama Chellappa, Alan Yuille 12/4/2025 arxiv

machine learning

We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that neural networks systematically converge to shared spectral subspaces regardless of initialization...

Keywords: weight subspace, spectral analysis, model reusability, LoRA, vision transformer, Mistral-7B, LLaMA-8B, representation learning

Light-X: Generative 4D Video Rendering with Camera and Illumination Control

9.0/10

Tianqi Liu, Zhaoxi Chen, Zihao Huang, Shaocong Xu, Saining Zhang, Chongjie Ye, Bohan Li, Zhiguo Cao, Wei Li, Hao Zhao, Ziwei Liu 12/4/2025 arxiv

computer vision

Recent advances in illumination control extend image-based methods to video, yet still face a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key step toward generative modeling of real-world scenes is the joint control of camera trajectory and illuminatio...

Keywords: Light-X, 4D video, video relighting, dynamic point clouds, camera control, Light-Syn, disentanglement, monocular video

Value Gradient Guidance for Flow Matching Alignment

9.0/10

Zhen Liu, Tim Z. Xiao, Carles Domingo-Enrich, Weiyang Liu, Dinghuai Zhang 12/4/2025 arxiv

generative models

While methods exist for aligning flow matching models, a popular and effective class of generative models, with human preferences, existing approaches fail to achieve both adaptation efficiency and probabilistically sound prior preservation. In this work, we leverage the theory of optimal control an...

Keywords: VGG-Flow, flow matching, value function, gradient matching, optimal control, model alignment, Stable Diffusion 3, prior preservation

Deep infant brain segmentation from multi-contrast MRI

9.0/10

Malte Hoffmann, Lilla Zöllei, Adrian V. Dalca 12/4/2025 arxiv

machine learning

Segmentation of magnetic resonance images (MRI) facilitates analysis of human brain development by delineating anatomical structures. However, in infants and young children, accurate segmentation is challenging due to development and imaging constraints. Pediatric brain MRI is notoriously difficult ...

Keywords: infant brain segmentation, MRI, domain randomization, multimodal fusion, BabySeg, dataset shift, medical imaging, deep learning

Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting

9.0/10

Hao-Jen Chien, Yi-Chuan Huang, Chung-Ho Wu, Wei-Lun Chao, Yu-Lun Liu 12/4/2025 arxiv

computer vision

Synthesizing high-fidelity frozen 3D scenes from monocular Mannequin-Challenge (MC) videos is a unique problem distinct from standard dynamic scene reconstruction. Instead of focusing on modeling motion, our goal is to create a frozen scene while strategically preserving subtle dynamics to enable us...

Keywords: Splannequin, monocular Mannequin-Challenge, dynamic Gaussian splatting, temporal anchoring, dual-detection, frozen scene, user-selectable frozen-time renderings

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

9.0/10

Dongzhi Jiang, Renrui Zhang, Haodong Li, Zhuofan Zong, Ziyu Guo, Jun He, Claire Guo, Junyan Ye, Rongyao Fang, Weijia Li, Rui Liu, Hongsheng Li 12/4/2025 arxiv

machine learning

Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-image generation. However, existing approaches remain limited, either treating the model merely as a standalone generator or relying on abst...

Keywords: Draft-as-CoT, DraCo-240K, DraCo-CFG, text-to-image, chain-of-thought, preview, rare-concept-generation, interleaved reasoning

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

9.0/10

Shengyuan Ding, Xinyu Fang, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiangyu Zhao, Haodong Duan, Xiaoyi Dong, Jianze Liang, Bin Wang, Conghui He, Dahua Lin, Jiaqi Wang 12/4/2025 arxiv

machine learning

Reward models are critical for aligning vision-language systems with human preferences, yet current approaches suffer from hallucination, weak visual grounding, and an inability to use tools for verification, limiting their reliability on complex multimodal reasoning tasks. We present ARM-Thinker, a...

Keywords: ARM-Thinker, agentic tool use, reward models, multimodal, visual grounding, ARMBench-VL, reinforcement learning, interpretability

STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models

9.0/10

Feng Xu, Guangyao Zhai, Xin Kong, Tingzhong Fu, Daniel F. N. Gordon, Xueli An, Benjamin Busam 12/4/2025 arxiv

robotics

Recent advances in Vision-Language-Action (VLA) models, powered by large language models and reinforcement learning-based fine-tuning, have shown remarkable progress in robotic manipulation. Existing methods often treat long-horizon actions as linguistic sequences and apply trajectory-level optimiza...

Keywords: STARE, Stage-Aware Reinforcement, STA-TPO, STA-PPO, IPI pipeline, Vision-Language-Action, trajectory decomposition, stage-aligned rewards

NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

9.0/10

Yu Zeng, Charles Ochoa, Mingyuan Zhou, Vishal M. Patel, Vitor Guizilini, Rowan McAllister 12/4/2025 arxiv

computer vision

Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consist...

Keywords: Phase-Preserving Diffusion, φ-PD, Frequency-Selective Structured noise, FSS, structure-aligned generation, sim-to-real, diffusion models, image-to-image

Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning

9.0/10

Purbesh Mitra, Sennur Ulukus 12/4/2025 arxiv

natural language processing

Long-context reasoning in large language models (LLMs) has been shown to enhance their cognitive capabilities via chain-of-thought (CoT) inference. Such models are usually trained via reinforcement learning with verifiable rewards (RLVR) on reasoning-based problems, like math and programm...

Keywords: Semantic Soft Bootstrapping, SSB, self-distillation, long-context reasoning, chain-of-thought, CoT, Qwen2.5-3B-Instruct, GSM8K

Papers for December 5, 2025
