Paper Archive

Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

9.0/10

Mengting Chen, Zhengrui Chen, Yongchao Du, Zuan Gao, Taihang Hu, Jinsong Lan, Chao Lin, Yefeng Shen, Xingjian Wang, Zhao Wang, Zhengtao Wu, Xiaoli Xu, Zhengze Xu, Hao Yan, Mingzhou Zhang, Jun Zheng, Qinye Zhou, Xiaoyong Zhu, Bo Zheng 4/21/2026 arxiv

computer vision

Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly effici...

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

9.0/10

Yutian Chen, Shi Guo, Renbiao Jin, Tianshuo Yang, Xin Cai, Yawen Luo, Mingxin Yang, Mulin Yu, Linning Xu, Tianfan Xue 4/21/2026 arxiv

machine learning

Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two capture frames, which rest...

Keywords: sparse-view 3D reconstruction, video diffusion model, global scene memory, geometry-aware conditioning, capture-view retrieval, diffusion distillation, sparse attention

CityRAG: Stepping Into a City via Spatially-Grounded Video Generation

9.0/10

Gene Chou, Charles Herrmann, Kyle Genova, Boyang Deng, Songyou Peng, Bharath Hariharan, Jason Y. Zhang, Noah Snavely, Philipp Henzler 4/21/2026 arxiv

computer vision

We address the problem of generating a 3D-consistent, navigable environment that is spatially grounded: a simulation of a real location. Existing video generative models can produce a plausible sequence that is consistent with a text (T2V) or image (I2V) prompt. However, the capability to reconstruc...

Keywords: CityRAG, spatial grounding, video generation, geo-registered data, 3D-consistency, loop closure, temporally unaligned data, simulation

Generalization at the Edge of Stability

9.0/10

Mario Tuci, Caner Korkmaz, Umut Şimşekli, Tolga Birdal 4/21/2026 arxiv

machine learning

Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory and chaotic behavior. Empirically, this regime often yields improved generalization performance, yet the underlying mechanism remains poorly u...

Keywords: edge of stability, sharpness dimension, Lyapunov dimension, stochastic optimizers, fractal attractor, Hessian spectrum, generalization, grokking

Generative Drifting for Conditional Medical Image Generation

9.0/10

Zirong Li, Siyuan Mei, Weiwen Wu, Andreas Maier, Lina Gölz, Yan Xia 4/21/2026 arxiv

machine learning

Conditional medical image generation plays an important role in many clinically relevant imaging tasks. However, existing methods still face a fundamental challenge in balancing inference efficiency, patient-specific fidelity, and distribution-level plausibility, particularly in high-dimensional 3D ...

Keywords: Generative Drifting, GDM, conditional generation, 3D medical imaging, MRI-to-CT, sparse-view CT, attractive-repulsive drift, feature bank

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

9.0/10

Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge 4/21/2026 arxiv

robotics

Scaling humanoid foundation models is bottlenecked by the scarcity of robotic data. While massive egocentric human data offers a scalable alternative, bridging the cross-embodiment chasm remains a fundamental challenge due to kinematic mismatches. We introduce UniT (Unified Latent Action Tokenizer v...

Keywords: unified latent tokens, visual anchoring, human-to-humanoid transfer, policy learning, world modeling, cross-embodiment, egocentric data

FASTER: Value-Guided Sampling for Fast RL

9.0/10

Perry Dong, Alexander Swerdlow, Dorsa Sadigh, Chelsea Finn 4/21/2026 arxiv

reinforcement learning

Some of the most performant reinforcement learning algorithms today can be prohibitively expensive as they use test-time scaling methods such as sampling multiple action candidates and selecting the best one. In this work, we propose FASTER, a method for getting the benefits of sampling-based test-t...

Keywords: FASTER, diffusion policies, value-guided sampling, denoising MDP, reinforcement learning, robotics, efficient inference, generative RL

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

9.0/10

Zhengwentai Sun, Keru Zheng, Chenghong Li, Hongjie Liao, Xihe Yang, Heyuan Li, Yihao Zhi, Shuliang Ning, Shuguang Cui, Xiaoguang Han 4/21/2026 arxiv

computer vision

Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-view data. Existing methods often address these factors separately, resulting in limited controllability or reduced visual quality. We revisit this ...

Keywords: human video generation, image-first, temporal refinement, video diffusion, SMPL-X, controllable synthesis, appearance prior, multi-view data

Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views

9.0/10

Feihao Fang, My T. Thai, Yuanyuan Lei 4/21/2026 arxiv

machine learning

Large Language Models (LLMs) still struggle with multi-step logical reasoning. Existing approaches either purely refine the reasoning chain in natural language form or attach a symbolic solver as an external module. In this work, we instead ask whether LLMs contain a shared internal logical subspace...

Keywords: large language models, logical reasoning, canonical correlation analysis, residual activations, shared subspace, natural language, symbolic reasoning, training-free

PlayCoder: Making LLM-Generated GUI Code Playable

9.0/10

4/21/2026 huggingface

machine learning

Large language models (LLMs) have achieved strong results in code generation, but their ability to generate GUI applications, especially games, remains insufficiently studied. Existing benchmarks mainly evaluate correctness through test cases, which are inadequate for GUI applications because these ...

Keywords: PlayCoder, PlayEval, PlayTester, Play@k, GUI code generation, code LLMs, interactive apps, benchmark

Papers for April 22, 2026
