Paper Archive

APPLE: Attribute-Preserving Pseudo-Labeling for Diffusion-Based Face Swapping

5.0/10

Jiwon Kang, Yeji Choi, JoungBin Lee, Wooseok Jang, Jinhyeok Choi, Taekeun Kang, Yongjae Park, Myungin Kim, Seungryong Kim 1/21/2026 arxiv

computer vision

Face swapping aims to transfer the identity of a source face onto a target face while preserving target-specific attributes such as pose, expression, lighting, skin tone, and makeup. However, since real ground truth for face swapping is unavailable, achieving both accurate identity transfer and high...

Towards Understanding Best Practices for Quantization of Vision-Language Models

5.0/10

Gautom Das, Vincent La, Ethan Lau, Abhinav Shrivastava, Matthew Gwilliam 1/21/2026 arxiv

computer vision

Large language models (LLMs) deliver impressive results for a variety of tasks, but state-of-the-art systems require fast GPUs with large amounts of memory. To reduce both the memory and latency of these systems, practitioners quantize their learned parameters, typically at half precision. A growing...

Keywords: transformer, gpt, vision transformer

Iterative Refinement Improves Compositional Image Generation

5.0/10

Shantanu Jaiswal, Mihir Prabhudesai, Nikash Bhardwaj, Zheyang Qin, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak 1/21/2026 arxiv

computer vision

Text-to-image (T2I) models have achieved remarkable progress, yet they continue to struggle with complex prompts that require simultaneously handling multiple objects, relations, and attributes. Existing inference-time strategies, such as parallel sampling with verifiers or simply increasing denoisi...

Walk through Paintings: Egocentric World Models from Internet Priors

5.0/10

Anurag Bagchi, Zhipeng Bao, Homanga Bharadhwaj, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert 1/21/2026 arxiv

computer vision

What if a video generation model could not only imagine a plausible future, but the correct one, accurately reflecting how the world changes with each action? We address this question by presenting the Egocentric World Model (EgoWM), a simple, architecture-agnostic method that transforms any pretrai...

Keywords: diffusion model, fine-tuning

LuxRemix: Lighting Decomposition and Remixing for Indoor Scenes

5.0/10

Ruofan Liang, Norman Müller, Ethan Weber, Duncan Zauss, Nandita Vijaykumar, Peter Kontschieder, Christian Richardt 1/21/2026 arxiv

computer vision

We present a novel approach for interactive light editing in indoor scenes from a single multi-view scene capture. Our method leverages a generative image-based light decomposition model that factorizes complex indoor scene illumination into its constituent light sources. This factorization enables ...

Rethinking Video Generation Model for the Embodied World

5.0/10

Yufan Deng, Zilin Pan, Hongyu Zhang, Xiaojie Li, Ruoqing Hu, Yufei Ding, Yiming Zou, Yan Zeng, Daquan Zhou 1/21/2026 arxiv

computer vision

Video generation models have significantly advanced embodied intelligence, unlocking new possibilities for generating diverse robot data that capture perception, reasoning, and action in the physical world. However, synthesizing high-quality videos that accurately reflect real-world robotic interact...

StableWorld: Towards Stable and Consistent Long Interactive Video Generation

5.0/10

Ying Yang, Zhengyao Lv, Tianlin Pan, Haofan Wang, Binxin Yang, Hubery Yin, Chen Li, Ziwei Liu, Chenyang Si 1/21/2026 arxiv

natural language processing

In this paper, we explore the overlooked challenge of stability and temporal consistency in interactive video generation, which synthesizes dynamic and controllable video worlds through interactive behaviors such as camera movements and text prompts. Despite remarkable progress in world modeling, cu...

MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs

5.0/10

Christoph Bartmann, Johannes Schimunek, Mykyta Ielanskyi, Philipp Seidl, Günter Klambauer, Sohvi Luukkonen 1/21/2026 arxiv

natural language processing

A molecule's properties are fundamentally determined by its composition and structure encoded in its molecular graph. Thus, reasoning about molecular properties requires the ability to parse and understand the molecular graph. Large Language Models (LLMs) are increasingly applied to chemistry, tackl...

Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks

5.0/10

Sahar Tahmasebi, Eric Müller-Budack, Ralph Ewerth 1/21/2026 arxiv

natural language processing

Misinformation and fake news have become a pressing societal challenge, driving the need for reliable automated detection methods. Prior research has highlighted sentiment as an important signal in fake news detection, either by analyzing which sentiments are associated with fake news or by using se...

Keywords: detection, classification

RayRoPE: Projective Ray Positional Encoding for Multi-view Attention

5.0/10

Yu Wu, Minsik Jeon, Jen-Hao Rick Chang, Oncel Tuzel, Shubham Tulsiani 1/21/2026 arxiv

computer vision

We study positional encodings for multi-view transformers that process tokens from a set of posed input images, and seek a mechanism that encodes patches uniquely, allows SE(3)-invariant attention with multi-frequency similarity, and can be adaptive to the geometry of the underlying scene. We find t...

Keywords: transformer, attention

Papers for January 22, 2026