Paper Archive

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

9.0/10

Siang-Ling Zhang, Huai-Hsun Cheng, Tsung-Ju Yang, Yu-Lun Liu 6/18/2026 arxiv

computer vision

Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches fail to produce geome...

Keywords: 3D visual illusions, cross-space denoising, view-conditioned texture synthesis, CLIP-guided orientation alignment, Signed Distance Field blending

MemoryWAM: Efficient World Action Modeling with Persistent Memory

9.0/10

Sizhe Yang, Juncheng Mu, Tianming Wei, Chenhao Lu, Xiaofan Li, Linning Xu, Zhengrong Xue, Zhecheng Yuan, Dahua Lin, Jiangmiao Pang, Huazhe Xu 6/18/2026 arxiv

computer vision

Robust robotic manipulation in the real world requires not only an understanding of the current observation, but also memory and dynamics modeling. World action models (WAMs) possess these capabilities by jointly modeling visual foresight and actions conditioned on both current and historical observ...

Keywords: MemoryWAM, world action modeling, efficient memory, robotic manipulation, attention mechanism

TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living

9.0/10

Arkaprava Sinha, Dominick Reilly, Siddharth Krishnan, Hieu Le, Srijan Das 6/18/2026 arxiv

computer vision

Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse caption-based reasonin...

Keywords: Long Video Question Answering, Temporal Reasoning, Action-based Candidate Evidence, OpenTSUBench, Activities of Daily Living

How Transparent is DiffusionGemma?

9.0/10

Joshua Engels, Callum McDougall, Bilal Chughtai, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue, João Gabriel Lopes de Oliveira, Rohin Shah, Neel Nanda 6/18/2026 arxiv

machine learning

LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computation in a continuous latent space; does this make its reasoning less t...

Keywords: LLM reasoning, DiffusionGemma, transparency, interpretability, algorithmic transparency, variable transparency, monitorability

UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning

9.0/10

Wenhao Chi, Arkaprava Sinha, Dominick Reilly, Hieu Le, Srijan Das 6/18/2026 arxiv

computer vision

Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, a single model cannot capture the full richness of human action. We argue that a truly expressive egocentric representation must subsume complementary knowledge ...

Keywords: egocentric video, multi-teacher distillation, proxy models, action recognition, video retrieval

Optimal Deterministic Multicalibration and Omniprediction

9.0/10

Georgy Noarov, Aaron Roth 6/18/2026 arxiv

machine learning

A model is multicalibrated on a collection of group weights $G$ if it is calibrated -- i.e. unbiased even conditional on its prediction -- not just overall, but also after reweighting contexts by each $g \in G$. It is a useful property for many downstream applications and is a basic desideratum of t...

Keywords: multicalibration, omniprediction, deterministic predictors, optimal sample complexity, unbiased AI

Thinking in Boxes: 3D Editing in Real Images Made Easy

9.0/10

Pradhaan S Bhat, Naveen Chandra R, Rishubh Parihar, Vaibhav Vavilala, R. Venkatesh Babu, D. A. Forsyth, Anand Bhattad 6/18/2026 arxiv

computer vision

Text and 2D-conditioning interfaces provide weak, ambiguous control over spatial transformations in image editing -- particularly under large object motions and camera changes. Prior work has used 3D primitives such as boxes, but only as loose conditioning signals indicating approximate object locat...

Keywords: 3D editing, real images, spatial transformations, depth-aligned planar floor, generalization

Structuring and Tokenizing Distributed User Interest Context for Generative Recommendation

9.0/10

Ruizhong Qiu, Yinglong Xia, Dongqi Fu, Hanqing Zeng, Ren Chen, Xiangjun Fan, Hong Li, Hong Yan, Hanghang Tong 6/18/2026 arxiv

machine learning

Generative recommendation is an emerging paradigm that has shown promise in industrial recommendation systems, aiming to predict users' next interactions from their historical behaviors. At the core of generative recommendation lies item tokenization, which bridges item semantics and recommendation ...

Keywords: Generative Recommendation, User Interest Context, Graph Neural Networks, Semantic Tokenization, Scalability

Generating Robot Hands from Human Demonstrations

9.0/10

Sha Yi, Nicklas Hansen, Xueqian Bai, Carmelo Sferrazza, Michael T. Tolley, Xiaolong Wang 6/18/2026 arxiv

robotics

Robot learning has advanced rapidly in learning control, but learning the physical body of a robot remains much more difficult because jointly searching over design and control creates a very large combinatorial problem. Here, we present a data-driven framework for generating robot hands from human ...

Keywords: robot hand design, human demonstration learning, inverse kinematics, reinforcement learning, robot control

The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups

9.0/10

Przemyslaw Musialski 6/18/2026 arxiv

computer vision

We place the attention token on the group: a token is an element $g_i$ of a matrix Lie group $G$ -- a bare transformation, with no feature payload and no external action $ρ(g)$ carrying it. To our knowledge this is the first attention construction whose tokens are bare matrix Lie group elements: the...

Keywords: Lie-Algebra Attention, matrix Lie groups, computer vision, attention mechanisms, SE(2), SO(3), Aff(2)

Papers for June 19, 2026
