Paper Archive

Gen-Searcher: Reinforcing Agentic Search for Image Generation

9.0/10

Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue 3/30/2026 arxiv

computer vision

Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In...

Keywords: search-augmented, image generation, multi-hop reasoning, reinforcement learning, SFT, GRPO, KnowGen, datasets
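
The keywords list GRPO among the training ingredients. As a rough illustration of what that term usually refers to (not Gen-Searcher's training code), the minimal sketch below computes group-relative advantages: several responses are sampled for one prompt, and each reward is normalized by the group's mean and standard deviation, so the group itself serves as the baseline and no learned critic is needed. The reward values are made up, and the use of the population standard deviation and eps = 1e-6 are assumptions.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for G responses sampled from the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps); the group acts as its own
    baseline, so no value network is required.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts for one prompt, scored by a hypothetical reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])  # [1.414, -1.414, 0.0, 0.0]
```

If every rollout in a group receives the same reward, all advantages are zero and that group contributes no policy-gradient signal.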

HandX: Scaling Bimanual Motion and Interaction Generation

9.0/10

Zimu Zhang, Yucheng Zhang, Xiyan Xu, Ziyin Wang, Sirui Xu, Kai Zhou, Bing Zhou, Chuan Guo, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui 3/30/2026 arxiv

computer vision

Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior (finger articulation, contact timing, and inter-hand coordination), and existing resources lack hig...

Keywords: bimanual, hand motion, motion capture, LLM annotation, diffusion models, autoregressive models, dataset, hand-focused metrics

Adaptive Block-Scaled Data Types

9.0/10

Jack Cook, Hyemin S. Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P. Chandrakasan, Song Han 3/30/2026 arxiv

machine learning

NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from i...

Keywords: IF4, NVFP4, 4-bit quantization, block-scaled, E4M3, IF3, IF6, quantized training
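
As background for the block-scaling idea, NVFP4 is commonly described as storing FP4 (E2M1) elements in 16-element blocks, each block sharing an FP8 (E4M3) scale, with an additional per-tensor FP32 scale. The minimal sketch below shows only the core mechanism in plain Python: each block's absolute maximum is mapped onto 6.0, the largest E2M1 magnitude, and every element is rounded to the nearest representable value. It keeps the block scale in full precision and omits the per-tensor scale, so it is a simplified illustration, not an NVFP4-conformant encoder and not the paper's IF4 format.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign stored separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4-style micro-block length

def round_to_e2m1(v):
    """Round v to the nearest signed E2M1 value (ties go to the smaller magnitude here)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(v) - g))
    return mag if v >= 0 else -mag

def quantize(xs, block_size=BLOCK_SIZE):
    """Split xs into blocks; each block gets one scale mapping its absmax onto 6.0."""
    blocks = []
    for start in range(0, len(xs), block_size):
        chunk = xs[start:start + block_size]
        amax = max(abs(v) for v in chunk)
        # Assumption: the scale stays in full precision; real NVFP4 would round
        # it to FP8 E4M3 and also apply a per-tensor FP32 scale.
        scale = amax / 6.0 if amax > 0 else 1.0
        blocks.append((scale, [round_to_e2m1(v / scale) for v in chunk]))
    return blocks

def dequantize(blocks):
    """Multiply each 4-bit code by its block's scale."""
    return [scale * code for scale, codes in blocks for code in codes]

xs = [math.sin(0.37 * i) * (1 + i / 10) for i in range(40)]  # 2 full blocks + 1 partial
restored = dequantize(quantize(xs))
print(max(abs(a - b) for a, b in zip(xs, restored)))
```

Because the largest gap in the E2M1 grid is 2 (between 4 and 6), the per-element error is at most one block scale, which is why a single outlier that inflates a block's scale hurts every other element in that block.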

Geometry-aware similarity metrics for neural representations on Riemannian and statistical manifolds

9.0/10

N Alex Cayco Gajic, Arthur Pellegrino 3/30/2026 arxiv

machine learning

Similarity measures are widely used to interpret the representational geometries used by neural networks to solve tasks. Yet, because existing methods compare the extrinsic geometry of representations in state space, rather than their intrinsic geometry, they may fail to capture subtle yet crucial d...

Keywords: Metric Similarity Analysis, MSA, Riemannian geometry, statistical manifolds, intrinsic geometry, neural representations, manifold hypothesis, diffusion models
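
The abstract contrasts intrinsic-geometry comparison with existing measures that compare representations extrinsically, in state space. Linear centered kernel alignment (CKA) is one widely used measure of that extrinsic kind and is shown here purely as a point of reference; it is not the paper's Metric Similarity Analysis (MSA). The minimal sketch takes two activation matrices whose rows are the same stimuli and whose columns are units, and computes ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F ||Yc^T Yc||_F) on mean-centered data.

```python
import math

def _center(X):
    """Subtract each unit's mean across samples (rows are samples)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def _cross(A, B):
    """Return A^T B for A (n x p) and B (n x q) as a p x q nested list."""
    p, q = len(A[0]), len(B[0])
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(q)] for i in range(p)]

def _fro_sq(M):
    """Squared Frobenius norm."""
    return sum(v * v for row in M for v in row)

def linear_cka(X, Y):
    """Linear CKA between representations X (n x p) and Y (n x q)."""
    Xc, Yc = _center(X), _center(Y)
    num = _fro_sq(_cross(Yc, Xc))
    den = math.sqrt(_fro_sq(_cross(Xc, Xc))) * math.sqrt(_fro_sq(_cross(Yc, Yc)))
    return num / den

X = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
Y = [[-b, a] for a, b in X]  # the same points rotated by 90 degrees
print(linear_cka(X, Y))
```

Since the score depends only on centered inner products, it is unchanged by rotations, isotropic scaling, and translations of either representation; the abstract's point is that such state-space invariances can miss differences in intrinsic geometry.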

PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

9.0/10

Lorenza Prospero, Orest Kupyn, Ostap Viniavskyi, João F. Henriques, Christian Rupprecht 3/30/2026 arxiv

computer vision

Acquiring labeled datasets for 3D human mesh estimation is challenging due to depth ambiguities and the inherent difficulty of annotating 3D geometry from monocular images. Existing datasets are either real, with manually annotated 3D geometry and limited scale, or synthetic, rendered from 3D engine...

Keywords: diffusion models, 3D human mesh, synthetic data, Direct Preference Optimization, data generation, curriculum learning, quality filtering
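
The keywords mention Direct Preference Optimization. The snippet does not say how PoseDreamer applies it to a diffusion model (diffusion variants of DPO replace sequence log-probabilities with denoising-based surrogates), so the minimal sketch below shows only the standard DPO objective for a single preference pair, given summed log-probabilities of the preferred and rejected samples under the trained policy and a frozen reference model. The input values and beta = 0.1 are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    loss = -log sigmoid(beta * [(pi_w - ref_w) - (pi_l - ref_l)])
    where each argument is a summed log-probability.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = -beta * margin
    # softplus(z) = log(1 + e^z), computed stably; equals -log sigmoid(beta * margin)
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Policy raises the chosen sample's log-prob and lowers the rejected one vs. the reference.
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

A zero margin gives log 2 (about 0.693); the loss falls below that as the policy favors the preferred sample more strongly than the reference does.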

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

9.0/10

Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or 3/30/2026 arxiv

computer vision

Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a w...

Keywords: Diffusion Transformers, Contextual Space, on-the-fly repulsion, multimodal attention, text-to-image, diversity, Turbo models, distilled models

SHOW3D: Capturing Scenes of 3D Hands and Objects in the Wild

9.0/10

Patrick Rim, Kevin Harris, Braden Copple, Shangchen Han, Xu Xie, Ivan Shugurov, Sizhe An, He Wen, Alex Wong, Tomas Hodan, Kun He 3/30/2026 arxiv

computer vision

Accurate 3D understanding of human hands and objects during manipulation remains a significant challenge for egocentric computer vision. Existing hand-object interaction datasets are predominantly captured in controlled studio settings, which limits both environmental diversity and the ability of mo...

Keywords: egocentric vision, hand-object interaction, 3D annotations, multi-camera rig, dataset, marker-less capture, ego-exo tracking, SHOW3D

FlowIt: Global Matching for Optical Flow with Confidence-Guided Refinement

9.0/10

Sadra Safadoust, Fabio Tosi, Matteo Poggi, Fatma Güney 3/30/2026 arxiv

computer vision

We present FlowIt, a novel architecture for optical flow estimation designed to robustly handle large pixel displacements. At its core, FlowIt leverages a hierarchical transformer architecture that captures extensive global context, enabling the model to effectively model long-range correspondences....

Keywords: optical flow, transformer, hierarchical transformer, optimal transport, confidence map, occlusion handling, guided refinement, cross-dataset generalization
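
The keywords list optimal transport, and the abstract emphasizes global matching for long-range correspondences. The snippet does not specify FlowIt's matching module, so the following is a generic entropic optimal-transport (Sinkhorn) sketch with uniform marginals: given a cost matrix between features of two frames (for example, negative feature similarity, which is an assumption here), it returns a soft assignment matrix whose rows and columns sum to the marginals. The values of eps and iters are illustrative.

```python
import math

def sinkhorn(cost, eps=0.1, iters=200):
    """Entropic OT between uniform marginals; returns a soft matching matrix P.

    Rows index features of frame 1, columns features of frame 2. At convergence
    each row of P sums to 1/n and each column to 1/m.
    """
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

cost = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
P = sinkhorn(cost)
print([max(range(3), key=lambda j: P[i][j]) for i in range(3)])  # best match per row
```

For very small eps the exponentials can underflow; a log-domain Sinkhorn is the usual remedy.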

SonoWorld: From One Image to a 3D Audio-Visual Scene

9.0/10

Derong Jin, Xiyi Chen, Ming C. Lin, Ruohan Gao 3/30/2026 arxiv

machine learning

Tremendous progress in visual scene generation now turns a single image into an explorable 3D world, yet immersion remains incomplete without sound. We introduce Image2AVScene, the task of generating a 3D audio-visual scene from a single image, and present SonoWorld, the first framework to tackle th...

Keywords: image-to-audio-visual, ambisonics, panorama outpainting, 3D reconstruction, spatial audio, sound anchors, audio-visual learning, one-shot acoustic learning
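
The keywords list ambisonics and spatial audio. For readers unfamiliar with the format, the minimal sketch below encodes a mono source arriving from a given direction into first-order ambisonics using the common AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization). It illustrates the target representation only; how SonoWorld places and renders sound anchors is not described in the snippet and is not modeled here.

```python
import math

def encode_foa(samples, azimuth_deg, elevation_deg):
    """Encode a mono signal as first-order ambisonics (ACN order W, Y, Z, X; SN3D).

    Azimuth is measured counter-clockwise from the front (positive = left),
    elevation upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = (
        1.0,                          # W: omnidirectional
        math.sin(az) * math.cos(el),  # Y: left-right
        math.sin(el),                 # Z: up-down
        math.cos(az) * math.cos(el),  # X: front-back
    )
    return [[g * s for s in samples] for g in gains]

channels = encode_foa([1.0, -0.5, 0.25], azimuth_deg=90, elevation_deg=0)  # source on the left
```

Decoding these four channels to loudspeakers or to binaural headphone output is a separate step not shown here.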

Temporal Credit Is Free

9.0/10

Aur Shalev Merin 3/30/2026 arxiv

machine learning

Recurrent networks do not need Jacobian propagation to adapt online. The hidden state already carries temporal credit through the forward pass; immediate derivatives suffice if you stop corrupting them with stale trace memory and normalize gradient scales across parameter groups. An architectural ru...

Keywords: RTRL, RMSprop, online learning, recurrent networks, temporal credit assignment, normalization, hidden state, scalability
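
The abstract claims that online adaptation needs no Jacobian propagation: immediate derivatives suffice once stale trace memory is dropped and gradient scales are normalized across parameter groups. The minimal sketch below is a toy reading of that claim, not the paper's algorithm: a single tanh unit trained online on a hypothetical alternating-sign task, where the previous hidden state is treated as a constant (so no RTRL-style trace of dh/dparam is kept) and each parameter group is rescaled by its own running RMS in the style of RMSprop, which the keywords mention. The task, initial values, and hyperparameters are all assumptions, and the truncated remainder of the abstract may add details not reflected here.

```python
import math

def train_online(steps=2000, lr=0.01, decay=0.99, eps=1e-8):
    """Train a one-unit RNN online with immediate (one-step) derivatives only.

    h_t = tanh(w * h_{t-1} + u * x_t), y_t = v * h_t, loss_t = (y_t - target_t)^2.
    h_{t-1} is treated as a constant input, so no trace of dh/dparam is stored
    or propagated across steps. Each parameter group (here w, u, v) has its
    gradient divided by its own running RMS, in the style of RMSprop.
    """
    params = {"w": 0.0, "u": 0.1, "v": 0.1}
    sq_avg = {name: 0.0 for name in params}
    h_prev = 0.0
    losses = []
    for t in range(steps):
        x = 1.0 if t % 2 == 0 else -1.0   # hypothetical toy input stream
        target = 0.5 * x                  # hypothetical toy target
        h = math.tanh(params["w"] * h_prev + params["u"] * x)
        err = params["v"] * h - target
        losses.append(err * err)
        dpre = 2.0 * err * params["v"] * (1.0 - h * h)  # dL/d(pre-activation)
        grads = {"w": dpre * h_prev, "u": dpre * x, "v": 2.0 * err * h}
        for name, g in grads.items():
            sq_avg[name] = decay * sq_avg[name] + (1.0 - decay) * g * g
            params[name] -= lr * g / (math.sqrt(sq_avg[name]) + eps)
        h_prev = h
    return params, losses

params, losses = train_online()
print(sum(losses[:100]) / 100, sum(losses[-100:]) / 100)  # early vs. late mean loss
```

Per-step cost here is constant in sequence length, unlike full RTRL, whose trace update grows with the number of parameters times the hidden size.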

Papers for March 31, 2026
