Paper Archive

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

9.0/10

George Cazenavette, Antonio Torralba, Vincent Sitzmann 11/20/2025 arxiv

machine learning

The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods focus on synthesizing datasets that enable training randomly i...

Keywords: dataset distillation, self-supervised learning, linear probe, gradient matching, DINO, CLIP, model interpretability, fine-grained classification

NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses

9.0/10

Jing Wen, Alexander G. Schwing, Shenlong Wang 11/20/2025 arxiv

computer vision

We tackle the task of recovering an animatable 3D human avatar from a single or a sparse set of images. For this task, beyond a set of images, many prior state-of-the-art methods use accurate "ground-truth" camera poses and human poses as input to guide reconstruction at test-time. We show that pose...

Keywords: 3D_avatar, animatable_avatar, pose_free, single_view_reconstruction, sparse_inputs, THuman2.0, XHuman, HuGe100K

Learning to Think Fast and Slow for Visual Language Models

9.0/10

Chenyu Lin, Cheng Chi, Jinlin Wu, Sharon Li, Kaiyang Zhou 11/20/2025 arxiv

machine learning

When confronted with complex problems, we tend to think slowly; conversely, for simple questions, we think quickly. Such a two-system thinking mechanism allows us to efficiently allocate cognitive resources, enabling quick decision-making for straightforward issues while reserving deeper analytical ...

Keywords: DualMindVLM, fast_and_slow_thinking, visual_language_models, GRPO, reinforcement_learning, chain_of_thought, token_efficiency, adaptive_reasoning

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

9.0/10

Junhao Cheng, Liang Hou, Xin Tao, Jing Liao 11/20/2025 arxiv

computer vision

While language models have become impactful in many real-world applications, video generation remains largely confined to entertainment. Motivated by video's inherent capacity to demonstrate physical-world information that is difficult to convey through language alone (e.g., imagine teaching someone...

Keywords: VNEP, VANS, Joint-GRPO, Vision-Language Model, Video Diffusion Model, VANS-Data-100K, video generation, reinforcement learning

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

9.0/10

Yang Luo, Xuanlei Zhao, Baijiong Lin, Lingting Zhu, Liyao Tang, Yuqi Liu, Ying-Cong Chen, Shengju Qian, Xin Wang, Yang You 11/20/2025 arxiv

computer vision

Recent progress in generative video models, such as Veo-3, has shown surprising zero-shot reasoning abilities, creating a growing need for systematic and reliable evaluation. We introduce V-ReasonBench, a benchmark designed to assess video reasoning across four key dimensions: structured problem-sol...

Keywords: V-ReasonBench, video_reasoning, video_generation, benchmark, structured_reasoning, spatial_cognition, pattern_inference, physical_dynamics

SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation

9.0/10

Zhenyuan Qin, Xincheng Shuai, Henghui Ding 11/20/2025 arxiv

computer vision

Controllable image generation has attracted increasing attention in recent years, enabling users to manipulate visual content such as identity and style. However, achieving simultaneous control over the 9D poses (location, size, and orientation) of multiple objects remains an open challenge. Despite...

Keywords: 9-DoF, CNOCS, ObjectPose9D, Disentangled Object Sampling, controllable image generation, pose manipulation, reinforcement learning, multi-object synthesis

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

9.0/10

Qinghao Hu, Shang Yang, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimovic, Song Han 11/20/2025 arxiv

machine learning

The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these reasoning models, typically using Reinforcement Learning (RL), encounters critical efficiency bottlenecks: respo...

Keywords: TLT, long-tail, speculative decoding, Adaptive Drafter, Adaptive Rollout Engine, CUDAGraphs, reinforcement learning, LLMs

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

9.0/10

Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan, Ruisi Cai, Marcin Chochowski, Ameya Sunil Mahabaleshwarkar, Yoshi Suhara, Oluwatobi Olabiyi, Daniel Korzekwa, Mostofa Patwary, Mohammad Shoeybi, Jan Kautz, Bryan Catanzaro, Ashwath Aithal, Nima Tajbakhsh, Pavlo Molchanov 11/20/2025 arxiv

machine learning

Training a family of large language models targeting multiple scales and deployment objectives is prohibitively expensive, requiring separate training runs for each different size. Recent work on model compression through pruning and knowledge distillation has reduced this cost; however, this proces...

Keywords: Nemotron Elastic, nested models, many-in-one, Mamba-Attention, SSM elastification, heterogeneous MLP elastification, normalized MSE layer importance, router

TriDiff-4D: Fast 4D Generation through Diffusion-based Triplane Re-posing

9.0/10

Eddie Pokming Sheung, Qihao Liu, Wufei Ma, Prakhar Kaushik, Jianwen Xie, Alan Yuille 11/20/2025 arxiv

computer vision

With the increasing demand for 3D animation, generating high-fidelity, controllable 4D avatars from textual descriptions remains a significant challenge. Despite notable efforts in 4D generative modeling, existing methods exhibit fundamental limitations that impede their broader applicability, inclu...

Keywords: TriDiff-4D, triplane, diffusion, 4D generation, skeleton-driven, auto-regressive, motion priors, 3D geometry

Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations

9.0/10

Irmak Guzey, Haozhi Qi, Julen Urain, Changhao Wang, Jessica Yin, Krishna Bodduluri, Mike Lambeta, Lerrel Pinto, Akshara Rai, Jitendra Malik, Tingfan Wu, Akash Sharma, Homanga Bharadhwaj 11/20/2025 arxiv

robotics

Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. Achieving this would mark significant progress toward generalizable robot manipulation in human environments, as it would reduce the reliance on lab...

Keywords: multi-fingered manipulation, human demonstrations, egocentric vision, Aria Gen 2, AINA, robot learning, embodiment gap, point-based policies

Browse by Date

Papers for November 22, 2025

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses

Learning to Think Fast and Slow for Visual Language Models

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

TriDiff-4D: Fast 4D Generation through Diffusion-based Triplane Re-posing

Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations