Paper Archive

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

9.0/10

George Cazenavette, Antonio Torralba, Vincent Sitzmann (arXiv, 11/20/2025)

computer vision

The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods focus on synthesizing datasets that enable training randomly i...

Keywords: dataset distillation, self-supervised learning, linear gradient matching, linear probes, DINO, CLIP, interpretability, fine-grained classification

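The distillation objective in the abstract above can be illustrated with a toy gradient-matching loop. The objective is a small synthetic set whose training signal stands in for a much larger real set. This is a hypothetical sketch and not the authors' implementation. It rests on several assumptions. It uses NumPy. Precomputed feature vectors stand in for a frozen self-supervised backbone, so the loop optimizes synthetic features rather than images. A squared-error linear probe replaces the usual cross-entropy probe, which lets the gradient with respect to the synthetic data be derived by hand. A fresh random linear probe is drawn at every step. The function names `probe_grad`, `matching_loss_and_grad`, and `distill`, and all hyperparameters, are illustrative only.

```python
import numpy as np


def probe_grad(X, Y, W):
    """Gradient w.r.t. W of the squared-error probe loss ||X @ W - Y||^2 / (2n)."""
    return X.T @ (X @ W - Y) / X.shape[0]


def matching_loss_and_grad(Xs, Ys, Xr, Yr, W):
    """Squared distance between synthetic and real probe gradients, plus its
    gradient with respect to the synthetic features Xs (derived by hand)."""
    m = Xs.shape[0]
    D = probe_grad(Xs, Ys, W) - probe_grad(Xr, Yr, W)  # (d, c) gradient mismatch
    R = Xs @ W - Ys                                     # (m, c) synthetic residuals
    loss = float(np.sum(D ** 2))
    grad_Xs = (2.0 / m) * (R @ D.T + Xs @ D @ W.T)      # (m, d)
    return loss, grad_Xs


def distill(Xr, Yr, n_per_class, steps=500, lr=0.5, probe_scale=0.1, seed=0):
    """Learn n_per_class synthetic feature vectors per class by matching the
    gradients of a freshly sampled random linear probe at every step."""
    rng = np.random.default_rng(seed)
    d, c = Xr.shape[1], Yr.shape[1]
    Ys = np.repeat(np.eye(c), n_per_class, axis=0)      # fixed one-hot labels
    Xs = rng.normal(size=(c * n_per_class, d))          # random initialization
    for _ in range(steps):
        W = rng.normal(scale=probe_scale, size=(d, c))  # new random probe
        _, g = matching_loss_and_grad(Xs, Ys, Xr, Yr, W)
        Xs -= lr * g                                    # plain gradient descent
    return Xs, Ys


if __name__ == "__main__":
    # Hypothetical toy "real" features: 3 well-separated Gaussian classes in 8-D.
    rng = np.random.default_rng(1)
    c, d, n = 3, 8, 100
    Yr = np.repeat(np.eye(c), n, axis=0)
    centers = np.zeros((c, d))
    centers[:, :c] = 2.0 * np.eye(c)
    Xr = rng.normal(size=(c * n, d)) + Yr @ centers
    Xs, Ys = distill(Xr, Yr, n_per_class=5)
    probe = rng.normal(scale=0.1, size=(d, c))
    print("matching loss on an unseen probe:",
          matching_loss_and_grad(Xs, Ys, Xr, Yr, probe)[0])
```

A new probe is sampled at every step, so the synthetic set has to reproduce the real gradients for probes in general rather than for one fixed head. For zero-mean probes and this squared-error loss, the expected objective reduces to matching the label-weighted feature means and the feature second-moment matrix of the real data. The sketch does not show whether this behavior carries over to images passed through DINO or CLIP.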

NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses

9.0/10

Jing Wen, Alexander G. Schwing, Shenlong Wang (arXiv, 11/20/2025)

computer vision

We tackle the task of recovering an animatable 3D human avatar from a single or a sparse set of images. For this task, beyond a set of images, many prior state-of-the-art methods use accurate "ground-truth" camera poses and human poses as input to guide reconstruction at test-time. We show that pose...

Keywords: NoPo-Avatar, 3D avatar reconstruction, pose-free, sparse inputs, robust reconstruction, THuman2.0, XHuman, HuGe100K

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

9.0/10

Junhao Cheng, Liang Hou, Xin Tao, Jing Liao (arXiv, 11/20/2025)

computer vision

While language models have become impactful in many real-world applications, video generation remains largely confined to entertainment. Motivated by video's inherent capacity to demonstrate physical-world information that is difficult to convey through language alone (e.g., imagine teaching someone...

Keywords: Video-Next-Event Prediction, VNEP, VANS, Joint-GRPO, Vision-Language Model, Video Diffusion Model, reinforcement learning, VANS-Data-100K

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

9.0/10

Yang Luo, Xuanlei Zhao, Baijiong Lin, Lingting Zhu, Liyao Tang, Yuqi Liu, Ying-Cong Chen, Shengju Qian, Xin Wang, Yang You (arXiv, 11/20/2025)

computer vision

Recent progress in generative video models, such as Veo-3, has shown surprising zero-shot reasoning abilities, creating a growing need for systematic and reliable evaluation. We introduce V-ReasonBench, a benchmark designed to assess video reasoning across four key dimensions: structured problem-sol...

Keywords: video reasoning, V-ReasonBench, benchmark, video generation, spatial cognition, physical dynamics, pattern inference, chain-of-frames

SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation

9.0/10

Zhenyuan Qin, Xincheng Shuai, Henghui Ding (arXiv, 11/20/2025)

computer vision

Controllable image generation has attracted increasing attention in recent years, enabling users to manipulate visual content such as identity and style. However, achieving simultaneous control over the 9D poses (location, size, and orientation) of multiple objects remains an open challenge. Despite...

Keywords: SceneDesigner, CNOCS map, 9-DoF, ObjectPose9D, Disentangled Object Sampling, controllable image generation, multi-object pose, reinforcement learning

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

9.0/10

Qinghao Hu, Shang Yang, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimovic, Song Han (arXiv, 11/20/2025)

machine learning

The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these reasoning models, typically using Reinforcement Learning (RL), encounters critical efficiency bottlenecks: respo...

Keywords: speculative decoding, reinforcement learning, long-tail generation, Adaptive Drafter, CUDAGraph, LLM training, efficiency

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

9.0/10

Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan, Ruisi Cai, Marcin Chochowski, Ameya Sunil Mahabaleshwarkar, Yoshi Suhara, Oluwatobi Olabiyi, Daniel Korzekwa, Mostofa Patwary, Mohammad Shoeybi, Jan Kautz, Bryan Catanzaro, Ashwath Aithal, Nima Tajbakhsh, Pavlo Molchanov (arXiv, 11/20/2025)

machine learning

Training a family of large language models targeting multiple scales and deployment objectives is prohibitively expensive, requiring separate training runs for each different size. Recent work on model compression through pruning and knowledge distillation has reduced this cost; however, this proces...

Keywords: Nemotron Elastic, many-in-one, model compression, Mamba-Attention, SSM elastification, knowledge distillation, nested submodels, router

TriDiff-4D: Fast 4D Generation through Diffusion-based Triplane Re-posing

9.0/10

Eddie Pokming Sheung, Qihao Liu, Wufei Ma, Prakhar Kaushik, Jianwen Xie, Alan Yuille (arXiv, 11/20/2025)

machine learning

With the increasing demand for 3D animation, generating high-fidelity, controllable 4D avatars from textual descriptions remains a significant challenge. Despite notable efforts in 4D generative modeling, existing methods exhibit fundamental limitations that impede their broader applicability, inclu...

Keywords: 4D generation, diffusion models, triplane, triplane re-posing, skeleton-driven, temporal coherence, canonical avatar, auto-regressive

Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations

9.0/10

Irmak Guzey, Haozhi Qi, Julen Urain, Changhao Wang, Jessica Yin, Krishna Bodduluri, Mike Lambeta, Lerrel Pinto, Akshara Rai, Jitendra Malik, Tingfan Wu, Akash Sharma, Homanga Bharadhwaj (arXiv, 11/20/2025)

robotics

Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. Achieving this would mark significant progress toward generalizable robot manipulation in human environments, as it would reduce the reliance on lab...

Keywords: AINA, Aria Gen 2, egocentric demos, multi-fingered manipulation, 3D point-based policies, wearable sensors, robot learning from human videos

Cognitive Foundations for Reasoning and Their Manifestation in LLMs

9.0/10

Priyanka Kargupta, Shuyue Stella Li, Haocheng Wang, Jinu Lee, Shan Chen, Orevaoghene Ahia, Dean Light, Thomas L. Griffiths, Max Kleiman-Weiner, Jiawei Han, Asli Celikyilmaz, Yulia Tsvetkov (arXiv, 11/20/2025)

machine learning

Large language models solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. We synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning computational constraints, me...

Keywords: cognitive taxonomy, meta-cognition, reasoning traces, LLMs, multi-modal, test-time guidance, human vs model comparison, evaluation framework

Papers for November 23, 2025