Paper Archive

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

9.0/10

Yihao Meng, Hao Ouyang, Yue Yu, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Hanlin Wang, Yixuan Li, Cheng Chen, Yanhong Zeng, Yujun Shen, Huamin Qu (arXiv, 10/23/2025)

machine learning

State-of-the-art text-to-video models excel at generating isolated clips but fall short of creating the coherent, multi-shot narratives, which are the essence of storytelling. We bridge this "narrative gap" with HoloCine, a model that generates entire scenes holistically to ensure global consistency...

Keywords: text-to-video, video generation, narrative coherence, Window Cross-Attention, Sparse Inter-Shot Self-Attention, multi-shot, long video, cinematic

LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas

9.0/10

Guocheng Gordon Qian, Ruihang Zhang, Tsai-Shien Chen, Yusuf Dalva, Anujraaj Argo Goyal, Willi Menapace, Ivan Skorokhodov, Meng Dong, Arpit Sahni, Daniil Ostashev, Ju Hu, Sergey Tulyakov, Kuan-Chieh Jackson Wang (arXiv, 10/23/2025)

generative models

Despite their impressive visual fidelity, existing personalized generative models lack interactive control over spatial composition and scale poorly to multiple subjects. To address these limitations, we present LayerComposer, an interactive framework for personalized, multi-subject text-to-image ge...

Keywords: LayerComposer, layered canvas, locking mechanism, spatial control, personalization, text-to-image, multi-subject, positional embeddings

Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge

9.0/10

Nimrod Berman, Omkar Joglekar, Eitan Kosman, Dotan Di Castro, Omri Azencot (arXiv, 10/23/2025)

machine learning

Recent advances in generative modeling have positioned diffusion models as state-of-the-art tools for sampling from complex data distributions. While these models have shown remarkable success across single-modality domains such as images and audio, extending their capabilities to Modality Translati...

Keywords: latent diffusion, modality translation, denoising diffusion bridge, contrastive alignment, predictive loss, multimodal generation, encoder-decoder, shared latent space

VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable Navigation

9.0/10

Mateo Guaman Castro, Sidharth Rajagopal, Daniel Gorbatov, Matt Schmittle, Rohan Baijal, Octi Zhang, Rosario Scalise, Sidharth Talia, Emma Romig, Celso de Melo, Byron Boots, Abhishek Gupta (arXiv, 10/23/2025)

robotics

A fundamental challenge in robot navigation lies in learning policies that generalize across diverse environments while conforming to the unique physical constraints and capabilities of a specific embodiment (e.g., quadrupeds can walk up stairs, but rovers cannot). We propose VAMOS, a hierarchical V...

Keywords: vision-language-action, robot navigation, hierarchical models, affordance model, sim-to-real, cross-embodiment, image-space planning, natural language steering

GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation

9.0/10

Guangqi Jiang, Haoran Chang, Ri-Zhao Qiu, Yutong Liang, Mazeyu Ji, Jiyue Zhu, Zhao Dong, Xueyan Zou, Xiaolong Wang (arXiv, 10/23/2025)

robotics

This paper presents GSWorld, a robust, photo-realistic simulator for robotic manipulation that combines 3D Gaussian Splatting with physics engines. Our framework advocates "closing the loop" of developing manipulation policies with reproducible evaluation of policies learned from real-robot data an...

Keywords: 3D Gaussian Splatting, GSWorld, GSDF, sim2real, robotic manipulation, photo-realistic simulation, URDF, DAgger

SpectraMorph: Structured Latent Learning for Self-Supervised Hyperspectral Super-Resolution

9.0/10

Ritik Shah, Marco F Duarte (arXiv, 10/23/2025)

machine learning

Hyperspectral sensors capture dense spectra per pixel but suffer from low spatial resolution, causing blurred boundaries and mixed-pixel effects. Co-registered companion sensors such as multispectral, RGB, or panchromatic cameras provide high-resolution spatial detail, motivating hyperspectral super...

Keywords: hyperspectral super-resolution, self-supervised learning, spectral unmixing, endmembers, abundance maps, spectral response function, interpretable latent space, multispectral fusion

Real Deep Research for AI, Robotics and Beyond

9.0/10

Xueyan Zou, Jianglong Ye, Hao Zhang, Xiaoyu Xiang, Mingyu Ding, Zhaojing Yang, Yong Jae Lee, Zhuowen Tu, Sifei Liu, Xiaolong Wang (arXiv, 10/23/2025)

machine learning

With the rapid growth of research in AI and robotics, now producing over 10,000 papers annually, it has become increasingly difficult for researchers to stay up to date. Fast-evolving trends, the rise of interdisciplinary work, and the need to explore domains beyond one's expertise all contribute to t...

Keywords: Real Deep Research, RDR, research trends, pipeline, systematic analysis, foundation models, robotics, interdisciplinary

The Reality Gap in Robotics: Challenges, Solutions, and Best Practices

9.0/10

Elie Aljalbout, Jiaxu Xing, Angel Romero, Iretiayo Akinola, Caelan Reed Garrett, Eric Heiden, Abhishek Gupta, Tucker Hermans, Yashraj Narang, Dieter Fox, Davide Scaramuzza, Fabio Ramos (arXiv, 10/23/2025)

robotics

Machine learning has facilitated significant advancements across various robotics domains, including navigation, locomotion, and manipulation. Many such achievements have been driven by the extensive use of simulation as a critical tool for training and testing robotic systems prior to their deploym...

Keywords: sim-to-real, reality gap, domain randomization, real-to-sim, sim-real co-training, state abstraction, action abstraction, evaluation metrics

Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers

9.0/10

Dean L Slack, G Thomas Hudson, Thomas Winterbottom, Noura Al Moubayed (arXiv, 10/23/2025)

computer vision

Inspired by the performance and scalability of autoregressive large language models (LLMs), transformer-based models have seen recent success in the visual domain. This study investigates a transformer adaptation for video prediction with a simple end-to-end approach, comparing various spatiotempora...

Keywords: video prediction, spatiotemporal transformers, pixel-space, autoregressive, physical simulations, PDE, unsupervised, interpretability

Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples

9.0/10

Shiva Sreeram, Alaa Maalouf, Pratyusha Sharma, Daniela Rus (arXiv, 10/23/2025)

machine learning

Recently, Sharma et al. suggested a method called Layer-SElective-Rank reduction (LASER), which demonstrated that pruning high-order components of carefully chosen weight matrices in an LLM can boost downstream accuracy without any gradient-based fine-tuning. Yet LASER's exhaustive, per-matrix search ...

Keywords: LASER, rank reduction, singular values, gradient guidance, LLM adaptation, model compression, few-shot, clustering
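The rank reduction that LASER applies to a single weight matrix can be illustrated with a truncated SVD. This is a minimal sketch, assuming a random NumPy matrix stands in for one LLM weight matrix; `rank_reduce` and `keep_fraction` are hypothetical names, and neither LASER's per-matrix search nor the gradient-guided selection described in the paper is shown.

```python
import numpy as np


def rank_reduce(weight: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Hypothetical helper: return a low-rank approximation of `weight` that keeps
    only the leading singular components and drops the high-order (smallest) ones."""
    # Thin SVD: weight = U @ diag(S) @ Vh, with S sorted in descending order.
    U, S, Vh = np.linalg.svd(weight, full_matrices=False)
    # Number of leading components to keep (at least one).
    k = max(1, int(round(keep_fraction * S.size)))
    # Rebuild the matrix from the top-k singular triplets only.
    return (U[:, :k] * S[:k]) @ Vh[:k, :]


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))  # stand-in for one weight matrix (assumption)
W_reduced = rank_reduce(W, keep_fraction=0.25)
print(W_reduced.shape, np.linalg.matrix_rank(W_reduced))
```

With 32 singular values and `keep_fraction=0.25`, the sketch keeps the top 8 components, so the reconstructed matrix has the original shape but rank 8.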

Papers for October 24, 2025