Browse and export your curated research paper collection
3/26/2026 huggingface
machine learning: Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame gener...
3/26/2026 huggingface
computer vision: Existing feed-forward 3D Gaussian Splatting methods predict pixel-aligned primitives, leading to quadratic growth in primitive count as resolution increases. This fundamentally limits their scalability, making high-resolution synthesis such as 4K intractable. We introduce LGTM (Less Gaussians, Tex...
3/26/2026 huggingface
computer vision: Vision Foundation Models (VFMs) have become the cornerstone of modern computer vision, offering robust representations across a wide array of tasks. While recent advances allow these models to handle varying input sizes during training, inference typically remains restricted to a single, fixed scale...
3/26/2026 huggingface
computer vision: Accurate estimation of large-displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large-displacement and zero-shot generalization scenarios. To overcome this, w...
3/26/2026 huggingface
machine learning: Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memoriz...
3/26/2026 huggingface
computer vision: Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that eff...
3/26/2026 huggingface
computer vision: Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability,...
3/26/2026 huggingface
computer vision: Video world models have shown immense potential in simulating the physical world, yet existing memory mechanisms primarily treat environments as static canvases. When dynamic subjects hide out of sight and later re-emerge, current methods often struggle, leading to frozen, distorted, or vanishing su...
3/26/2026 huggingface
machine learning: Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is oft...
3/26/2026 huggingface
robotics: Action-conditioned robot world models generate future video frames of the manipulated scene given a robot action sequence, offering a promising alternative for simulating tasks that are difficult to model with traditional physics engines. However, these models are optimized for short-term prediction...