Paper Archive

Browse and export your curated research paper collection

Archived days: 197
Total papers: 1958
Average score: 7.9
Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: complete data with analysis
CSV: tabular data for analysis
Markdown: human-readable reports
BibTeX: academic citations

Browse by Date

Papers for March 28, 2026

10 papers found

3/26/2026 huggingface

machine learning

Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame gener...

Keywords: ShotStream, multi-shot video generation, causal architecture, Distribution Matching Distillation, dual-cache memory, RoPE discontinuity, interactive storytelling, text-to-video

3/26/2026 huggingface

computer vision

Vision Foundation Models (VFMs) have become the cornerstone of modern computer vision, offering robust representations across a wide array of tasks. While recent advances allow these models to handle varying input sizes during training, inference typically remains restricted to a single, fixed scale...

Keywords: MuRF, Multi-Resolution Fusion, Vision Foundation Models, DINOv2, SigLIP2, inference-time fusion, multi-scale, frozen models

3/26/2026 huggingface

machine learning

Vision-language-action models have reshaped autonomous driving by incorporating language into the decision-making process. However, most existing pipelines use the language modality only for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personali...

Keywords: Vega, InstructScene, vision-language-action, diffusion models, autoregressive, instruction-following, autonomous driving, multimodal learning

3/26/2026 huggingface

computer vision

Accurate estimation of large-displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large-displacement and zero-shot generalization scenarios. To overcome this, w...

Keywords: optical flow, zero-shot, Vision Transformer, global matching, large displacement, motion estimation, point tracking, transfer learning

3/26/2026 huggingface

computer vision

Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memoriz...

Keywords: SlotVTG, slot attention, object-centric, video temporal grounding, MLLM, adapter, OOD generalization, self-supervised vision

3/26/2026 huggingface

machine learning

Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that eff...

Keywords: PackForcing, video diffusion, KV-cache, temporal compression, RoPE adjustment, VBench, long-video synthesis, autoregressive

3/26/2026 huggingface

computer vision

Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability,...

Keywords: facial expression editing, diffusion models, contrastive learning, continuous annotations, dataset, disentanglement, identity preservation, textual latent interpolation

3/26/2026 huggingface

machine learning

We demonstrate an all-solid-state semiconductor device, based on epitaxial single-crystalline metal halide perovskites, that enables reversible control of perovskite photoluminescence with a gate voltage. Fundamentally distinct from electroluminescent diodes, such a photoluminescence field-effect tran...

Keywords: perovskite, photoluminescence, field-effect transistor, electrostatic gating, nonradiative recombination, epitaxial single crystal, optoelectronics

3/26/2026 huggingface

machine learning

We introduce iterated beta integrals, a new class of iterated integrals on the universal abelian covering of the punctured projective line that unifies hyperlogarithms and classical beta integrals while preserving their fundamental properties. We establish various analytic properties of these integr...

Keywords: iterated integrals, beta integrals, hyperlogarithms, multiple zeta values, Galois descent, abelian covering, punctured projective line, periods

3/26/2026 huggingface

computer vision

Video world models have shown immense potential in simulating the physical world, yet existing memory mechanisms primarily treat environments as static canvases. When dynamic subjects hide out of sight and later re-emerge, current methods often struggle, leading to frozen, distorted, or vanishing su...

Keywords: Hybrid Memory, HM-World, HyDRA, video world models, spatiotemporal retrieval, memory compression, dynamic subjects, occlusion