Paper Archive

Browse and export your curated research paper collection

197 Archived Days • 1958 Total Papers • 7.9 Avg Score • 9 Categories

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
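The four export formats above map naturally onto a small conversion layer. The sketch below shows three of them (JSON, CSV, BibTeX) for a hypothetical record shape with `title`, `authors`, `date`, and `keywords` fields; the archive's actual schema and entry types are assumptions, not confirmed by this page.

```python
# Minimal sketch of exporting archived paper records.
# The record fields below are an assumed shape, not the archive's real schema.
import csv
import io
import json

papers = [
    {
        "title": "ShotStream",
        "authors": ["Yawen Luo", "Xiaoyu Shi"],
        "date": "2026-03-26",
        "keywords": ["multi-shot video", "streaming generation"],
    }
]

def to_json(records):
    # JSON: complete data, suitable for downstream analysis tooling
    return json.dumps(records, indent=2)

def to_csv(records):
    # CSV: one row per paper; list fields joined with "; "
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "authors", "date", "keywords"])
    writer.writeheader()
    for r in records:
        writer.writerow(dict(r,
                             authors="; ".join(r["authors"]),
                             keywords="; ".join(r["keywords"])))
    return buf.getvalue()

def to_bibtex(records):
    # BibTeX: @misc entries keyed by first author's surname + year
    entries = []
    for r in records:
        key = r["authors"][0].split()[-1].lower() + r["date"][:4]
        entries.append(
            "@misc{%s,\n  title={%s},\n  author={%s},\n  year={%s}\n}"
            % (key, r["title"], " and ".join(r["authors"]), r["date"][:4])
        )
    return "\n\n".join(entries)
```

A Markdown report would follow the same pattern, templating each record into a heading plus bullet list; `@misc` is used here because arXiv preprints have no journal venue to cite.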
Browse by Date

Papers for March 27, 2026

10 papers found

Yawen Luo, Xiaoyu Shi, Junhao Zhuang, Yutian Chen, Quande Liu, Xintao Wang, Pengfei Wan, Tianfan Xue 3/26/2026 arxiv

machine learning

Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame gener...

Keywords: ShotStream, multi-shot video, streaming generation, causal architecture, distribution matching distillation, dual-cache memory, RoPE discontinuity

Yixing Lao, Xuyang Bai, Xiaoyang Wu, Nuoyuan Yan, Zixin Luo, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Shiwei Li, Hengshuang Zhao 3/26/2026 arxiv

computer vision

Existing feed-forward 3D Gaussian Splatting methods predict pixel-aligned primitives, leading to a quadratic growth in primitive count as resolution increases. This fundamentally limits their scalability, making high-resolution synthesis such as 4K intractable. We introduce LGTM (Less Gaussians, Tex...

Keywords: 3D Gaussian Splatting, LGTM, novel view synthesis, 4K rendering, feed-forward, per-primitive texture, scalability

Bocheng Zou, Mu Cai, Mark Stanley, Dingfu Lu, Yong Jae Lee 3/26/2026 arxiv

computer vision

Vision Foundation Models (VFMs) have become the cornerstone of modern computer vision, offering robust representations across a wide array of tasks. While recent advances allow these models to handle varying input sizes during training, inference typically remains restricted to a single, fixed scale...

Keywords: MuRF, Multi-Resolution Fusion, Vision Foundation Models, DINOv2, SigLIP2, multi-scale, inference-time, frozen models

Lei Wang, YuXin Song, Ge Wu, Haocheng Feng, Hang Zhou, Jingdong Wang, Yaxing Wang, Jian Yang 3/26/2026 arxiv

computer vision

Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applications such as personalized advertising and virtual try-on. In practice, existing R2V methods typically introduce additio...

Keywords: RefAlign, reference-to-video, R2V, DiT, VFM, representation alignment, reference alignment loss, OpenS2V-Eval

Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu 3/26/2026 arxiv

machine learning

Vision-language-action models have reshaped autonomous driving by incorporating language into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personali...

Keywords: Vega, InstructScene, vision-language-action, instruction following, autoregressive, diffusion, joint attention, autonomous driving

Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li 3/26/2026 arxiv

machine learning

Human driving behavior is inherently personal, shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize ...

Keywords: Vision-Language-Action, personalization, user embedding, autonomous driving, Bench2Drive, natural language instructions, end-to-end policy, user study

Xincheng Shuai, Song Tang, Yutong Huang, Henghui Ding, Dacheng Tao 3/26/2026 arxiv

computer vision

Graphic design is a creative and innovative process that plays a crucial role in applications such as e-commerce and advertising. However, developing an automated design system that can faithfully translate user intentions into editable design files remains an open challenge. Although recent studies...

Keywords: graphic design, PSD, CreativePSD, tool use, MLLM, text-to-image, automated design, operation traces

Dingxi Zhang, Fangjinhua Wang, Marc Pollefeys, Haofei Xu 3/26/2026 arxiv

computer vision

Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, w...

Keywords: optical flow, zero-shot, large displacement, Vision Transformer, global matching, motion estimation, sub-pixel refinement, long-range point tracking

Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang 3/26/2026 arxiv

machine learning

The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable componen...

Keywords: WriteBack-RAG, retrieval-augmented generation, knowledge base, evidence distillation, write-back, corpus enrichment, retrieval, RAG

Jiwook Han, Geo Ahn, Youngrae Kim, Jinwoo Choi 3/26/2026 arxiv

computer vision

Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memoriz...

Keywords: SlotVTG, video temporal grounding, object-centric learning, slot attention, multimodal LLM, OOD generalization, self-supervised vision, adapter