Paper Archive

Browse and export your curated research paper collection

Archived Days: 197
Total Papers: 1958
Avg Score: 7.9
Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis
CSV: Tabular data for analysis
Markdown: Human-readable reports
BibTeX: Academic citations

Browse by Date

Papers for March 23, 2026

10 papers found

Yu Qi, Xinyi Xu, Ziyu Guo, Siyuan Ma, Renrui Zhang, Xinyan Chen, Ruichuan An, Ruofan Xing, Jiayi Zhang, Haojie Huang, Pheng-Ann Heng, Jonathan Tremblay, Lawson L. S. Wong 3/20/2026 arxiv

computer vision

Video generative models show emerging reasoning behaviors. It is essential to ensure that generated events remain causally consistent across frames for reliable deployment, a property we define as reasoning coherence. To bridge the gap in literature for missing reasoning coherence evaluation, we pro...

Keywords: video reasoning, reasoning coherence, benchmark, Reasoning Score, video generation, text hint, visual hint, MME-CoF-Pro

Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Jing-Hao Xue, Hao Li, Salman Khan, Zhiqiang Shen 3/20/2026 arxiv

computer vision

Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true edit signal: many pixels inside a mask are untouched or only trivially modified, while subtle yet consequential edits outside the mask are treated as natural. We reformulate VLM image tamperin...

Keywords: image forensics, tampering detection, pixel-level localization, benchmark, vision-language models, semantic classification, natural language description, taxonomy

Jiazheng Xing, Fei Du, Hangjie Yuan, Pengwei Liu, Hongbin Xu, Hai Ci, Ruigang Niu, Weihua Chen, Fan Wang, Yong Liu 3/20/2026 arxiv

computer vision

Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods ...

Keywords: personalized video generation, diffusion models, face attribute alignment, relational attention, multimodal LLM, benchmark, multi-subject generation

Sebastian Gerard, Josephine Sullivan 3/20/2026 arxiv

computer vision

Many segmentation tasks, such as medical image segmentation or future state prediction, are inherently ambiguous, meaning that multiple predictions are equally correct. Current methods typically rely on generative models to capture this uncertainty. However, identifying the underlying modes of the d...

Keywords: deterministic, mode proposals, ambiguous segmentation, segmentation, uncertainty, generative models, flow model, confidence mechanism

Omkar Thawakar, Dmitry Demidov, Vaishnav Potlapalli, Sai Prasanna Teja Reddy Bogireddy, Viswanatha Reddy Gajjala, Alaa Mostafa Lasheen, Rao Muhammad Anwer, Fahad Khan 3/20/2026 arxiv

machine learning

Composed Video Retrieval (CoVR) aims to find a target video given a reference video and a textual modification. Prior work assumes the modification text fully specifies the visual changes, overlooking after-effects and implicit consequences (e.g., motion, state transitions, viewpoint or duration cue...

Keywords: Composed Video Retrieval, CoVR, reasoning, zero-shot, multimodal models, causal reasoning, temporal reasoning, benchmark

Sebastian Gerard, Josephine Sullivan 3/20/2026 arxiv

computer vision

Predicting future states in uncertain environments, such as wildfire spread, medical diagnosis, or autonomous driving, requires models that can consider multiple plausible outcomes. While diffusion models can effectively learn such multi-modal distributions, naively sampling from these models is com...

Keywords: diffusion models, segmentation, sample diversity, training-free sampling, particle guidance, SPELL, clustering, MMFire

Yuan Zhou, Yongzhi Li, Yanqi Dai, Xingyu Zhu, Yi Tan, Qingshan Xu, Beier Zhu, Richang Hong, Hanwang Zhang 3/20/2026 arxiv

computer vision

Video-driven human reaction generation aims to synthesize 3D human motions that directly react to observed video sequences, which is crucial for building human-like interactive AI systems. However, existing methods often fail to effectively leverage video inputs to steer human reaction synthesis, re...

Keywords: human reaction generation, 3D human motion, video-driven, observation-reaction mutual steering, Prototype Feedback Steering, Dual-Coupled Reaction Refinement, prototypical vectors, relational distortion

Satoshi Iizuka, Shun Okamoto, Kazuhiro Fukui 3/20/2026 arxiv

computer vision

In this work, we propose Image-to-Image Rectified Flow Reformulation (I2I-RFR), a practical plug-in reformulation that recasts standard I2I regression networks as continuous-time transport models. While pixel-wise I2I regression is simple, stable, and easy to adapt across tasks, it often over-smooth...

Keywords: image-to-image, I2I-RFR, rectified flow, continuous-time, transport model, t-reweighted pixel loss, ODE solver, progressive refinement

Jingyang Lin, Jialian Wu, Jiang Liu, Ximeng Sun, Ze Wang, Xiaodong Yu, Jiebo Luo, Zicheng Liu, Emad Barsoum 3/20/2026 arxiv

machine learning

Video agentic models have advanced challenging video-language tasks. However, most agentic approaches still heavily rely on greedy parsing over densely sampled video frames, resulting in high computational cost. We present VideoSeek, a long-horizon video agent that leverages video logic flow to acti...

Keywords: VideoSeek, video agent, long-horizon, think-act-observe, tool-guided seeking, video logic flow, multi-granular observations, LVBench

Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras 3/20/2026 arxiv

machine learning

Causal generative models provide a principled framework for answering observational, interventional, and counterfactual queries from observational data. However, many deep causal models rely on highly expressive architectures with opaque mechanisms, limiting auditability in high-stakes domains. We p...

Keywords: Kolmogorov–Arnold, KAN, causal generative models, structural equations, mixed-type tabular data, interpretability, counterfactuals, validation pipeline