Paper Archive

MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints

9.0/10

Yu Qi, Xinyi Xu, Ziyu Guo, Siyuan Ma, Renrui Zhang, Xinyan Chen, Ruichuan An, Ruofan Xing, Jiayi Zhang, Haojie Huang, Pheng-Ann Heng, Jonathan Tremblay, Lawson L. S. Wong 3/20/2026 arxiv

computer vision

Video generative models show emerging reasoning behaviors. It is essential to ensure that generated events remain causally consistent across frames for reliable deployment, a property we define as reasoning coherence. To bridge the gap in literature for missing reasoning coherence evaluation, we pro...

Keywords: video reasoning, reasoning coherence, benchmark, Reasoning Score, video generation, text hint, visual hint, MME-CoF-Pro

From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering

9.0/10

Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Jing-Hao Xue, Hao Li, Salman Khan, Zhiqiang Shen 3/20/2026 arxiv

computer vision

Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true edit signal: many pixels inside a mask are untouched or only trivially modified, while subtle yet consequential edits outside the mask are treated as natural. We reformulate VLM image tamperin...

Keywords: image forensics, tampering detection, pixel-level localization, benchmark, vision-language models, semantic classification, natural language description, taxonomy

LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

9.0/10

Jiazheng Xing, Fei Du, Hangjie Yuan, Pengwei Liu, Hongbin Xu, Hai Ci, Ruigang Niu, Weihua Chen, Fan Wang, Yong Liu 3/20/2026 arxiv

computer vision

Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods ...

Keywords: personalized video generation, diffusion models, face-attribute alignment, relational attention, multimodal LLM, benchmark, multi-subject generation

Deterministic Mode Proposals: An Efficient Alternative to Generative Sampling for Ambiguous Segmentation

9.0/10

Sebastian Gerard, Josephine Sullivan 3/20/2026 arxiv

computer vision

Many segmentation tasks, such as medical image segmentation or future state prediction, are inherently ambiguous, meaning that multiple predictions are equally correct. Current methods typically rely on generative models to capture this uncertainty. However, identifying the underlying modes of the d...

Keywords: deterministic, mode proposals, ambiguous segmentation, segmentation, uncertainty, generative models, flow model, confidence mechanism

CoVR-R: Reason-Aware Composed Video Retrieval

9.0/10

Omkar Thawakar, Dmitry Demidov, Vaishnav Potlapalli, Sai Prasanna Teja Reddy Bogireddy, Viswanatha Reddy Gajjala, Alaa Mostafa Lasheen, Rao Muhammad Anwer, Fahad Khan 3/20/2026 arxiv

machine learning

Composed Video Retrieval (CoVR) aims to find a target video given a reference video and a textual modification. Prior work assumes the modification text fully specifies the visual changes, overlooking after-effects and implicit consequences (e.g., motion, state transitions, viewpoint or duration cue...

Keywords: Composed Video Retrieval, CoVR, reasoning, zero-shot, multimodal models, causal reasoning, temporal reasoning, benchmark

Wildfire Spread Scenarios: Increasing Sample Diversity of Segmentation Diffusion Models with Training-Free Methods

9.0/10

Sebastian Gerard, Josephine Sullivan 3/20/2026 arxiv

computer vision

Predicting future states in uncertain environments, such as wildfire spread, medical diagnosis, or autonomous driving, requires models that can consider multiple plausible outcomes. While diffusion models can effectively learn such multi-modal distributions, naively sampling from these models is com...

Keywords: diffusion models, segmentation, sample diversity, training-free sampling, particle guidance, SPELL, clustering, MMFire

MuSteerNet: Human Reaction Generation from Videos via Observation-Reaction Mutual Steering

9.0/10

Yuan Zhou, Yongzhi Li, Yanqi Dai, Xingyu Zhu, Yi Tan, Qingshan Xu, Beier Zhu, Richang Hong, Hanwang Zhang 3/20/2026 arxiv

computer vision

Video-driven human reaction generation aims to synthesize 3D human motions that directly react to observed video sequences, which is crucial for building human-like interactive AI systems. However, existing methods often fail to effectively leverage video inputs to steer human reaction synthesis, re...

Keywords: human reaction generation, 3D human motion, video-driven, observation-reaction mutual steering, Prototype Feedback Steering, Dual-Coupled Reaction Refinement, prototypical vectors, relational distortion

Improving Image-to-Image Translation via a Rectified Flow Reformulation

9.0/10

Satoshi Iizuka, Shun Okamoto, Kazuhiro Fukui 3/20/2026 arxiv

computer vision

In this work, we propose Image-to-Image Rectified Flow Reformulation (I2I-RFR), a practical plug-in reformulation that recasts standard I2I regression networks as continuous-time transport models. While pixel-wise I2I regression is simple, stable, and easy to adapt across tasks, it often over-smooth...

Keywords: image-to-image, I2I-RFR, rectified flow, continuous-time, transport model, t-reweighted pixel loss, ODE solver, progressive refinement

VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking

9.0/10

Jingyang Lin, Jialian Wu, Jiang Liu, Ximeng Sun, Ze Wang, Xiaodong Yu, Jiebo Luo, Zicheng Liu, Emad Barsoum 3/20/2026 arxiv

machine learning

Video agentic models have advanced challenging video-language tasks. However, most agentic approaches still heavily rely on greedy parsing over densely sampled video frames, resulting in high computational cost. We present VideoSeek, a long-horizon video agent that leverages video logic flow to acti...

Keywords: VideoSeek, video agent, long-horizon, think-act-observe, tool-guided seeking, video logic flow, multi-granular observations, LVBench

Kolmogorov-Arnold causal generative models

9.0/10

Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras 3/20/2026 arxiv

machine learning

Causal generative models provide a principled framework for answering observational, interventional, and counterfactual queries from observational data. However, many deep causal models rely on highly expressive architectures with opaque mechanisms, limiting auditability in high-stakes domains. We p...

Keywords: Kolmogorov–Arnold, KAN, causal generative models, structural equations, mixed-type tabular data, interpretability, counterfactuals, validation pipeline

Papers for March 23, 2026
