Paper Archive

VisCoP: Visual Probing for Video Domain Adaptation of Vision Language Models

9.0/10

Dominick Reilly, Manish Kumar Govind, Le Xue, Srijan Das 10/15/2025 arXiv

machine learning

Large Vision-Language Models (VLMs) excel at general visual reasoning tasks but exhibit sharp performance degradation when applied to novel domains with substantial distribution shifts from pretraining data. Existing domain adaptation approaches finetune different VLM components, but this often resu...

Keywords: Vision-Language Models, Domain Adaptation, Visual Probes, VisCoP, Cross-modal Transfer, Egocentric Vision

Generative Universal Verifier as Multimodal Meta-Reasoner

9.0/10

Xinchen Zhang, Xiaoying Zhang, Youbin Wu, Yanbin Cao, Renrui Zhang, Ruihang Chu, Ling Yang, Yujiu Yang 10/15/2025 arXiv

machine learning

We introduce Generative Universal Verifier, a novel concept and plugin designed for next-generation multimodal reasoning in vision-language models and unified multimodal models, providing the fundamental capability of reflection and refinement on visual outcomes during the reasoning and generation p...

Keywords: Generative Universal Verifier, OmniVerifier-7B, OmniVerifier-TTS, ViVerBench, visual verification, multimodal reasoning, test-time scaling, vision-language models

Trace Anything: Representing Any Video in 4D via Trajectory Fields

9.0/10

Xinhang Liu, Yuxi Xiao, Donny Y. Chen, Jiashi Feng, Yu-Wing Tai, Chi-Keung Tang, Bingyi Kang 10/15/2025 arXiv

computer vision

Effective spatio-temporal representation is fundamental to modeling, understanding, and predicting dynamics in videos. The atomic unit of a video, the pixel, traces a continuous 3D trajectory over time, serving as the primitive element of dynamics. Based on this principle, we propose representing an...

Keywords: Trajectory Field, Trace Anything, video dynamics, 4D, B-spline, spatio-temporal representation, point tracking, neural field

BRIEF-Pro: Universal Context Compression with Short-to-Long Synthesis for Fast and Accurate Multi-Hop Reasoning

9.0/10

Jia-Chen Gu, Junyi Zhang, Di Wu, Yuankai Li, Kai-Wei Chang, Nanyun Peng 10/15/2025 arXiv

machine learning

As retrieval-augmented generation (RAG) tackles complex tasks, increasingly expanded contexts offer richer information, but at the cost of higher latency and increased cognitive load on the model. To mitigate this bottleneck, especially for intricate multi-hop questions, we introduce BRIEF-Pro. It i...

Keywords: BRIEF-Pro, context compression, short-to-long synthesis, RAG, multi-hop QA, abstractive summarization, in-context learning, efficiency

Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons

9.0/10

Giovanni Monea, Yair Feldman, Shankar Padmanabhan, Kianté Brantley, Yoav Artzi 10/15/2025 arXiv

machine learning

The scalability of large language models for long-context reasoning is severely constrained by the linear growth of their Transformer key-value cache, which incurs significant memory and computational costs. We posit that as a model generates reasoning tokens, the informational value of past generat...

Keywords: KV cache, cache compression, compression beacons, breadcrumbs reasoning, transformer, reinforcement learning, distillation, memory-efficiency

The Mechanistic Emergence of Symbol Grounding in Language Models

9.0/10

Shuyu Wu, Ziqiao Ma, Xiaoxi Luo, Yidong Huang, Josue Torres-Fonseca, Freda Shi, Joyce Chai 10/15/2025 arXiv

machine learning

Symbol grounding (Harnad, 1990) describes how symbols such as words acquire their meanings by connecting to real-world sensorimotor experiences. Recent work has shown preliminary evidence that grounding may emerge in (vision-)language models trained at scale without using explicit grounding objectiv...

Keywords: symbol grounding, mechanistic interpretability, multimodal models, attention heads, aggregate mechanism, Transformers, state-space models, LSTM

Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

9.0/10

Yi Zhang, Bolin Ni, Xin-Sheng Chen, Heng-Rui Zhang, Yongming Rao, Houwen Peng, Qinglin Lu, Han Hu, Meng-Hao Guo, Shi-Min Hu 10/15/2025 arXiv

machine learning

Fully open multimodal large language models (MLLMs) currently lag behind proprietary counterparts, primarily due to a significant gap in data quality for supervised fine-tuning (SFT). Existing open-source datasets are often plagued by widespread noise and a critical deficit in complex reasoning data...

Keywords: Honey-Data-15M, HoneyPipe, DataStudio, Bee-8B, multimodal-LLM, Chain-of-Thought, data curation, supervised fine-tuning

NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models

9.0/10

Nir Goren, Oren Katzir, Abhinav Nakarmi, Eyal Ronen, Mahmood Sharif, Or Patashnik 10/15/2025 arXiv

computer vision

With the rapid adoption of diffusion models for visual content generation, proving authorship and protecting copyright have become critical. This challenge is particularly important when model owners keep their models private and may be unwilling or unable to handle authorship issues, making third-p...

Keywords: NoisePrints, watermarking, diffusion models, seed-based verification, zero-knowledge proofs, authorship verification, model provenance, cryptographic hashing

Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach

9.0/10

Ziqing Lu, Lifeng Lai, Weiyu Xu 10/15/2025 arXiv

reinforcement learning

Reinforcement learning (RL) for the Markov Decision Process (MDP) has emerged in many security-related applications, such as autonomous driving, financial decisions, and drone/robot algorithms. In order to improve the robustness/defense of RL systems against adversaries, studying various adversarial...

Keywords: rate-distortion, information-theoretic, adversarial attacks, reinforcement learning, MDP, reward regret, model-based, model-free

T3former: Temporal Graph Classification with Topological Machine Learning

9.0/10

Md. Joshem Uddin, Soham Changani, Baris Coskunuzer 10/15/2025 arXiv

machine learning

Temporal graph classification plays a critical role in applications such as cybersecurity, brain connectivity analysis, social dynamics, and traffic monitoring. Despite its significance, this problem remains underexplored compared to temporal link prediction or node forecasting. Existing methods oft...

Keywords: T3former, temporal graphs, temporal graph classification, topological descriptors, spectral descriptors, descriptor-attention, stability guarantees, dynamic social networks

Papers for October 16, 2025