Paper Archive

Temporal Prompting Matters: Rethinking Referring Video Object Segmentation

9.0/10

Ci-Siang Lin, Min-Hung Chen, I-Jieh Liu, Chien-Yi Wang, Sifei Liu, Yu-Chiang Frank Wang 10/8/2025 arxiv

machine learning

Referring Video Object Segmentation (RVOS) aims to segment the object referred to by the query sentence in the video. Most existing methods require end-to-end training with dense mask annotations, which could be computation-consuming and less scalable. In this work, we rethink the RVOS problem and a...

Keywords: Referring Video Object Segmentation, Temporal Prompting, Prompt Preference Learning, foundation segmentation models, object detectors, trackers, video-language

Artificial Hippocampus Networks for Efficient Long-Context Modeling

9.0/10

Yunhao Fang, Weihao Yu, Shu Zhong, Qinghao Ye, Xuehan Xiong, Lai Wei 10/8/2025 arxiv

machine learning

Long-sequence modeling faces a fundamental trade-off between the efficiency of compressive fixed-size memory in RNN-like models and the fidelity of lossless growing memory in attention-based Transformers. Inspired by the Multi-Store Model in cognitive science, we introduce a memory framework of arti...

Keywords: Artificial Hippocampus Network, AHN, long-context modeling, sliding-window KV cache, fixed-size long-term memory, Mamba2, DeltaNet, Gated DeltaNet

Quantum-enhanced Computer Vision: Going Beyond Classical Algorithms

9.0/10

Natacha Kuete Meli, Shuteng Wang, Marcel Seelbach Benkner, Michele Sasdelli, Tat-Jun Chin, Tolga Birdal, Michael Moeller, Vladislav Golyanik 10/8/2025 arxiv

machine learning

Quantum-enhanced Computer Vision (QeCV) is a new research field at the intersection of computer vision, optimisation theory, machine learning and quantum computing. It has high potential to transform how visual signals are processed and interpreted with the help of quantum computing that leverages q...

Keywords: Quantum-enhanced Computer Vision, QeCV, quantum computing, quantum annealing, gate-based quantum computing, parametrised quantum circuits, survey, computer vision

Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers

9.0/10

Gangwei Xu, Haotong Lin, Hongcheng Luo, Xianqi Wang, Jingfeng Yao, Lianghui Zhu, Yuechuan Pu, Cheng Chi, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Sida Peng, Xin Yang 10/8/2025 arxiv

computer vision

This paper presents Pixel-Perfect Depth, a monocular depth estimation model based on pixel-space diffusion generation that produces high-quality, flying-pixel-free point clouds from estimated depth maps. Current generative depth estimation models fine-tune Stable Diffusion and achieve impressive per...

Keywords: monocular depth estimation, pixel-space diffusion, Semantics-Prompted Diffusion Transformers, SP-DiT, Cascade DiT, flying pixels, edge-aware point cloud, vision foundation models

Vibe Checker: Aligning Code Evaluation with Human Preference

9.0/10

Ming Zhong, Xiang Zhou, Ting-Yun Chang, Qingze Wang, Nan Xu, Xiance Si, Dan Garrette, Shyam Upadhyay, Jeremiah Liu, Jiawei Han, Benoit Schillings, Jiao Sun 10/8/2025 arxiv

machine learning

Large Language Models (LLMs) have catalyzed vibe coding, where users leverage LLMs to generate and iteratively refine code through natural language interactions until it passes their vibe check. Vibe check is tied to real-world human preference and goes beyond functionality: the solution should feel...

Keywords: Vibe Checker, VeriCode, instruction following, code evaluation, deterministic verifiers, LLMs, functional correctness, human preference

GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations

9.0/10

Fabian Paischer, Gianluca Galletti, William Hornsby, Paul Setinek, Lorenzo Zanisi, Naomi Carey, Stanislas Pamela, Johannes Brandstetter 10/8/2025 arxiv

machine learning

Nuclear fusion plays a pivotal role in the quest for reliable and sustainable energy production. A major roadblock to viable fusion power is understanding plasma turbulence, which significantly impairs plasma confinement, and is vital for next-generation reactor design. Plasma turbulence is governed...

Keywords: gyrokinetics, plasma turbulence, surrogate modeling, vision transformer, 5D simulations, GyroSwin, heat flux, deep learning

WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation

9.0/10

Zezhong Qian, Xiaowei Chi, Yuming Li, Shizun Wang, Zhiyuan Qin, Xiaozhu Ju, Sirui Han, Shanghang Zhang 10/8/2025 arxiv

robotics

Wrist-view observations are crucial for VLA models as they capture fine-grained hand-object interactions that directly enhance manipulation performance. Yet large-scale datasets rarely include such recordings, resulting in a substantial gap between abundant anchor views and scarce wrist views. Exist...

Keywords: WristWorld, 4D world model, wrist-view, VGGT, Spatial Projection Consistency, SPC Loss, video generation, robotic manipulation

h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning

9.0/10

Sumeet Ramesh Motwani, Alesia Ivanova, Ziyang Cai, Philip Torr, Riashat Islam, Shital Shah, Christian Schroeder de Witt, Charles London 10/8/2025 arxiv

machine learning

Large language models excel at short-horizon reasoning tasks, but performance drops as reasoning horizon lengths increase. Existing approaches to combat this rely on inference-time scaffolding or costly step-level supervision, neither of which scales easily. In this work, we introduce a scalable met...

Keywords: curriculum RL, long-horizon reasoning, outcome-only rewards, synthetic composition, GSM8K, GSM-Symbolic, MATH-500, AIME

MATRIX: Mask Track Alignment for Interaction-aware Video Generation

9.0/10

Siyoon Jin, Seongchan Kim, Dahyun Chung, Jaeho Lee, Hyunwook Choi, Jisu Nam, Jiyoung Kim, Seungryong Kim 10/8/2025 arxiv

computer vision

Video DiTs have advanced video generation, yet they still struggle to model multi-instance or subject-object interactions. This raises a key question: How do these models internally represent interactions? To answer this, we curate MATRIX-11K, a video dataset with interaction-aware captions and mult...

Keywords: MATRIX-11K, MATRIX, video DiT, mask tracks, semantic grounding, semantic propagation, interaction-aware generation, InterGenEval

Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain

9.0/10

Yue Li, Ran Tao, Derek Hommel, Yusuf Denizay Dönder, Sungyong Chang, David Mimno, Unso Eun Seo Jo 10/8/2025 arxiv

machine learning

In the business domain, where data-driven decision making is crucial, text-to-SQL is fundamental for easy natural language access to structured data. While recent LLMs have achieved strong performance in code generation, existing text-to-SQL benchmarks remain focused on factual retrieval of past rec...

Keywords: text-to-SQL, benchmark, business intelligence, LLMs, causal reasoning, temporal forecasting, recommendation, CORGI

Papers for October 9, 2025
