Paper Archive

Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

5.0/10

Nhat-Minh Nguyen | 5/28/2026 | arXiv

computer vision

Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 sessions to build CLAX-PT, a differentiable one-loop perturbation theory module in JAX. We documented...

Keywords: detection

GMOS: Grounding Moving Object Segmentation in 3D Space and Time

5.0/10

Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman | 5/28/2026 | arXiv

reinforcement learning

Moving Object Segmentation (MOS) aims to discover, segment, and track objects that move independently of the camera. Current MOS methods, however, exhibit two fundamental limitations: they rely on pre-computed 2D auxiliary modalities such as optical flow or point trajectories that lack 3D geometric ...

Keywords: segmentation

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

5.0/10

Hidir Yesiltepe, Jiazhen Hu, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, Hoda Eldardiry, Pinar Yanardag | 5/28/2026 | arXiv

natural language processing

Long-rollout causal video diffusion has converged on a fixed-size sliding-window KV cache, with recent progress innovating within this layout by changing which tokens occupy the window or how their positions are encoded. The per-head KV layout itself, a dominant contributor to streaming memory and l...

Keywords: attention

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

5.0/10

Jusuk Lee, Seungjae Lee, Jonghun Shin, Hoseong Jung, Sungha Kim, Daesol Cho, H. Jin Kim, Jia-Bin Huang, Furong Huang | 5/28/2026 | arXiv

computer vision

Robot manipulation critically depends on perception that preserves the action-relevant aspects of a scene. Yet most robot learning pipelines are built upon visual encoders pre-trained for static recognition or vision-language alignment, leaving motion understanding to downstream policies. We introdu...

LLMSurgeon: Diagnosing Data Mixture of Large Language Models

5.0/10

Yaxin Luo, Jiacheng Cui, Xiaohan Zhao, Xinyi Shang, Jiacheng Liu, Xinyue Bi, Zhaoyi Li, Zhiqiang Shen | 5/28/2026 | arXiv

natural language processing

The pretraining data mixture of Large Language Models (LLMs) constitutes their "digital DNA", shaping model behaviors, capabilities, and failure modes. Yet this composition is rarely disclosed, making post-hoc auditing of data combination or provenance difficult. In this work, we formalize $\textbf{...

Keywords: pretraining

AdaState: Self-Evolving Anchors for Streaming Video Generation

5.0/10

Yusuf Dalva, Pinar Yanardag | 5/28/2026 | arXiv

machine learning

Autoregressive video diffusion models generate streaming video by producing frames sequentially, conditioning each chunk on previously generated content. These models are structurally anchored to the first frame: its key-value representation occupies a privileged position in the attention cache and ...

Keywords: attention, diffusion model

NeuROK: Generative 4D Neural Object Kinematics

5.0/10

Chen Geng, Guangzhao He, Yue Gao, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu | 5/28/2026 | arXiv

computer vision

Data-driven approaches have revolutionized 3D vision, enabling transformers to effectively reconstruct and generate static 3D objects. However, generating simulative 4D dynamics -- realistic temporal deformations of static objects under various physical conditions -- remains challenging and often ad...

Keywords: transformer

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

5.0/10

You-Zhe Xie, Yu-Hsuan Li, Jie-Ying Lee, Kaipeng Zhang, Yu-Lun Liu, Zhixiang Wang | 5/28/2026 | arXiv

reinforcement learning

As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly rely on synthetic data, limiting real-world generalization due to the sim-to-real gap. We present Yo...

Keywords: diffusion model

SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations

5.0/10

Qinpei Luo, Ruichun Ma, Xinyu Zhang, Lili Qiu | 5/28/2026 | arXiv

natural language processing

Printed circuit board (PCB) schematic design defines nearly all electronic hardware, but it remains manual and expertise-intensive. While generative AI has advanced digital and analog IC design, PCB schematic generation from natural-language intent is largely unexplored. This paper presents SchGen, ...

Keywords: generative ai

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

5.0/10

Xiaona Zhou, Muntasir Wahed, Tianjiao Yu, Constantin Brif, Ismini Lourentzou | 5/28/2026 | arXiv

computer vision

Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large language or multimodal models to finding abnormal patterns in sequential data. Public anomaly detection benchmarks typicall...

Keywords: fine-tuning, detection

Papers for May 31, 2026