Browse and export your curated research paper collection
Haofei Xu, Daniel Barath, Andreas Geiger, Marc Pollefeys 10/9/2025 arxiv
computer vision: While feed-forward Gaussian splatting models provide computational efficiency and effectively handle sparse input settings, their performance is fundamentally limited by the reliance on a single forward pass during inference. We propose ReSplat, a feed-forward recurrent Gaussian splatting model that...
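The limitation called out here is reliance on a single forward pass; the proposed remedy is a recurrent model that keeps refining its Gaussians. Below is a minimal sketch of that general pattern, assuming placeholder networks (`init_net`, `update_net`, `render`) and a fixed step count; it illustrates recurrent refinement in the abstract, not ReSplat's actual architecture.

```python
import torch

def recurrent_refinement(init_net, update_net, render, images, num_steps=3):
    # One feed-forward prediction of Gaussian parameters, then a few recurrent
    # refinement steps driven by a rendering of the current estimate.
    gaussians = init_net(images)                      # initial one-shot prediction
    for _ in range(num_steps):
        rendered = render(gaussians)                  # render the current Gaussians
        residual = update_net(torch.cat([gaussians, rendered], dim=-1))
        gaussians = gaussians + residual              # refine instead of re-predicting
    return gaussians

# Toy usage with linear stand-ins for the actual networks.
images = torch.randn(1, 16)
init_net = torch.nn.Linear(16, 8)      # images -> Gaussian parameters
render = torch.nn.Linear(8, 8)         # Gaussian parameters -> rendered features
update_net = torch.nn.Linear(16, 8)    # [Gaussians, rendering] -> residual update
print(recurrent_refinement(init_net, update_net, render, images).shape)  # torch.Size([1, 8])
```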
Rocktim Jyoti Das, Harsh Singh, Diana Turmakhan, Muhammad Abdullah Sohail, Mingfei Han, Preslav Nakov, Fabio Pizzati, Ivan Laptev 10/9/2025 arxiv
robotics: Scaling data and models has played a pivotal role in the remarkable progress of computer vision and language. Inspired by these domains, recent efforts in robotics have similarly focused on scaling both data and model size to develop more generalizable and robust policies. However, unlike vision and...
Nimrod Berman, Assaf Hallak, Assaf Shocher 10/9/2025 arxiv
machine learning: Neural networks are famously nonlinear. However, linearity is defined relative to a pair of vector spaces, $f: X \to Y$. Is it possible to identify a pair of non-standard vector spaces for which a conventionally nonlinear function is, in fact, linear? This paper introduces a method that makes s...
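A classical illustration of the question posed here (not the paper's method): exp is nonlinear over the standard reals, but it becomes a linear map if the positive reals are given the non-standard operations u ⊕ v = u·v and a ⊙ u = u^a. A quick numerical check, assuming only NumPy:

```python
import numpy as np

# Non-standard vector space structure on the positive reals:
# "addition"  u ⊕ v := u * v,   "scalar multiplication"  a ⊙ u := u ** a.
# With these operations, exp: (R, +) -> (R_>0, ⊕, ⊙) is linear.
def oplus(u, v):
    return u * v

def odot(a, u):
    return u ** a

rng = np.random.default_rng(0)
x, y, a = rng.normal(), rng.normal(), rng.normal()

# Additivity: exp(x + y) == exp(x) ⊕ exp(y)
assert np.isclose(np.exp(x + y), oplus(np.exp(x), np.exp(y)))
# Homogeneity: exp(a * x) == a ⊙ exp(x)
assert np.isclose(np.exp(a * x), odot(a, np.exp(x)))
print("exp is linear w.r.t. the log-induced vector space structure")
```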
Animikh Aich, Adwait Kulkarni, Eshed Ohn-Bar 10/9/2025 arxiv
machine learning: Real-world evaluation of perception-based planning models for robotic systems, such as autonomous vehicles, can be safely and inexpensively conducted offline, i.e., by computing model prediction error over a pre-collected validation dataset with ground-truth annotations. However, extrapolating from ...
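The offline evaluation described here amounts to averaging a prediction-error metric over a pre-collected, annotated validation set rather than rolling the system out in the real world. A minimal sketch of that loop, where the dataset layout, the planner interface, and the average displacement error metric are illustrative assumptions, not the paper's protocol:

```python
import numpy as np

def offline_planning_error(model, dataset):
    """Average displacement error of predicted vs. ground-truth trajectories
    over a pre-collected validation set (no real-world rollout needed)."""
    errors = []
    for sample in dataset:
        pred_traj = model(sample["observation"])         # (T, 2) predicted waypoints
        gt_traj = sample["ground_truth_trajectory"]      # (T, 2) annotated waypoints
        errors.append(np.linalg.norm(pred_traj - gt_traj, axis=-1).mean())
    return float(np.mean(errors))

# Toy usage with a dummy planner and two annotated samples.
dummy_model = lambda obs: np.zeros((5, 2))
dataset = [{"observation": None, "ground_truth_trajectory": np.ones((5, 2))},
           {"observation": None, "ground_truth_trajectory": np.zeros((5, 2))}]
print(offline_planning_error(dummy_model, dataset))
```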
Hongyu Li, Lingfeng Sun, Yafei Hu, Duy Ta, Jennifer Barry, George Konidaris, Jiahui Fu 10/9/2025 arxiv
robotics: Enabling robots to execute novel manipulation tasks zero-shot is a central goal in robotics. Most existing methods assume in-distribution tasks or rely on fine-tuning with embodiment-matched data, limiting transfer across platforms. We present NovaFlow, an autonomous manipulation framework that conv...
Qin Liu, Jacob Dineen, Yuxi Huang, Sheng Zhang, Hoifung Poon, Ben Zhou, Muhao Chen 10/9/2025 arxiv
machine learning: Benchmarks are central to measuring the capabilities of large language models and guiding model development, yet widespread data leakage from pretraining corpora undermines their validity. Models can match memorized content rather than demonstrate true generalization, which inflates scores, distorts...
Tajamul Ashraf, Umair Nawaz, Abdelrahman M. Shaker, Rao Anwer, Philip Torr, Fahad Shahbaz Khan, Salman Khan 10/9/2025 arxiv
multimodal learning: Vision language models (VLMs) are increasingly deployed as controllers with access to external tools for complex reasoning and decision-making, yet their effectiveness remains limited by the scarcity of high-quality multimodal trajectories and the cost of manual annotation. We address this challenge...
Meixi Song, Xin Lin, Dizhe Zhang, Haodong Li, Xiangtai Li, Bo Du, Lu Qi 10/9/2025 arxiv
computer vision: Recent advances in 3D Gaussian Splatting (3DGS) enable real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations. However, performance degradation and instability remain significant under sparse-view conditions. In this work, we identify two key failure modes under sparse-...
Zhen Zhu, Yiming Gong, Yao Xiao, Yaoyao Liu, Derek Hoiem 10/9/2025 arxiv
machine learning: How can we teach large multimodal models (LMMs) new skills without erasing prior abilities? We study sequential fine-tuning on five target skills while monitoring general ability on eight held-out benchmarks across three model families. We observe that apparent "forgetting" on held-out tasks after n...
Changyao Tian, Hao Li, Gen Luo, Xizhou Zhu, Weijie Su, Hanming Deng, Jinguo Zhu, Jie Shao, Ziran Zhu, Yunpeng Liu, Lewei Lu, Wenhai Wang, Hongsheng Li, Jifeng Dai 10/9/2025 arxiv
machine learning: Compositional training has been the de facto paradigm in existing Multimodal Large Language Models (MLLMs), where pre-trained vision encoders are connected with pre-trained LLMs through continuous multimodal pre-training. However, the multimodal scaling property of this paradigm remains difficult to...