Paper Archive

Is This Tracker On? A Benchmark Protocol for Dynamic Tracking

9.0/10

Ilona Demler, Saumya Chauhan, Georgia Gkioxari 10/22/2025 arxiv

computer vision

We introduce ITTO, a challenging new benchmark suite for evaluating and diagnosing the capabilities and limitations of point tracking methods. Our videos are sourced from existing datasets and egocentric real-world recordings, with high-quality human annotations collected through a multi-stage pipel...

Keywords: ITTO, point tracking, benchmark, occlusion, egocentric, re-identification, dataset, motion complexity

Semantic World Models

9.0/10

Jacob Berg, Chuning Zhu, Yanda Bao, Ishan Durugkar, Abhishek Gupta 10/22/2025 arxiv

robotics

Planning with world models offers a powerful paradigm for robotic control. Conventional approaches train a model to predict future frames conditioned on current frames and actions, which can then be used for planning. However, the objective of predicting future pixels is often at odds with the actua...

Keywords: semantic world models, vision-language models, visual question answering, planning, robotics, policy improvement, generalization, action-conditional modeling

olmOCR 2: Unit Test Rewards for Document OCR

9.0/10

Jake Poznanski, Luca Soldaini, Kyle Lo 10/22/2025 arxiv

machine learning

We present olmOCR 2, the latest in our family of powerful OCR systems for converting digitized print documents, like PDFs, into clean, naturally ordered plain text. olmOCR 2 is powered by olmOCR-2-7B-1025, a specialized, 7B vision language model (VLM) trained using reinforcement learning with verifi...

Keywords: OCR, vision-language model, reinforcement learning, unit tests, synthetic data, document parsing, tables, math OCR

How to Evaluate Monocular Depth Estimation?

9.0/10

Siyang Wu, Jack Nugent, Willow Yang, Jia Deng 10/22/2025 arxiv

computer vision

Monocular depth estimation is an important task with rapid progress, but how to evaluate it remains an open question, as evidenced by a lack of standardization in existing literature and a large selection of evaluation metrics whose trade-offs and behaviors are not well understood. This paper contri...

Keywords: monocular depth estimation, evaluation metrics, sensitivity analysis, relative surface normals, human judgment, depth visualization, composite metrics, curvature

Hubble: a Model Suite to Advance the Study of LLM Memorization

9.0/10

Johnny Tian-Zheng Wei, Ameya Godbole, Mohammad Aflah Khan, Ryan Wang, Xiaoyuan Zhu, James Flemings, Nitya Kashyap, Krishna P. Gummadi, Willie Neiswanger, Robin Jia 10/22/2025 arxiv

machine learning

We present Hubble, a suite of fully open-source large language models (LLMs) for the scientific study of LLM memorization. Hubble models come in standard and perturbed variants: standard models are pretrained on a large English corpus, and perturbed models are trained in the same way but with contro...

Keywords: LLM memorization, data privacy, membership inference, machine unlearning, open-source models, training dynamics, dataset perturbation

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

9.0/10

Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jialing Tong, Yinfei Yang, Jiasen Lu, Wenze Hu, Zhe Gan 10/22/2025 arxiv

machine learning

Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with systems like GPT-4o and Nano-Banana setting new benchmarks. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible...

Keywords: Pico-Banana-400K, image editing, dataset, Nano-Banana, OpenImages, MLLM quality scoring, multi-turn edits, alignment

Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning

9.0/10

Xichen Zhang, Sitong Wu, Yinghao Zhu, Haoru Tan, Shaozuo Yu, Ziyi He, Jiaya Jia 10/22/2025 arxiv

reinforcement learning

Reinforcement learning from verifiable rewards has emerged as a powerful technique for enhancing the complex reasoning abilities of Large Language Models (LLMs). However, these methods are fundamentally constrained by the "learning cliff" phenomenon: when faced with problems far beyond their curre...

Keywords: Scaf-GRPO, GRPO, scaffolding, learning cliff, in-prompt hints, LLM reasoning, reinforcement learning from verifiable rewards, Qwen2.5-Math-7B

The Art of Asking: Multilingual Prompt Optimization for Synthetic Data

9.0/10

David Mora, Viraat Aryabumi, Wei-Yin Ko, Sara Hooker, Julia Kreutzer, Marzieh Fadaee 10/22/2025 arxiv

machine learning

Synthetic data has become a cornerstone for scaling large language models, yet its multilingual use remains bottlenecked by translation-based prompts. This strategy inherits English-centric framing and style and neglects cultural dimensions, ultimately constraining model generalization. We argue tha...

Keywords: multilingual, synthetic data, prompt optimization, cultural adaptation, difficulty enhancement, Global-MMLU, Flores XCometXL, mArenaHard

The Feasibility of Training Sovereign Language Models in the Global South: A Study of Brazil and Mexico

9.0/10

Sandra Malagon, Monica A. Ulloa Ruiz, Tatiana Elizabeth Sandoval Plaza, Gabriel Rafael Rosario Bolívar, Valentina García Mesa, Ivanna Alvarado Morales 10/22/2025 arxiv

machine learning

The rapid escalation of computational requirements for training large-scale language models has reinforced structural asymmetries between high-capacity jurisdictions and countries in the Global South. This paper examines the technical and fiscal feasibility of sovereign-scale language model training...

Keywords: sovereign language models, Brazil, Mexico, H100, A100, compute governance, energy consumption, fiscal feasibility

Transformers are almost optimal metalearners for linear classification

9.0/10

Roey Magen, Gal Vardi 10/22/2025 arxiv

machine learning

Transformers have demonstrated impressive in-context learning (ICL) capabilities, raising the question of whether they can serve as metalearners that adapt to new tasks using only a small number of in-context examples, without any further training. While recent theoretical work has studied transform...

Keywords: Transformers, In-context learning, Metalearning, Sample complexity, Gaussian mixture, Subspace, Gradient descent, Linear classification

Papers for October 23, 2025