Paper Archive

Browse and export your curated research paper collection

Archived Days: 79
Total Papers: 788
Avg Score: 8.4
Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
Browse by Date

Papers for November 24, 2025

10 papers found

Jun Cen, Siteng Huang, Yuqian Yuan, Hangjie Yuan, Chaohui Yu, Yuming Jiang, Jiayan Guo, Kehan Li, Hao Luo, Fan Wang, Xin Li, Deli Zhao, Hao Chen 11/21/2025 arxiv

robotics

We introduce RynnVLA-002, a unified Vision-Language-Action (VLA) and world model. The world model leverages action and visual inputs to predict future image states, learning the underlying physics of the environment to refine action generation. Conversely, the VLA model produces subsequent actions f...

Keywords: vision-language-action, world model, robotics, joint learning, simulation, sim2real, LIBERO, LeRobot

Yuezhan Tao, Dexter Ong, Fernando Cladera, Jason Hughes, Camillo J. Taylor, Pratik Chaudhari, Vijay Kumar 11/21/2025 arxiv

robotics

We demonstrate real-time high-altitude aerial metric-semantic mapping and exploration using a monocular camera paired with a global positioning system (GPS) and an inertial measurement unit (IMU). Our system, named HALO, addresses two key challenges: (i) real-time dense 3D reconstruction using visio...

Keywords: metric-semantic mapping, monocular vision, aerial exploration, language-conditioned planning, real-time 3D reconstruction, quadrotor, GPS/IMU, autonomous mapping

Weilun Li, Lei Sun, Ruixi Gao, Qi Jiang, Yuqin Ma, Kaiwei Wang, Ming-Hsuan Yang, Luc Van Gool, Danda Pani Paudel 11/21/2025 arxiv

computer vision

As neuromorphic sensors, event cameras asynchronously record changes in brightness as streams of sparse events with the advantages of high temporal resolution and high dynamic range. Reconstructing intensity images from events is a highly ill-posed task due to the inherent ambiguity of absolute brig...

Keywords: event cameras, diffusion models, video reconstruction, neuromorphic sensors, surrogate training, EvEncoder, high dynamic range

Yolo Yunlong Tang, Daiki Shimada, Hang Hua, Chao Huang, Jing Bi, Rogerio Feris, Chenliang Xu 11/21/2025 arxiv

computer vision

Understanding text-rich videos requires reading small, transient textual cues that often demand repeated inspection. Yet most video QA models rely on single-pass perception over fixed frames, leading to hallucinations and failures on fine-grained evidence. Inspired by how humans pause, zoom, and re-...

Keywords: video reasoning, text-rich video, visual rumination, large multimodal models, reinforcement learning, SFT, GRPO, Video-R4-CoT-17k

Vinay Kanakeri, Shivam Bajaj, Ashwin Verma, Vijay Gupta, Aritra Mitra 11/21/2025 arxiv

machine learning

It is known that reinforcement learning (RL) is data-hungry. To improve sample-efficiency of RL, it has been proposed that the learning algorithm utilize data from 'approximately similar' processes. However, since the process models are unknown, identifying which other processes are similar poses a ...

Keywords: LQR, reinforcement_learning, clustering, personalized_policies, policy_optimization, sequential_elimination, zeroth-order, collaborative_learning

Mark Endo, Serena Yeung-Levy 11/21/2025 arxiv

machine learning

Scaling up multimodal models has enabled remarkable advances in visual understanding and reasoning, but practical demands call for smaller, efficient systems. In this work, we conduct a principled analysis of downscaling intelligence in multimodal models, examining how reduced large language model (...

Keywords: multimodal models, LLM downscaling, visual perception, visual reasoning, visual extraction tuning, Extract+Think, efficiency, small models

Roozbeh Bazargani, Saqib Abdullah Basar, Daniel Daly-Grafstein, Rodrigo Solis Pompa, Soojin Lee, Saurabh Garg, Yuntong Ma, John A. Carrino, Siavash Khallaghi, Sam Hashemi 11/21/2025 arxiv

machine learning

The human spine is a complex structure composed of 33 vertebrae. It holds the body and is important for leading a healthy life. The spine is vulnerable to age-related degenerations that can be identified through magnetic resonance imaging (MRI). In this paper we propose a novel computer-vision-based ...

Keywords: spine age, spine aging, MRI, deep learning, UMAP, HDBSCAN, spine age gap, SAG

Yiqing Shen, Aiza Maksutova, Chenjia Li, Mathias Unberath 11/21/2025 arxiv

computer vision

World models learn to predict the temporal evolution of visual observations given a control signal, potentially enabling agents to reason about environments through forward simulation. Because of the focus on forward simulation, current world models generate predictions based on factual observations...

Keywords: counterfactual world models, digital twin, video diffusion, large language models, structured scene representation, interventions, forward simulation

Unknown authors 11/24/2025 huggingface

machine learning

Recent advancements in large reasoning models have fueled growing interest in extending such capabilities to multimodal domains. However, despite notable progress in visual reasoning, the lack of transparent and reproducible data curation and training strategies remains a major barrier to scalable r...

Keywords: multimodal reasoning, supervised fine-tuning, reinforcement learning, data curation, reproducibility, OpenMMReasoner, Qwen2.5-VL-7B-Instruct, benchmarks

Unknown authors 11/24/2025 huggingface

computer vision

Current research on agentic visual reasoning enables deep multimodal understanding but primarily focuses on image manipulation tools, leaving a gap toward more general-purpose agentic models. In this work, we revisit the geolocalization task, which requires not only nuanced visual grounding but also...

Keywords: geolocalization, agentic reasoning, multimodal, web-augmented, GeoVista, GeoBench, reinforcement learning, hierarchical reward