Paper Archive

Browse and export your curated research paper collection

Archived Days: 33
Total Papers: 330
Avg Score: 7.8
Categories: 7

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
Browse by Date

Papers for October 13, 2025

10 papers found

Unknown authors 10/13/2025 huggingface

machine learning

Project Page: https://kangliao929.github.io/projects/puffin/
GitHub: https://github.com/KangLiao929/Puffin
Dataset: https://huggingface.co/datasets/KangLiao/Puffin-4M
Model: https://huggingface.co/KangLiao/Puffin
Demo: https://huggingface.co/spaces/KangLiao/Puffin

Keywords: language regression, diffusion-based generation, camera-centric, multimodal model, spatial awareness, vision-language, camera as language, geometric context

Unknown authors 10/13/2025 huggingface

machine learning

We present D2E 🎮→🤖, a framework that scales Vision-Action Pretraining on desktop interaction data to accelerate Embodied AI 🚀. By turning ordinary game and desktop interactions into training fuel, D2E builds rich visuomotor priors that transfer from screens to robots.
✨ OWA Toolkit 🖥️: a unified...

Keywords: embodied AI, desktop pretraining, OWA Toolkit, OWAMcap, Generalist-IDM, timestamp-based prediction, pseudo-labeling, VAPT

Unknown authors 10/13/2025 huggingface

machine learning

We introduce the problem of multimodal prompt optimization and propose the multimodal prompt optimizer to harness the full capacity of multimodal large language models beyond text.

Keywords: Multimodal Prompt Optimization, MPO, MLLMs, prompt optimization, alignment-preserving updates, Bayesian selection, multimodal learning

Unknown authors 10/13/2025 huggingface

reinforcement learning

1T).
Our Webscale-RL pipeline converts pretraining text into diverse RL-ready QA data, scaling RL to pretraining levels!
All code and datasets are open source!
HF 🤗: https://huggingface.co/datasets/Salesforce/Webscale-RL
GitHub 🤖: https://github.com/SalesforceAIResearch/Pr...

Keywords: reinforcement learning, large language models, data pipeline, Webscale-RL, dataset, pretraining, question-answering, efficiency

Unknown authors 10/13/2025 huggingface

machine learning

Recent trends in test-time scaling for reasoning models (e.g., OpenAI o1, DeepSeek-R1) have led to remarkable improvements through long Chain-of-Thought (CoT). However, existing benchmarks mainly focus on immediate, single-horizon tasks, failing to adequately evaluate models' ability to understand a...

Keywords: R-HORIZON, long-horizon reasoning, Chain-of-Thought, CoT, Large Reasoning Models, LRMs, query composition, RLVR

Unknown authors 10/13/2025 huggingface

machine learning

With the current surge in spatial reasoning explorations, researchers have made significant progress in understanding indoor scenes, but still struggle with diverse applications such as robotics and autonomous driving. This paper aims to advance all-scale spatial reasoning across diverse scenarios b...

Keywords: all-scale spatial reasoning, SpaceVista-1M, SpaceVista-7B, scale-aware modeling, progressive training, spatial QA, multimodal LLMs, benchmark

Unknown authors 10/13/2025 huggingface

machine learning

Paper (arXiv): https://arxiv.org/abs/2510.08457
GitHub: https://github.com/shawn0728/ARES
Model & Dataset (Hugging Face): https://huggingface.co/collections/ares0728/ares-68e7c7160dcb48734dee4e95

Keywords: multimodal large reasoning models, high window-entropy, adaptive reasoning, AEPO, difficulty-aware, entropy shaping, dynamic KL, Adaptive Cold-Start

Unknown authors 10/13/2025 huggingface

machine learning

StreamingVLM enables real-time, stable understanding of effectively infinite video by keeping a compact KV cache and aligning training with streaming inference. It avoids quadratic cost and sliding-window pitfalls, runs at up to 8 FPS on a single H100, and achieves a 66.18% win rate against GPT-4o mini on a new long-video...

Keywords: StreamingVLM, vision-language, KV cache, supervised fine-tuning, long-video, real-time, Inf-Streams-Eval, H100

Unknown authors 10/13/2025 huggingface

computer vision

Recent diffusion models achieve state-of-the-art performance in image generation, but often suffer from semantic inconsistencies or hallucinations. While various inference-time guidance methods can enhance generation, they often operate indirectly by relying on external signals or architectural ...

Keywords: diffusion models, image generation, hallucinations, inference-time guidance, trajectory signals, tangential components, Taylor expansion, plug-and-play

Unknown authors 10/13/2025 huggingface

machine learning

🧐 Why AutoPR? The academic community continues to expand output each year without a corresponding increase in visibility or value. In 2024 alone, NeurIPS accepted over 4,000 papers, with conference volumes at CVPR and ICCV also soaring. In such an environment of overwhelming information, how can ind...

Keywords: AutoPR, PRAgent, PRBench, multimodal benchmark, content extraction, multi-agent system, platform adaptation, hierarchical summarization