Paper Archive

Browse and export your curated research paper collection

Archived Days: 79
Total Papers: 788
Avg Score: 8.4
Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
Browse by Date

Papers for November 26, 2025

10 papers found

Unknown authors 11/26/2025 huggingface

machine learning

Medical image segmentation is fundamental for biomedical discovery. Existing methods lack generalizability and demand extensive, time-consuming manual annotation for new clinical applications. Here, we propose MedSAM-3, a text-promptable medical segmentation model for medical image and video segmenta...

Keywords: MedSAM-3, Segment Anything Model, Promptable Concept Segmentation, medical image segmentation, multimodal LLM, agent-in-the-loop, open-vocabulary

Unknown authors 11/26/2025 huggingface

machine learning

Recent years have witnessed significant progress in Unified Multimodal Models, yet a fundamental question remains: Does understanding truly inform generation? To investigate this, we introduce UniSandbox, a decoupled evaluation framework paired with controlled, synthetic datasets to avoid data leaka...

Keywords: Unified Multimodal Models, UniSandbox, understanding-generation gap, Chain-of-Thought, self-training, knowledge transfer, query-based architectures, synthetic datasets

Unknown authors 11/26/2025 huggingface

machine learning

World models are emerging as a foundational paradigm for scalable, data-efficient embodied AI. In this work, we present GigaWorld-0, a unified world model framework designed explicitly as a data engine for Vision-Language-Action (VLA) learning. GigaWorld-0 integrates two synergistic components: Giga...

Keywords: world models, embodied AI, Vision-Language-Action, video generation, 3D Gaussian Splatting, differentiable system identification, motion planning, GigaTrain

Unknown authors 11/26/2025 huggingface

computer vision

Preserving first-frame identity while ensuring precise motion control is a fundamental challenge in human image animation. The Image-to-Motion Binding process of the dominant Reference-to-Video (R2V) paradigm overlooks critical spatio-temporal misalignments common in real-world applications, leading...

Keywords: human image animation, image-to-video, first-frame preservation, condition reconciliation, pose modulation, temporal coherence, generative models, computer vision

Unknown authors 11/26/2025 huggingface

computer vision

Despite advances, video diffusion transformers still struggle to generalize beyond their training length, a challenge we term video length extrapolation. We identify two failure modes: model-specific periodic content repetition and a universal quality degradation. Prior works attempt to solve repeti...

Keywords: video diffusion, transformer, extrapolation, attention dispersion, positional encoding, UltraViCo, training-free, video synthesis

Unknown authors 11/26/2025 huggingface

computer vision

We present ReDirector, a novel camera-controlled video retake generation method for dynamically captured variable-length videos. In particular, we rectify a common misuse of RoPE in previous works by aligning the spatiotemporal positions of the input video and the target retake. Moreover, we introdu...

Keywords: ReDirector, Rotary Camera Encoding, RoCE, RoPE, camera-controlled retake, spatiotemporal alignment, multi-view, out-of-distribution generalization

Unknown authors 11/26/2025 huggingface

computer vision

This paper studies Visual Question-Visual Answering (VQ-VA): generating an image, rather than text, in response to a visual question -- an ability that has recently emerged in proprietary systems such as NanoBanana and GPT-Image. To also bring this capability to open-source models, we introduce VQ-V...

Keywords: VQ-VA, VQ-VA World, IntelligentBench, LightFusion, agentic pipeline, image generation, visual question answering, dataset construction

Unknown authors 11/26/2025 huggingface

reinforcement learning

Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains challenging. Token-level importance ratios often exhibit high variance, a phenomenon exacerbated in Mixture-o...

Keywords: SAPO, soft gating, policy optimization, reinforcement learning, LLM fine-tuning, Mixture-of-Experts, GSPO, GRPO

Unknown authors 11/26/2025 huggingface

generative models

Pre-trained video models learn powerful priors for generating high-quality, temporally coherent content. While these models excel at temporal coherence, their dynamics are often constrained by the continuous nature of their training data. We hypothesize that by injecting the rich and unconstrained c...

Keywords: iMontage, video models, image generation, many-to-many, motion priors, temporal coherence, data curation, model adaptation

Unknown authors 11/26/2025 huggingface

machine learning

This paper presents research on Agent0-VL, exploring self-evolving agents. The full abstract is not available at this time. Please visit the paper's website for complete details about the methodology, results, and contributions.

Keywords: Agent0-VL, self-evolving, tool-integrated, vision-language, multi-modal agents