Browse and export your curated research paper collection
Unknown authors 11/26/2025 huggingface
machine learning: Medical image segmentation is fundamental for biomedical discovery. Existing methods lack generalizability and demand extensive, time-consuming manual annotation for new clinical applications. Here, we propose MedSAM-3, a text-promptable medical segmentation model for medical image and video segmenta...
Unknown authors 11/26/2025 huggingface
machine learning: Recent years have witnessed significant progress in Unified Multimodal Models, yet a fundamental question remains: Does understanding truly inform generation? To investigate this, we introduce UniSandbox, a decoupled evaluation framework paired with controlled, synthetic datasets to avoid data leaka...
Unknown authors 11/26/2025 huggingface
machine learning: World models are emerging as a foundational paradigm for scalable, data-efficient embodied AI. In this work, we present GigaWorld-0, a unified world model framework designed explicitly as a data engine for Vision-Language-Action (VLA) learning. GigaWorld-0 integrates two synergistic components: Giga...
Unknown authors 11/26/2025 huggingface
computer vision: Preserving first-frame identity while ensuring precise motion control is a fundamental challenge in human image animation. The Image-to-Motion Binding process of the dominant Reference-to-Video (R2V) paradigm overlooks critical spatio-temporal misalignments common in real-world applications, leading...
Unknown authors 11/26/2025 huggingface
computer vision: Despite advances, video diffusion transformers still struggle to generalize beyond their training length, a challenge we term video length extrapolation. We identify two failure modes: model-specific periodic content repetition and a universal quality degradation. Prior works attempt to solve repeti...
Unknown authors 11/26/2025 huggingface
computer vision: We present ReDirector, a novel camera-controlled video retake generation method for dynamically captured variable-length videos. In particular, we rectify a common misuse of RoPE in previous works by aligning the spatiotemporal positions of the input video and the target retake. Moreover, we introdu...
Unknown authors 11/26/2025 huggingface
computer vision: This paper studies Visual Question-Visual Answering (VQ-VA): generating an image, rather than text, in response to a visual question, an ability that has recently emerged in proprietary systems such as NanoBanana and GPT-Image. To also bring this capability to open-source models, we introduce VQ-V...
Unknown authors 11/26/2025 huggingface
reinforcement learning: Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains challenging. Token-level importance ratios often exhibit high variance, a phenomenon exacerbated in Mixture-o...
Unknown authors 11/26/2025 huggingface
generative models: Pre-trained video models learn powerful priors for generating high-quality, temporally coherent content. While these models excel at temporal coherence, their dynamics are often constrained by the continuous nature of their training data. We hypothesize that by injecting the rich and unconstrained c...
Unknown authors 11/26/2025 huggingface
machine learning: This paper presents research on agent0-vl: exploring, self-evolving. The full abstract is not available at this time. Please visit the paper's website for complete details about the methodology, results, and contributions.