Paper Archive

Browse and export your curated research paper collection

Archived Days: 197
Total Papers: 1958
Avg Score: 7.9
Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
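The BibTeX export maps each archived record to a citation entry. As a minimal sketch of that conversion — the field names (`title`, `authors`, `year`, `url`) are assumptions about the JSON export schema, not a documented format — it might look like:

```python
def to_bibtex(entry: dict) -> str:
    """Render one archived-paper record as a BibTeX @misc entry.

    Field names here are assumed, not taken from the actual export schema.
    """
    key = entry["title"].split()[0].lower() + str(entry["year"])
    authors = " and ".join(entry["authors"])
    return (
        f"@misc{{{key},\n"
        f"  title  = {{{entry['title']}}},\n"
        f"  author = {{{authors}}},\n"
        f"  year   = {{{entry['year']}}},\n"
        f"  url    = {{{entry['url']}}}\n"
        f"}}"
    )

paper = {
    "title": "Example Paper",
    "authors": ["A. Author", "B. Writer"],
    "year": 2026,
    "url": "https://example.org/paper",
}
print(to_bibtex(paper))
```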
Browse by Date

Papers for March 25, 2026

10 papers found

3/24/2026 · huggingface

computer vision

Optical flow models trained on high-quality data often degrade severely when confronted with real-world corruptions such as blur, noise, and compression artifacts. To overcome this limitation, we formulate Degradation-Aware Optical Flow, a new task targeting accurate dense correspondence estimation ...

Keywords: optical flow, diffusion models, degradation robustness, image restoration, spatio-temporal attention, DA-Flow, zero-shot correspondence

3/24/2026 · huggingface

computer vision

Dynamical systems theory and reinforcement learning view world evolution as latent-state dynamics driven by actions, with visual observations providing partial information about the state. Recent video world models attempt to learn this action-conditioned dynamics from data. However, existing datase...

Keywords: WildWorld, WildBench, action-conditioned world modeling, dataset, Monster Hunter: Wilds, explicit state annotations, skeletons, depth maps

3/24/2026 · huggingface

computer vision

Monocular novel-view synthesis has long required multi-view image pairs for supervision, limiting training data scale and diversity. We argue it is not necessary: one view is enough. We present OVIE, trained entirely on unpaired internet images. We leverage a monocular depth estimator as a geometric...

Keywords: novel view synthesis, monocular training, masked loss, monocular depth, unpaired images, zero-shot, OVIE, 3D lifting
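The "3D lifting" keyword refers to unprojecting a monocular depth map into a point cloud that can be re-rendered from a new viewpoint. A generic pinhole-camera sketch of that step (not the paper's actual OVIE formulation, which is not specified here):

```python
import numpy as np

def lift_depth_to_points(depth, fx, fy, cx, cy):
    """Unproject a depth map (H, W) to 3D camera-space points (H, W, 3).

    Standard pinhole back-projection; intrinsics fx, fy, cx, cy are
    illustrative parameters, not values from the paper.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx  # pixel column -> camera X
    y = (v - cy) * z / fy  # pixel row    -> camera Y
    return np.stack([x, y, z], axis=-1)

depth = np.ones((4, 4))  # toy depth map, 1 m everywhere
pts = lift_depth_to_points(depth, fx=1.0, fy=1.0, cx=2.0, cy=2.0)
```

The principal-point pixel maps to (0, 0, z), so a quick sanity check is that the center of a constant-depth map lands on the optical axis.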

3/24/2026 · huggingface

machine learning

Agentic multimodal large language models (MLLMs) (e.g., OpenAI o3 and Gemini Agentic Vision) achieve remarkable reasoning capabilities through iterative visual tool invocation. However, the cascaded perception, reasoning, and tool-calling loops introduce significant sequential overhead. This overhea...

Keywords: SpecEyes, speculative planning, cognitive gating, answer separability, heterogeneous parallel funnel, agentic depth, multimodal LLMs, latency

3/24/2026 · huggingface

robotics

Video-Action Models (VAMs) have emerged as a promising framework for embodied intelligence, learning implicit world dynamics from raw video streams to produce temporally consistent action predictions. Although such models demonstrate strong performance on long-horizon tasks through visual reasoning,...

Keywords: VTAM, Video-Action Models, tactile perception, video transformer, multimodal fusion, modality transfer finetuning, tactile regularization, contact-rich manipulation

3/24/2026 · huggingface

machine learning

Functionality segmentation in 3D scenes requires an agent to ground implicit natural-language instructions into precise masks of fine-grained interactive elements. Existing methods rely on fragmented pipelines that suffer from visual blindness during initial task parsing. We observe that these metho...

Keywords: functionality segmentation, 3D scenes, multimodal LLM, spatial-temporal grounding, coarse-to-fine, training-free, SceneFun3D, mIoU

3/24/2026 · huggingface

computer vision

State-of-the-art video generation models produce remarkable photorealism, but they lack the precise control required to align generated content with specific scene requirements. Furthermore, without an underlying explicit geometry, these models cannot guarantee 3D consistency. Conversely, 3D engines...

Keywords: video diffusion, sim-to-real, anchor-based propagation, IC-LoRA, photorealism, 3D consistency, GTA-V, render-to-video

3/24/2026 · huggingface

machine learning

High-quality articulated 3D assets are indispensable for embodied AI and physical simulation, yet 3D generation still focuses on static meshes, leaving a gap in "sim-ready" interactive objects. Most recent articulated object creation methods rely on multi-stage pipelines that accumulate errors acros...

Keywords: SIMART, Sparse 3D VQ-VAE, MLLM, articulated objects, PartNet-Mobility, 3D tokenization, simulation, robotics

3/24/2026 · huggingface

robotics

Video-based world models offer a powerful paradigm for embodied simulation and planning, yet state-of-the-art models often generate physically implausible manipulations - such as object penetration and anti-gravity motion - due to training on generic visual data and likelihood-based objectives that ...

Keywords: ABot-PhysWorld, diffusion transformer, world model, physics alignment, DPO post-training, decoupled discriminators, parallel context block, action-controllable video

3/24/2026 · huggingface

computer vision

Existing camouflage object detection (COD) methods typically rely on fully-supervised learning guided by mask annotations. However, obtaining mask annotations is time-consuming and labor-intensive. Compared to fully-supervised methods, existing weakly-supervised COD methods exhibit significantly poo...

Keywords: camouflaged object detection, weakly supervised learning, Segment Anything Model, SAM, frequency-aware learning, contrastive learning, low-rank adaptation, FoRA
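The "low-rank adaptation" keyword refers to the LoRA technique for cheaply fine-tuning a frozen backbone such as SAM. A general illustration of the idea — this is plain LoRA, not the paper's FoRA variant, and the layer shapes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# A frozen pretrained weight (stand-in for one projection in SAM's encoder).
d_in, d_out, rank = 8, 8, 2
W = rng.normal(size=(d_out, d_in))

# Trainable low-rank adapters: B starts at zero, so the adapted layer
# initially computes exactly the same function as the frozen one.
A = rng.normal(size=(rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def adapted_forward(x):
    """y = (W + B @ A) x -- only A and B would receive gradients."""
    return (W + B @ A) @ x

x = rng.normal(size=(d_in,))
assert np.allclose(adapted_forward(x), W @ x)  # identical before training
```

The appeal for weakly supervised settings like this one is that only `rank * (d_in + d_out)` parameters per layer are updated, which keeps adaptation feasible with sparse or noisy supervision.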