Paper Archive

MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory

9.0/10

Unknown authors 12/3/2025 huggingface

machine learning

We present MG-Nav (Memory-Guided Navigation), a dual-scale framework for zero-shot visual navigation that unifies global memory-guided planning with local geometry-enhanced control. At its core is the Sparse Spatial Memory Graph (SMG), a compact, region-centric memory where each node aggregates mult...

Keywords: MG-Nav, Sparse Spatial Memory Graph, SMG, VGGT-adapter, zero-shot navigation, global planning, local control, image-goal

SimScale: Learning to Drive via Real-World Simulation at Scale

9.0/10

Unknown authors 12/3/2025 huggingface

machine learning

Achieving fully autonomous driving systems requires learning rational decisions in a wide span of scenarios, including safety-critical and out-of-distribution ones. However, such cases are underrepresented in real-world corpora collected by human experts. To compensate for the lack of data diversity,...

Keywords: simulation, neural rendering, pseudo-expert, co-training, autonomous driving, data synthesis, generalization, robustness

MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

9.0/10

Unknown authors 12/3/2025 huggingface

computer vision

Current video generation techniques excel at single-shot clips but struggle to produce narrative multi-shot videos, which require flexible shot arrangement, coherent narrative, and controllability beyond text prompts. To tackle these challenges, we propose MultiShotMaster, a framework for highly con...

Keywords: multi-shot video, RoPE, Multi-Shot Narrative RoPE, Spatiotemporal Position-Aware RoPE, video generation, reference grounding, data annotation pipeline, controllable generation

Guided Self-Evolving LLMs with Minimal Human Supervision

9.0/10

Unknown authors 12/3/2025 huggingface

machine learning

AI self-evolution has long been envisioned as a path toward superintelligence, where models autonomously acquire, refine, and internalize knowledge from their own learning experiences. Yet in practice, unguided self-evolving systems often plateau quickly or even degrade as training progresses. These...

Keywords: self-evolving, R-Few, Challenger-Solver, in-context grounding, mixed training, curriculum learning, concept drift, synthetic data

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

9.0/10

Unknown authors 12/3/2025 huggingface

computer vision

Recent advances in video large language models have demonstrated strong capabilities in understanding short clips. However, scaling them to hours- or days-long videos remains highly challenging due to limited context capacity and the loss of critical visual details during abstraction. Existing memory-augmen...

Keywords: multimodal memory, long video reasoning, episodic memory, semantic memory, visual memory, adaptive retrieval, temporal granularity, video QA

ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

9.0/10

Unknown authors 12/3/2025 huggingface

machine learning

Despite progress in video-to-audio generation, the field focuses predominantly on mono output, lacking spatial immersion. Existing binaural approaches remain constrained by a two-stage pipeline that first generates mono audio and then performs spatialization, often resulting in error accumulation an...

Keywords: binaural audio, video-to-audio, spatial audio, conditional flow matching, dual-branch architecture, BiAudio dataset, end-to-end, multimodal

DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

9.0/10

Unknown authors 12/3/2025 huggingface

computer vision

This paper presents DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation. Recent works have advanced this field by representing camera poses as ray-based conditions, yet they often lack sufficient scene understanding and geometric awareness. DualCamCtrl specifically...

Keywords: diffusion model, camera-controlled video, depth estimation, RGB-depth fusion, semantic alignment, geometry-aware, SIGMA, dual-branch

Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

8.0/10

Unknown authors 12/3/2025 huggingface

machine learning

Recent audio-video generative systems suggest that coupling modalities benefits not only audio-video synchrony but also the video modality itself. We pose a fundamental question: Does audio-video joint denoising training improve video generation, even when we only care about video quality? To study ...

Keywords: audio-video joint denoising, AVFullDiT, text-to-video, text-to-audio, multimodal learning, video generation, privileged signal, audio-visual causality

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

5.0/10

Unknown authors 12/3/2025 huggingface

natural language processing

The full abstract is not available at this time. Please visit the paper's website for complete details about the methodology, results, and contributions.

Keywords: DeepSeek-V3.2, open large language models, open LLMs, large language models

Mixture of Horizons in Action Chunking

5.0/10

Unknown authors 12/3/2025 huggingface

computer vision

The full abstract is not available at this time. Please visit the paper's website for complete details about the methodology, results, and contributions.

Keywords: Mixture of Horizons, Action Chunking, temporal modeling, mixture models
