Paper Archive

Seedance 2.0: Advancing Video Generation for World Complexity

9.0/10

4/15/2026 huggingface

machine learning

Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pro, Seedance 2.0 adopts a unified, highly efficient, and large-scale architecture for multi-modal audio-video joint genera...

Keywords: Seedance 2.0, audio-video generation, multimodal, video generation, joint generation, reference editing, low-latency, Seedance Fast

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

9.0/10

4/15/2026 huggingface

machine learning

Spatial reasoning over three-dimensional scenes is a core capability for embodied intelligence, yet continuous model improvement remains bottlenecked by the cost of geometric annotation. The self-evolving paradigm offers a promising path, but its reliance on model consensus to construct pseudo-label...

Keywords: SpatialEvo, Deterministic Geometric Environment, DGE, self-evolving, spatial reasoning, point clouds, camera poses, shared-parameter policy

From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space

9.0/10

4/15/2026 huggingface

machine learning

While reinforcement learning with verifiable rewards (RLVR) significantly enhances LLM reasoning by optimizing the conditional distribution P(y|x), its potential is fundamentally bounded by the base model's existing output distribution. Optimizing the marginal distribution P(y) in the Pre-train Spac...

Keywords: PreRL, DSRL, Negative Sample Reinforcement, P(y), P(y|x), LLM reasoning, policy reincarnation, reinforcement learning

HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System

9.0/10

4/15/2026 huggingface

robotics

While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often compromises the profound reasoning capabilities inherited from their base Vision-Language Models (VLMs). To resolve this fundamental trade-off, we p...

Keywords: HiVLA, hierarchical architecture, vision-language-action, VLM planner, visual grounding, Diffusion Transformer, flow-matching, cascaded cross-attention

TIP: Token Importance in On-Policy Distillation

9.0/10

4/15/2026 huggingface

machine learning

On-policy knowledge distillation (OPD) trains a student on its own rollouts under token-level supervision from a teacher. Not all token positions matter equally, but existing views of token importance are incomplete. We ask a direct question: which tokens carry the most useful learning signal in OPD...

Keywords: on-policy distillation, token importance, student entropy, teacher-student divergence, TIP, memory-efficient distillation, Qwen3, Llama

Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself

9.0/10

4/15/2026 huggingface

computer vision

Feed-forward 3D reconstruction models are efficient but rigid: once trained, they perform inference in a zero-shot manner and cannot adapt to the test scene. As a result, visually plausible reconstructions often contain errors, particularly under occlusions, specularities, and ambiguous cues. To add...

Keywords: test-time adaptation, 3D reconstruction, self-supervision, cross-view consistency, LoRA, Depth Anything 3, VGGT, multi-view

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

9.0/10

4/15/2026 huggingface

computer vision

Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditional methods achieve high fidelity, they are limited by slow per-scene optimization or category-spe...

Keywords: feed-forward 3D reconstruction, taxonomy, feature enhancement, geometry awareness, model efficiency, augmentation strategies, temporal-aware models, benchmarks

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

9.0/10

4/15/2026 huggingface

machine learning

MLLM-based GUI agents have demonstrated strong capabilities in complex user interface interaction tasks. However, long-horizon scenarios remain challenging, as these agents are burdened with tasks beyond their intrinsic capabilities, suffering from memory degradation, progress confusion, and math ha...

Keywords: GUI automation, tool use, policy optimization, memory decoupling, TIPO, multimodal LLM, long-horizon tasks, Retriever

Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation

9.0/10

4/15/2026 huggingface

machine learning

Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Whether models whose visual representations more closely mir...

Keywords: vision-language models, sycophancy, gaslighting, brain alignment, fMRI, V1, V3, Natural Scenes Dataset

ToolOmni: Enabling Open-World Tool Use via Agentic learning with Proactive Retrieval and Grounded Execution

9.0/10

4/15/2026 huggingface

machine learning

Large Language Models (LLMs) enhance their problem-solving capability by utilizing external tools. However, in open-world scenarios with massive and evolving tool repositories, existing methods relying on static embedding retrieval or parameter memorization of tools struggle to align user intent wit...

Keywords: ToolOmni, agentic learning, proactive retrieval, grounded execution, Decoupled Multi-Objective GRPO, supervised fine-tuning, open-world tool use, LLMs

