Paper Archive

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

5.0/10

5/22/2026 huggingface

computer vision

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained a...

Keywords: gpt

Geo-Align: Video Generation Alignment via Metric Geometry Reward

5.0/10

5/22/2026 huggingface

computer vision

Camera-controlled video generation has achieved remarkable progress in recent years. However, existing video-to-video re-rendering methods primarily rely on Supervised Fine-Tuning using synthetic datasets. At present, there is an extreme scarcity of synchronized, multi-view real-world video data. Co...

Keywords: reinforcement learning, fine-tuning

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

5.0/10

5/22/2026 huggingface

computer vision

Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a decoder maps the generated latents back to pixels. Yet the latent-to-pixel decoder is reconstruction-oriented, optimized to invert the encod...

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

5.0/10

5/22/2026 huggingface

natural language processing

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scal...

Keywords: fine-tuning

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

5.0/10

5/22/2026 huggingface

natural language processing

Language agents increasingly improve by reusing skills -- structured procedural artifacts distilled from past experience. In particular, domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recurring procedures, and ...

ETCHR: Editing To Clarify and Harness Reasoning

5.0/10

5/22/2026 huggingface

computer vision

Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grained focus or view transformations. The "think with images" paradigm narrows this gap, but existing approaches are either constrained by fi...

Keywords: fine-tuning

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

5.0/10

5/22/2026 huggingface

computer vision

Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D attributes in a feed-forward manner. However, their computational cost grows quadratically with the input sequence length due to the global attention layers insi...

Keywords: transformer, attention

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

5.0/10

5/22/2026 huggingface

computer vision

We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially-localized, overlapping chunks that together tile th...

Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

5.0/10

5/22/2026 huggingface

computer vision

Keyframe selection is a direct way to provide verifiable visual evidence for long-video question answering (QA). Queries differ in what they require, and finding the right frames depends on knowing what to look for. Existing keyframe selectors either score every frame against a single query, or deco...

PhotoFlow: Agentic 3D Virtual Photography Missions

5.0/10

5/22/2026 huggingface

computer vision

Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes...

Papers for May 25, 2026

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Geo-Align: Video Generation Alignment via Metric Geometry Reward

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

ETCHR: Editing To Clarify and Harness Reasoning

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

PhotoFlow: Agentic 3D Virtual Photography Missions