Paper Archive

Browse and export your curated research paper collection

Archived Days: 257
Total Papers: 2553
Avg Score: 7.7
Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis
CSV: Tabular data for analysis
Markdown: Human-readable reports
BibTeX: Academic citations

Browse by Date

Papers for May 25, 2026

10 papers found

5/22/2026 huggingface

computer vision

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained a...

Keywords: gpt

5/22/2026 huggingface

computer vision

Camera-controlled video generation has achieved remarkable progress in recent years. However, existing video-to-video re-rendering methods primarily rely on Supervised Fine-Tuning using synthetic datasets. At present, there is an extreme scarcity of synchronized, multi-view real-world video data. Co...

Keywords: reinforcement learning, fine-tuning

5/22/2026 huggingface

computer vision

Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a decoder maps the generated latents back to pixels. Yet the latent-to-pixel decoder is reconstruction-oriented, optimized to invert the encod...

5/22/2026 huggingface

natural language processing

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scal...

Keywords: fine-tuning

5/22/2026 huggingface

natural language processing

Language agents increasingly improve by reusing skills -- structured procedural artifacts distilled from past experience. In particular, domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recurring procedures, and ...

5/22/2026 huggingface

computer vision

Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grained focus or view transformations. The "think with images" paradigm narrows this gap, but existing approaches are either constrained by fi...

Keywords: fine-tuning

5/22/2026 huggingface

computer vision

Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D attributes in a feed-forward manner. However, their computational cost grows quadratically with the input sequence length due to the global attention layers insi...

Keywords: transformer, attention

5/22/2026 huggingface

computer vision

We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially-localized, overlapping chunks that together tile th...

5/22/2026 huggingface

computer vision

Keyframe selection is a direct way to provide verifiable visual evidence for long-video question answering (QA). Queries differ in what they require, and finding the right frames depends on knowing what to look for. Existing keyframe selectors either score every frame against a single query, or deco...

5/22/2026 huggingface

computer vision

Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes...
