Paper Archive

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

5.0/10

5/8/2026 huggingface

natural language processing

Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, l...

Normalizing Trajectory Models

5.0/10

5/8/2026 huggingface

computer vision

Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed to a few coarse transitions. Existing few-step methods address this through distillation, consistency training, or adversarial objectives, but sacrifice ...

Flow-OPD: On-Policy Distillation for Flow Matching Models

5.0/10

5/8/2026 huggingface

computer vision

Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under multi-task alignment: the reward sparsity induced by scalar-valued rewards, and the gradient interference arising from jointly optimizing heterogeneous objectives, which together give rise to a 'seesaw effect...

Keywords: fine-tuning

Fast Byte Latent Transformer

5.0/10

5/8/2026 huggingface

natural language processing

Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the Byte Latent Transformer (BLT) through new training and generati...

Keywords: transformer

SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

5.0/10

5/8/2026 huggingface

computer vision

While text-to-image models have made strong progress in visual fidelity, faithfully realizing complex visual intents remains challenging because many requirements must be tracked across grounding, generation, and verification. We refer to these requirements as semantic commitments and formalize thei...

STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation

5.0/10

5/8/2026 huggingface

computer vision

Deep generative models have advanced rapidly across text and vision, motivating unified multimodal systems that can understand, reason over, and generate interleaved text-image sequences. Most existing approaches combine autoregressive language modeling with diffusion-based image generators, inherit...

Keywords: transformer

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

5.0/10

5/8/2026 huggingface

computer vision

Tokenizers are a crucial component of latent diffusion models, as they define the latent space in which diffusion models operate. However, existing tokenizers are primarily designed to improve reconstruction fidelity or inherit pretrained representations, leaving unclear what kind of latent space is...

Keywords: diffusion model

Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding

5.0/10

5/8/2026 huggingface

computer vision

Online streaming video understanding requires models to process continuous visual inputs and respond to user queries in real time, where the unbounded stream and unpredictable query timing turn memory management into a central challenge. Existing methods typically compress visual tokens via visual s...

Anisotropic Modality Align

5.0/10

5/8/2026 huggingface

natural language processing

Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unim...

Rethinking State Tracking in Recurrent Models Through Error Control Dynamics

5.0/10

5/8/2026 huggingface

machine learning

The theory of state tracking in recurrent architectures has predominantly focused on expressive capacity: whether a fixed architecture can theoretically realize a set of symbolic transition rules. We argue that equally important is error control, the dynamics governing hidden-state drift along the d...

Keywords: attention

Papers for May 11, 2026

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Normalizing Trajectory Models

Flow-OPD: On-Policy Distillation for Flow Matching Models

Fast Byte Latent Transformer

SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding

Anisotropic Modality Align

Rethinking State Tracking in Recurrent Models Through Error Control Dynamics