Paper Archive

HippoCamp: Benchmarking Contextual Agents on Personal Computers

9.0/10

4/1/2026 huggingface

machine learning

We present HippoCamp, a new benchmark designed to evaluate agents' capabilities on multimodal file management. Unlike existing agent benchmarks that focus on tasks like web interaction, tool use, or software automation in generic settings, HippoCamp evaluates agents in user-centric environments to m...

Keywords: HippoCamp, multimodal agents, personal file systems, user profiling, evidence grounding, long-horizon retrieval, benchmarks, annotated trajectories

Embarrassingly Simple Self-Distillation Improves Code Generation

9.0/10

4/1/2026 huggingface

machine learning

Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation config...

Keywords: self-distillation, code generation, LLM, Qwen3-30B, Llama, pass@1, LiveCodeBench v6, decoding

Reasoning Shift: How Context Silently Shortens LLM Reasoning

9.0/10

4/1/2026 huggingface

natural language processing

Large language models (LLMs) exhibiting test-time scaling behavior, such as extended reasoning traces and self-verification, have demonstrated remarkable performance on complex, long-term reasoning tasks. However, the robustness of these reasoning behaviors remains underexplored. To investigate this...

Keywords: Reasoning Shift, LLM robustness, chain-of-thought, self-verification, context sensitivity, multi-turn dialogue, evaluation study, uncertainty management

ReinDriveGen: Reinforcement Post-Training for Out-of-Distribution Driving Scene Generation

9.0/10

4/1/2026 huggingface

machine learning

We present ReinDriveGen, a framework that enables full controllability over dynamic driving scenes, allowing users to freely edit actor trajectories to simulate safety-critical corner cases such as front-vehicle collisions, drifting cars, vehicles spinning out of control, pedestrians jaywalking, and...

Keywords: ReinDriveGen, video diffusion, LiDAR, reinforcement learning, out-of-distribution, vehicle completion, 3D point cloud, autonomous driving

Benchmarking and Mechanistic Analysis of Vision-Language Models for Cross-Depiction Assembly Instruction Alignment

9.0/10

4/1/2026 huggingface

computer vision

2D assembly diagrams are often abstract and hard to follow, creating a need for intelligent assistants that can monitor progress, detect errors, and provide step-by-step guidance. In mixed reality settings, such systems must recognize completed and ongoing steps from the camera feed and align them w...

Keywords: Vision-Language Models, cross-depiction, IKEA-Bench, assembly diagrams, ViT subspaces, mechanistic analysis, mixed reality

PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding

9.0/10

4/1/2026 huggingface

computer vision

Document understanding and GUI interaction are among the highest-value applications of Vision-Language Models (VLMs), yet they impose exceptionally heavy computational burden: fine-grained text and small UI elements demand high-resolution inputs that produce tens of thousands of visual tokens. We ob...

Keywords: PixelPrune, predictive coding, token reduction, visual token pruning, ViT, vision-language, document understanding, GUI

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

9.0/10

4/1/2026 huggingface

machine learning

We present OmniVoice, a massive multilingual zero-shot text-to-speech (TTS) model that scales to over 600 languages. At its core is a novel diffusion language model-style discrete non-autoregressive (NAR) architecture. Unlike conventional discrete NAR models that suffer from performance bottlenecks ...

Keywords: OmniVoice, zero-shot TTS, diffusion language model, discrete non-autoregressive, full-codebook random masking, multi-codebook acoustic tokens, multilingual, 600+ languages

UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems

9.0/10

4/1/2026 huggingface

machine learning

In recent years, the scaling laws of recommendation models have attracted increasing attention, which govern the relationship between performance and parameters/FLOPs of recommenders. Currently, there are three mainstream architectures for achieving scaling in recommendation models, namely attention...

Keywords: UniMixer, UniMixing-Lite, TokenMixer, attention, factorization-machine, scaling laws, feature mixing, parameterized token mixing

Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding

9.0/10

4/1/2026 huggingface

computer vision

3D Visual Grounding (3D-VG) aims to localize objects in 3D scenes via natural language descriptions. While recent advancements leveraging Vision-Language Models (VLMs) have explored zero-shot possibilities, they typically suffer from a static workflow relying on preprocessed 3D point clouds, essenti...

Keywords: 3D visual grounding, vision-language models, zero-shot, agentic framework, RGB-D, multi-view geometry, Semantic-Anchored Geometric Expansion, ScanRefer

Do Agents Repair When Challenged -- or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum

9.0/10

4/1/2026 huggingface

machine learning

As large language model (LLM) agents are deployed in public interactive settings, a key question is whether their communities can sustain challenge, repair, and public correction, or merely produce norm-like language. We compare Moltbook, a live deployed agent forum, with five matched Reddit communi...

Keywords: Moltbook, Reddit, challenge-repair, public correction, threading, social alignment, LLM agents, safety
