Paper Archive

Browse and export your curated research paper collection

Archived Days: 197
Total Papers: 1958
Avg Score: 7.9
Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
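As a rough illustration of how the four export formats differ, the sketch below serializes one paper record to each of them. The record, its field names, and the helper functions (`to_json`, `to_csv`, `to_markdown`, `to_bibtex`) are hypothetical placeholders and not the archive's actual schema or export code.

```python
import csv
import io
import json

# Hypothetical record: field names and values are assumptions for illustration,
# not the archive's real schema.
paper = {
    "id": "example2026",
    "title": "Example paper title",
    "date": "2026-04-01",
    "source": "huggingface",
    "category": "machine learning",
    "keywords": ["benchmarks", "multimodal agents"],
}


def to_json(p):
    """Complete data: every field is kept, including nested lists."""
    return json.dumps(p, indent=2)


def to_csv(p):
    """Tabular data: one header row and one data row; lists are flattened."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(p))
    writer.writeheader()
    writer.writerow(dict(p, keywords="; ".join(p["keywords"])))
    return buf.getvalue()


def to_markdown(p):
    """Human-readable report: a heading followed by a short bullet list."""
    return "\n".join([
        f"## {p['title']}",
        "",
        f"- Date: {p['date']}",
        f"- Source: {p['source']}",
        f"- Category: {p['category']}",
        f"- Keywords: {', '.join(p['keywords'])}",
    ])


def to_bibtex(p):
    """Academic citation: a minimal @misc entry built from the record."""
    year = p["date"][:4]
    return (
        f"@misc{{{p['id']},\n"
        f"  title = {{{p['title']}}},\n"
        f"  year = {{{year}}},\n"
        f"  howpublished = {{{p['source']}}}\n"
        f"}}"
    )


if __name__ == "__main__":
    for render in (to_json, to_csv, to_markdown, to_bibtex):
        print(render(paper))
        print()
```

JSON keeps nested fields such as the keyword list intact, while the CSV row has to flatten them into a single cell; that difference is usually the deciding factor when choosing between the two.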
Browse by Date

Papers for April 2, 2026

10 papers found

4/1/2026 huggingface

machine learning

We present HippoCamp, a new benchmark designed to evaluate agents' capabilities on multimodal file management. Unlike existing agent benchmarks that focus on tasks like web interaction, tool use, or software automation in generic settings, HippoCamp evaluates agents in user-centric environments to m...

Keywords: HippoCamp, multimodal agents, personal file systems, user profiling, evidence grounding, long-horizon retrieval, benchmarks, annotated trajectories

4/1/2026 huggingface

machine learning

Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation config...

Keywords: self-distillation, code generation, LLM, Qwen3-30B, Llama, pass@1, LiveCodeBench v6, decoding

4/1/2026 huggingface

natural language processing

Large language models (LLMs) exhibiting test-time scaling behavior, such as extended reasoning traces and self-verification, have demonstrated remarkable performance on complex, long-term reasoning tasks. However, the robustness of these reasoning behaviors remains underexplored. To investigate this...

Keywords: Reasoning Shift, LLM robustness, chain-of-thought, self-verification, context sensitivity, multi-turn dialogue, evaluation study, uncertainty management

4/1/2026 huggingface

machine learning

We present ReinDriveGen, a framework that enables full controllability over dynamic driving scenes, allowing users to freely edit actor trajectories to simulate safety-critical corner cases such as front-vehicle collisions, drifting cars, vehicles spinning out of control, pedestrians jaywalking, and...

Keywords: ReinDriveGen, video diffusion, LiDAR, reinforcement learning, out-of-distribution, vehicle completion, 3D point cloud, autonomous driving

4/1/2026 huggingface

computer vision

2D assembly diagrams are often abstract and hard to follow, creating a need for intelligent assistants that can monitor progress, detect errors, and provide step-by-step guidance. In mixed reality settings, such systems must recognize completed and ongoing steps from the camera feed and align them w...

Keywords: Vision-Language Models, cross-depiction, IKEA-Bench, assembly diagrams, ViT subspaces, mechanistic analysis, mixed reality

4/1/2026 huggingface

computer vision

Document understanding and GUI interaction are among the highest-value applications of Vision-Language Models (VLMs), yet they impose exceptionally heavy computational burden: fine-grained text and small UI elements demand high-resolution inputs that produce tens of thousands of visual tokens. We ob...

Keywords: PixelPrune, predictive coding, token reduction, visual token pruning, ViT, vision-language, document understanding, GUI

4/1/2026 huggingface

machine learning

We present OmniVoice, a massive multilingual zero-shot text-to-speech (TTS) model that scales to over 600 languages. At its core is a novel diffusion language model-style discrete non-autoregressive (NAR) architecture. Unlike conventional discrete NAR models that suffer from performance bottlenecks ...

Keywords: OmniVoice, zero-shot TTS, diffusion language model, discrete non-autoregressive, full-codebook random masking, multi-codebook acoustic tokens, multilingual, 600+ languages

4/1/2026 huggingface

machine learning

In recent years, the scaling laws of recommendation models have attracted increasing attention, which govern the relationship between performance and parameters/FLOPs of recommenders. Currently, there are three mainstream architectures for achieving scaling in recommendation models, namely attention...

Keywords: UniMixer, UniMixing-Lite, TokenMixer, attention, factorization-machine, scaling laws, feature mixing, parameterized token mixing

4/1/2026 huggingface

computer vision

3D Visual Grounding (3D-VG) aims to localize objects in 3D scenes via natural language descriptions. While recent advancements leveraging Vision-Language Models (VLMs) have explored zero-shot possibilities, they typically suffer from a static workflow relying on preprocessed 3D point clouds, essenti...

Keywords: 3D visual grounding, vision-language models, zero-shot, agentic framework, RGB-D, multi-view geometry, Semantic-Anchored Geometric Expansion, ScanRefer

4/1/2026 huggingface

machine learning

As large language model (LLM) agents are deployed in public interactive settings, a key question is whether their communities can sustain challenge, repair, and public correction, or merely produce norm-like language. We compare Moltbook, a live deployed agent forum, with five matched Reddit communi...

Keywords: Moltbook, Reddit, challenge-repair, public correction, threading, social alignment, LLM agents, safety