Paper Archive

Attention-ResUNet for Automated Fetal Head Segmentation

9.0/10

4/20/2026 huggingface

machine learning

Automated fetal head segmentation in ultrasound images is critical for accurate biometric measurements in prenatal care. While existing deep learning approaches have achieved reasonable performance, they struggle with low contrast, noise, and complex anatomical boundaries, which are inh...

Keywords: fetal head segmentation, Attention-ResUNet, residual learning, attention gates, HC18, Dice score, ultrasound, medical imaging

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

9.0/10

4/20/2026 huggingface

machine learning

Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level...

Keywords: MathNet, multimodal, multilingual, Olympiad, benchmark, dataset, math-retrieval, retrieval-augmented

When Can LLMs Learn to Reason with Weak Supervision?

9.0/10

4/20/2026 huggingface

machine learning

Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward signals becomes increasingly difficult, making it essential to understand when RLVR can succeed under...

Keywords: LLMs, weak supervision, reinforcement learning, RLVR, reasoning faithfulness, reward saturation, supervised fine-tuning, continual pre-training

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

9.0/10

4/20/2026 huggingface

computer vision

Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing app...

Keywords: MultiWorld, video world models, multi-agent, multi-view, Multi-Agent Condition Module, Global State Encoder, simulation, robotics

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

9.0/10

4/20/2026 huggingface

generative models

Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains largely unexplored. We observe that naively applying GRPO to UDM leads to training instability and marginal performance g...

Keywords: UDM, GRPO, discrete diffusion, reinforcement learning, text-to-image, OCR, trajectory reconstruction, Reduced-Step

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

9.0/10

4/20/2026 huggingface

machine learning

Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive for real-time deployment. Latent CoT methods attempt to close this gap by compressing reasoning into contin...

Keywords: OneVL, latent CoT, chain-of-thought, vision-language, world model, autonomous driving, latent tokens, visual world model

OpenGame: Open Agentic Coding for Games

9.0/10

4/20/2026 huggingface

machine learning

Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (LLMs) and code agents now solve isolated programming tasks wi...

Keywords: agentic agents, game development, code LLM, GameCoder-27B, Game Skill, OpenGame-Bench, execution-grounded RL, VLM judging

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

9.0/10

4/20/2026 huggingface

reinforcement learning

Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust...

Keywords: Agent-World, Agentic Environment-Task Discovery, Continuous Self-Evolving Training, multi-environment RL, Model Context Protocol (MCP), environment synthesis, co-evolution, lifelong learning

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

9.0/10

4/20/2026 huggingface

machine learning

Large language models are rapidly evolving into interactive coding agents capable of end-to-end web coding, yet existing benchmarks evaluate only narrow slices of this capability, typically text-conditioned generation with static-correctness metrics, leaving visual fidelity, interaction quality, and...

Keywords: WebCompass, multimodal, web coding, code LMs, Agent-as-a-Judge, Model Context Protocol, editing, repair

Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs

9.0/10

4/20/2026 huggingface

machine learning

Multimodal LLMs can accurately perceive numerical content across modalities yet fail to perform exact multi-digit multiplication when the identical underlying arithmetic problem is presented as numerals, number words, images, or in audio form. Because existing benchmarks often lack systematically pa...

Keywords: multimodal LLMs, arithmetic load, multiplication benchmark, perception vs computation, forced-completion probe, decomposition heuristic, LoRA adapters, R² predictive

Papers for April 21, 2026

Attention-ResUNet for Automated Fetal Head Segmentation

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

When Can LLMs Learn to Reason with Weak Supervision?

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

OpenGame: Open Agentic Coding for Games

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs