Paper Archive

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

9.0/10

4/13/2026 huggingface

machine learning

In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference images, audio, and pose. This task holds significant practical value for automating content creation in real-world applicatio...

Keywords: Human-Object Interaction, video generation, multimodal, Unified Channel-wise Conditioning, Gated Local-Context Attention, Decoupled-Then-Joint Training, model merging, HOIVG-Bench

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

9.0/10

4/13/2026 huggingface

computer vision

GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity tha...

Keywords: GUI agents, reinforcement learning, sim2real, evaluation, deployment, mobile, GiGPO, Process Reward Model

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

9.0/10

4/13/2026 huggingface

machine learning

Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts--often termed general reasoning--remains u...

Keywords: General reasoning, LLM evaluation, Benchmark, General365, K-12 restriction, Dataset, Robustness, Reasoning generalization

LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

9.0/10

4/13/2026 huggingface

robotics

While the shortage of explicit action data limits Vision-Language-Action (VLA) models, human action videos offer a scalable yet unlabeled data source. A critical challenge in utilizing large-scale human video datasets lies in transforming visual signals into ontology-independent representations, kno...

Keywords: latent action representation, LARY, vision-to-action, vision-language-action, benchmark, foundation models, robotic control, dataset

Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer

9.0/10

4/13/2026 huggingface

natural language processing

AI-generated text has become common in academic and professional writing, prompting research into detection methods. Less studied is the reverse: systematically rewriting AI-generated prose to read as genuinely human-authored. We build a parallel corpus of 25,140 paired AI-input and human-reference ...

Keywords: AI-to-human style transfer, encoder-decoder, decoder-only, BART-large, Mistral-7B-Instruct, QLoRA, parallel corpus, stylistic markers

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

9.0/10

4/13/2026 huggingface

machine learning

As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. ...

Keywords: Theory of Mind, belief steering, LLMs, reinforcement learning, privacy, adversarial dialogue, AI Double Agent

CodeTracer: Towards Traceable Agent States

9.0/10

4/13/2026 huggingface

machine learning

Code agents are advancing rapidly, but debugging them is becoming increasingly difficult. As frameworks orchestrate parallel tool calls and multi-stage workflows over complex tasks, the agent's state transitions and error propagation become hard to observe. In these runs, an early misstep can trap t...

Keywords: CodeTracer, traceability, failure localization, code agents, benchmark, CodeTraceBench, trace tree, persistent memory

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

9.0/10

4/13/2026 huggingface

computer vision

Most reward models for visual generation reduce rich human judgments to a single unexplained score, discarding the reasoning that underlies preference. We show that teaching reward models to produce explicit, multi-dimensional critiques before scoring transforms them from passive evaluators into act...

Keywords: reward models, rationales, PARROT, RationalRewards, Generate-Critique-Refine, preference prediction, text-to-image, image-editing

Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach

9.0/10

4/13/2026 huggingface

machine learning

While large language models hold promise for complex medical applications, their development is hindered by the scarcity of high-quality reasoning data. To address this issue, existing approaches typically distill chain-of-thought reasoning traces from large proprietary models via supervised fine-tu...

Keywords: MedSSR, semi-supervised RL, data synthesis, rare diseases, pseudo-labeling, medical reasoning, chain-of-thought, Qwen

Continuous Adversarial Flow Models

9.0/10

4/13/2026 huggingface

generative models

We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which uses a fixed mean-squared-error criterion, our approach introduces a learned discriminator to guide training. This change in objective induces a diff...

Keywords: continuous-time flow, adversarial training, flow matching, discriminator, post-training, ImageNet, FID, text-to-image

Browse by Date

Papers for April 14, 2026

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

CodeTracer: Towards Traceable Agent States

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach

Continuous Adversarial Flow Models