Paper Archive

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

5.0/10

Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz, Yu-Chiang Frank Wang, Fu-En Yang 1/14/2026 arxiv

computer vision

Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. While recent studies on reasoning VLAs show that explicit chain-of-thought (CoT) can improve generalization, they suffer from high inference latency due to lengthy ...

Value-Aware Numerical Representations for Transformer Language Models

5.0/10

Andreea Dutulescu, Stefan Ruseti, Mihai Dascalu 1/14/2026 arxiv

natural language processing

Transformer-based language models often achieve strong results on mathematical reasoning benchmarks while remaining fragile on basic numerical understanding and arithmetic operations. A central limitation is that numbers are processed as symbolic tokens whose embeddings do not explicitly encode nume...

Keywords: transformer

ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

5.0/10

Sicong Liu, Yanxian Huang, Mingwei Liu, Jiachi Chen, Ensheng Shi, Yuchi Ma, Hongyu Zhang, Yin Zhang, Yanlin Wang 1/14/2026 arxiv

natural language processing

Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development efforts and enhancing software productivity. The emergence of large language models (LLMs) has significantly advanced code generation, though their efficiency is ...

Keywords: fine-tuning

SAM3-DMS: Decoupled Memory Selection for Multi-target Video Segmentation of SAM3

5.0/10

Ruiqi Shen, Chang Liu, Henghui Ding 1/14/2026 arxiv

computer vision

Segment Anything 3 (SAM3) has established a powerful foundation that robustly detects, segments, and tracks specified targets in videos. However, in its original implementation, its group-level collective memory selection is suboptimal for complex multi-object scenarios, as it employs a synchronized...

Keywords: segmentation

COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation

5.0/10

Tony Danjun Wang, Tolga Birdal, Nassir Navab, Lennart Bastian 1/14/2026 arxiv

machine learning

3D pose estimation from sparse multi-views is a critical task for numerous applications, including action recognition, sports analysis, and human-robot interaction. Optimization-based methods typically follow a two-stage pipeline, first detecting 2D keypoints in each view and then associating these ...

Keywords: detection

Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

5.0/10

Jieying Chen, Jeffrey Hu, Joan Lasenby, Ayush Tewari 1/14/2026 arxiv

computer vision

Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical barrier to deploying generative video in applications that requ...

Keywords: diffusion model

Empathy Applicability Modeling for General Health Queries

5.0/10

Shan Randhawa, Agha Ali Raza, Kentaro Toyama, Julie Hui, Mustafa Naseem 1/14/2026 arxiv

natural language processing

LLMs are increasingly being integrated into clinical workflows, yet they often lack clinical empathy, an essential aspect of effective doctor-patient communication. Existing NLP frameworks focus on reactively labeling empathy in doctors' responses but offer limited support for anticipatory modeling ...

Keywords: gpt

LLMs can Compress LLMs: Adaptive Pruning by Agents

5.0/10

Sai Varun Kodathala, Rakesh Vunnam 1/14/2026 arxiv

natural language processing

As Large Language Models (LLMs) continue to scale, post-training pruning has emerged as a promising approach to reduce computational costs while preserving performance. Existing methods such as SparseGPT and Wanda achieve high sparsity through layer-wise weight reconstruction or activation-aware mag...

Keywords: gpt

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

5.0/10

Lisa Schneckenreiter, Sohvi Luukkonen, Lukas Friedrich, Daniel Kuhn, Günter Klambauer 1/14/2026 arxiv

machine learning

Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive ...

Routing with Generated Data: Annotation-Free LLM Skill Estimation and Expert Selection

5.0/10

Tianyi Niu, Justin Chih-Yao Chen, Genta Indra Winata, Shi-Xiong Zhang, Supriyo Chakraborty, Sambit Sahu, Yue Zhang, Elias Stengel-Eskin, Mohit Bansal 1/14/2026 arxiv

natural language processing

Large Language Model (LLM) routers dynamically select optimal models for given inputs. Existing approaches typically assume access to ground-truth labeled data, which is often unavailable in practice, especially when user request distributions are heterogeneous and unknown. We introduce Routing with...

Papers for January 15, 2026