Paper Archive

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

0

5.0/10

Hanrong Ye, Chao-Han Huck Yang, Arushi Goel, Wei Huang, Ligeng Zhu, Yuanhang Su, Sean Lin, An-Chieh Cheng, Zhen Wan, Jinchuan Tian, Yuming Lou, Dong Yang, Zhijian Liu, Yukang Chen, Ambrish Dantrey, Ehsan Jahangiri, Sreyan Ghosh, Daguang Xu, Ehsan Hosseini-Asl, Danial Mohseni Taheri, Vidya Murali, Sifei Liu, Jason Lu, Oluwatobi Olabiyi, Frank Wang, Rafael Valle, Bryan Catanzaro, Andrew Tao, Song Han, Jan Kautz, Hongxu Yin, Pavlo Molchanov

10/17/2025, arXiv

computer vision

Advancing machine intelligence requires developing the ability to perceive across multiple modalities, much as humans sense the world. We introduce OmniVinci, an initiative to build a strong, open-source, omni-modal LLM. We carefully study the design choices across model architecture and data curati...

Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

0

5.0/10

Jie-Ying Lee, Yi-Ruei Liu, Shr-Ruei Tsai, Wei-Cheng Chang, Chung-Ho Wu, Jiewen Chan, Zhenjun Zhao, Chieh Hubert Lin, Yu-Lun Liu

10/17/2025, arXiv

computer vision

Synthesizing large-scale, explorable, and geometrically accurate 3D urban scenes is a challenging yet valuable task in providing immersive and embodied applications. The challenges lie in the lack of large-scale and high-quality real-world 3D scans for training generalizable generative models. In th...

Keywords: diffusion model

LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal

0

5.0/10

Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee, Chih-Hai Su, Yu-Lun Liu

10/17/2025, arXiv

computer vision

Lens flare significantly degrades image quality, impacting critical computer vision tasks like object detection and autonomous driving. Recent Single Image Flare Removal (SIFR) methods perform poorly when off-frame light sources are incomplete or absent. We propose LightsOut, a diffusion-based outpa...

Keywords: diffusion model, detection, regression

BiomedXPro: Prompt Optimization for Explainable Diagnosis with Biomedical Vision Language Models

0

5.0/10

Kaushitha Silva, Mansitha Eashwara, Sanduni Ubayasiri, Ruwan Tennakoon, Damayanthi Herath

10/17/2025, arXiv

computer vision

The clinical adoption of biomedical vision-language models is hindered by prompt optimization techniques that produce either uninterpretable latent vectors or single textual prompts. This lack of transparency and failure to capture the multi-faceted nature of clinical diagnosis, which relies on inte...

PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction

0

5.0/10

Simon Yu, Gang Li, Weiyan Shi, Peng Qi

10/17/2025, arXiv

natural language processing

Large language models (LLMs) are moving beyond static uses and are now powering agents that learn continually during their interaction with external environments. For example, agents can learn reusable skills while navigating web pages or toggling new tools. However, existing methods for skill learn...

PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold

0

5.0/10

Yi Wan, Jiuqi Wang, Liam Li, Jinsong Liu, Ruihao Zhu, Zheqing Zhu

10/17/2025, arXiv

natural language processing

Tool-augmented large language models (LLMs) are emerging as deep research agents, systems that decompose complex queries, retrieve external evidence, and synthesize grounded responses. Yet current agents remain limited by shallow retrieval, weak alignment metrics, and brittle tool-use behavior. We i...

Keywords: reinforcement learning

InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

0

5.0/10

Pengkai Wang, Qi Zuo, Pengwei Liu, Zhijie Sang, Congkai Xie, Hongxia Yang

10/17/2025, arXiv

natural language processing

Large Language Models (LLMs) have shown substantial advances through reinforcement learning (RL), particularly in domains where rewards can be programmatically verified, such as mathematics and code. In these areas, models benefit from a well-defined operational base guided by explicit rule-based ob...

Keywords: reinforcement learning

BLIP3o-NEXT: Next Frontier of Native Image Generation

0

5.0/10

Jiuhai Chen, Le Xue, Zhiyang Xu, Xichen Pan, Shusheng Yang, Can Qin, An Yan, Honglu Zhou, Zeyuan Chen, Lifu Huang, Tianyi Zhou, Junnan Li, Silvio Savarese, Caiming Xiong, Ran Xu

10/17/2025, arXiv

computer vision

We present BLIP3o-NEXT, a fully open-source foundation model in the BLIP3 series that advances the next frontier of native image generation. BLIP3o-NEXT unifies text-to-image generation and image editing within a single architecture, demonstrating strong image generation and image editing capabiliti...

Keywords: diffusion model, reinforcement learning

SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling

0

5.0/10

Kadri Hacioglu, Manjunath K E, Andreas Stolcke

10/17/2025, arXiv

natural language processing

Slot filling is a crucial subtask in spoken language understanding (SLU), traditionally implemented as a cascade of speech recognition followed by one or more natural language understanding (NLU) components. The recent advent of speech-based large language models (speechLLMs), which integrate speech...

Self-Certifying Primal-Dual Optimization Proxies for Large-Scale Batch Economic Dispatch

0

5.0/10

Michael Klamkin, Mathieu Tanneau, Pascal Van Hentenryck

10/17/2025, arXiv

machine learning

Recent research has shown that optimization proxies can be trained to high fidelity, achieving average optimality gaps under 1% for large-scale problems. However, worst-case analyses show that there exist in-distribution queries that result in orders of magnitude higher optimality gap, making it dif...

Browse by Date

Papers for October 20, 2025
