Paper Archive

Browse and export your curated research paper collection

Archived Days: 79 • Total Papers: 788 • Avg Score: 8.4 • Categories: 9

Export Archive Data

Download your archived papers in various formats

JSON: Complete data with analysis • CSV: Tabular data for analysis • Markdown: Human-readable reports • BibTeX: Academic citations
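
As a minimal sketch of post-processing the export, the snippet below converts a JSON export into BibTeX entries. The file name papers_export.json and the record fields (title, authors, year, url) are assumptions for illustration, not the archive's documented schema; adjust them to the actual export before use.

```python
import json

def to_bibtex(paper, index):
    # Build a BibTeX @misc entry from one exported record.
    # Field names (title, authors, year, url) are assumed placeholders;
    # rename them to match the real JSON export schema.
    authors = " and ".join(paper.get("authors", []))
    return (
        f"@misc{{archive{index},\n"
        f"  title  = {{{paper.get('title', 'Untitled')}}},\n"
        f"  author = {{{authors}}},\n"
        f"  year   = {{{paper.get('year', '')}}},\n"
        f"  url    = {{{paper.get('url', '')}}}\n"
        f"}}"
    )

if __name__ == "__main__":
    # papers_export.json is a placeholder name for the downloaded JSON file.
    with open("papers_export.json", encoding="utf-8") as f:
        papers = json.load(f)
    print("\n\n".join(to_bibtex(p, i) for i, p in enumerate(papers, 1)))
```

A similar loop could emit CSV rows with csv.DictWriter if the tabular export is preferred over BibTeX.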
Browse by Date

Papers for November 29, 2025

10 papers found

Yusuf Dalva, Guocheng Gordon Qian, Maya Goldenberg, Tsai-Shien Chen, Kfir Aberman, Sergey Tulyakov, Pinar Yanardag, Kuan-Chieh Jackson Wang 11/26/2025 arxiv

computer vision

While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, particularly when users simultaneously specify text prompts, subject references, spatial arrangements, pose constraints, and layout annotati...

Keywords: diffusion models, multimodal controls, canvas representation, compositional generation, multi-task training, layout control, pose control, identity preservation

Seungjae Lee, Yoonkyo Jung, Inkook Chun, Yao-Chih Lee, Zikui Cai, Hongjia Huang, Aayush Talreja, Tan Dat Dao, Yongyuan Liang, Jia-Bin Huang, Furong Huang 11/26/2025 arxiv

robotics

Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments - humans and different robots - are abundant, differences in embodiment, camera, and environment hinder their direct use. We address the small-data...

Keywords: trace-space, world model, robot learning, cross-embodiment, TraceGen, TraceForge, 3D motion prior, few-shot adaptation

Hongjin Su, Shizhe Diao, Ximing Lu, Mingjie Liu, Jiacheng Xu, Xin Dong, Yonggan Fu, Peter Belcak, Hanrong Ye, Hongxu Yin, Yi Dong, Evelina Bakhturina, Tao Yu, Yejin Choi, Jan Kautz, Pavlo Molchanov 11/26/2025 arxiv

machine learning

Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the u...

Keywords: ToolOrchestra, orchestrator, tool orchestration, reinforcement learning, efficiency, model composition, Orchestrator-8B, HLE benchmark

Wenbo Hu, Jingli Lin, Yilin Long, Yunlong Ran, Lihan Jiang, Yifan Wang, Chenming Zhu, Runsen Xu, Tai Wang, Jiangmiao Pang 11/26/2025 arxiv

machine learning

Vision-Language Models (VLMs) still lack robustness in spatial intelligence, demonstrating poor performance on spatial understanding and reasoning tasks. We attribute this gap to the absence of a visual geometry learning process capable of reconstructing 3D space from 2D images. We present G$^2$VLM,...

Keywords: G2VLM, geometry grounded, vision-language model, 3D reconstruction, spatial reasoning, multi-view, in-context learning, interleaved reasoning

Dong Wang, Yang Li, Ansong Ni, Ching-Feng Yeh, Youssef Emad, Xinjie Lei, Liam Robbins, Karthik Padthe, Hu Xu, Xian Li, Asli Celikyilmaz, Ramya Raghavendra, Lifei Huang, Carole-Jean Wu, Shang-Wen Li 11/26/2025 arxiv

machine learning

Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality...

Keywords: synthetic data, multi-agent, peer-to-peer, decentralized, distributed queues, Ray, LLM inference, data generation throughput

Zihui Xue, Kristen Grauman, Dima Damen, Andrew Zisserman, Tengda Han 11/26/2025 arxiv

computer vision

Can one perceive a video's content without seeing its pixels, just from the camera trajectory, the path it carves through space? This paper is the first to systematically investigate this seemingly implausible question. Towards this end, we propose a contrastive learning framework to train CamFormer,...

Keywords: camera trajectory, CamFormer, contrastive learning, trajectory embedding, cross-modal alignment, egocentric, exocentric, pose estimation

Weihao Bo, Shan Zhang, Yanpeng Sun, Jingjing Wu, Qunyi Xie, Xiao Tan, Kunbin Chen, Wei He, Xiaofan Li, Na Zhao, Jingdong Wang, Zechao Li 11/26/2025 arxiv

multimodal learning

MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for reuse. However, trajectory-based memory suffers from brevity bias, gradually l...

Keywords: ViLoMem, multimodal semantic memory, dual-stream memory, grow-and-refine, MLLM, lifelong learning, distraction-hallucination separation, pass@1

Sadegh Shirani, Mohsen Bayati 11/26/2025 arxiv

machine learning

Causal effect estimation in networked systems is central to data-driven decision making. In such settings, interventions on one unit can spill over to others, and in complex physical or social systems, the interaction pathways driving these interference structures remain largely unobserved. We argue...

Keywords: causal effects, interference, exposure mapping, evolution-based, difference-in-differences, causal message passing, influencer networks, treatment randomization

Lorenzo Shaikewitz, Charis Georgiou, Luca Carlone 11/26/2025 arxiv

robotics

Quantifying the uncertainty of an object's pose estimate is essential for robust control and planning. Although pose estimation is a well-studied robotics problem, attaching statistically rigorous uncertainty is not well understood without strict distributional assumptions. We develop distribution-f...

Keywords: pose estimation, uncertainty quantification, S-lemma, sum-of-squares, ellipsoid, monocular, semantic keypoints, convex relaxation

Naifu Zhang, Wei Tao, Xi Xiao, Qianpu Sun, Yuxin Zheng, Wentao Mo, Peiqiang Wang, Nan Zhang 11/26/2025 arxiv

machine learning

In recent years, Vision-Language-Action (VLA) models in embodied intelligence have developed rapidly. However, existing adversarial attack methods require costly end-to-end training and often generate noticeable perturbation patches. To address these limitations, we propose ADVLA, a framework that d...

Keywords: ADVLA, Vision-Language-Action, adversarial attacks, attention guidance, sparse perturbations, feature-space attacks, Top-K masking, L_inf