Paper Archive

TopInG: Topologically Interpretable Graph Learning via Persistent Rationale Filtration

9.0/10

Cheng Xin, Fan Xu, Xin Ding, Jie Gao, Jiaxin Ding 10/6/2025 arxiv

machine learning

Graph Neural Networks (GNNs) have shown remarkable success across various scientific fields, yet their adoption in critical decision-making is often hindered by a lack of interpretability. Recently, intrinsically interpretable GNNs have been studied to provide insights into model predictions by iden...

Keywords: persistent homology, graph neural networks, interpretability, rationale filtration, topological discrepancy, explainable AI, autoregressive rationale generation

From Noisy Traces to Stable Gradients: Bias-Variance Optimized Preference Optimization for Aligning Large Reasoning Models

9.0/10

Mingkang Zhu, Xi Chen, Bei Yu, Hengshuang Zhao, Jiaya Jia 10/6/2025 arxiv

machine learning

Large reasoning models (LRMs) generate intermediate reasoning traces before producing final answers, yielding strong gains on multi-step and mathematical tasks. Yet aligning LRMs with human preferences, a crucial prerequisite for model deployment, remains underexplored. The statistically correct obj...

Keywords: BVPO, preference optimization, bias-variance trade-off, gradient variance, reasoning traces, alignment, large reasoning models

VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

9.0/10

Ziqi Huang, Ning Yu, Gordon Chen, Haonan Qiu, Paul Debevec, Ziwei Liu 10/6/2025 arxiv

computer vision

Recent video generation models can produce smooth and visually appealing clips, but they often struggle to synthesize complex dynamics with a coherent chain of consequences. Accurately modeling visual outcomes and state transitions over time remains a core challenge. In contrast, large language and ...

Keywords: VChain, chain-of-visual-thought, video generation, multimodal models, keyframes, inference-time tuning, sparse tuning, visual reasoning

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

9.0/10

Le Zhuo, Songhao Han, Yuandong Pu, Boxiang Qiu, Sayak Paul, Yue Liao, Yihao Liu, Jie Shao, Xi Chen, Si Liu, Hongsheng Li 10/6/2025 arxiv

computer vision

While modern visual generation models excel at creating aesthetically pleasing natural images, they struggle with producing or editing structured visuals like charts, diagrams, and mathematical figures, which demand composition planning, text rendering, and multimodal reasoning for factual fidelity....

Keywords: structured visuals, dataset, 1.3M image pairs, chain-of-thought, VLM, FLUX.1 Kontext, three-stage training, external reasoner

Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models

9.0/10

Runchu Tian, Junxia Cui, Xueqiang Xu, Feng Yao, Jingbo Shang 10/6/2025 arxiv

natural language processing

Diffusion large language models (dLLMs) have recently emerged as a promising alternative to autoregressive (AR) models, offering advantages such as accelerated parallel decoding and bidirectional context modeling. However, the vanilla decoding strategy in discrete dLLMs suffers from a critical limit...

Keywords: diffusion LLM, Tolerator, token-level cross-validation, decoding algorithm, remasking, sequence fill-up, diffusion models, code generation

TeachLM: Post-Training LLMs for Education Using Authentic Learning Data

9.0/10

Janos Perczel, Jin Chow, Dorottya Demszky 10/6/2025 arxiv

machine learning

The promise of generative AI to revolutionize education is constrained by the pedagogical limits of large language models (LLMs). A major issue is the lack of access to high-quality training data that reflect the learning of actual students. Prompt engineering has emerged as a stopgap, but the abili...

Keywords: LLM, education, fine-tuning, parameter-efficient fine-tuning, synthetic student model, dialogue evaluation, Polygence dataset, student-tutor interactions

SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder

9.0/10

Ronen Kamenetsky, Sara Dorfman, Daniel Garibi, Roni Paiss, Or Patashnik, Daniel Cohen-Or 10/6/2025 arxiv

computer vision

Large-scale text-to-image diffusion models have become the backbone of modern image editing, yet text prompts alone do not offer adequate control over the editing process. Two properties are especially desirable: disentanglement, where changing one attribute does not unintentionally alter others, an...

Keywords: Sparse Autoencoder, token-level control, text embeddings, disentangled editing, continuous control, diffusion models, model-agnostic

ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning

9.0/10

Siheng Zhao, Yanjie Ze, Yue Wang, C. Karen Liu, Pieter Abbeel, Guanya Shi, Rocky Duan 10/6/2025 arxiv

robotics

Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. While recent advances in general motion tracking (GMT) have enabled humanoids to reproduce diverse human motions, these policies lack the precision and object awareness required for loco...

Keywords: ResMimic, residual learning, humanoid, loco-manipulation, general motion tracking, point-cloud reward, contact reward, curriculum learning

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

9.0/10

Unknown authors 10/7/2025 huggingface

computer vision

Video understanding represents the most challenging frontier in computer vision, requiring models to reason about complex spatiotemporal relationships, long-term dependencies, and multimodal evidence. The recent emergence of Video-Large Multimodal Models (Video-LMMs), which integrate visual encoders...

Keywords: Video-LMMs, supervised fine-tuning, reinforcement learning, test-time scaling, temporal localization, spatiotemporal grounding, long video efficiency, multimodal evidence integration

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

9.0/10

Unknown authors 10/7/2025 huggingface

natural language processing

ArXiv: https://arxiv.org/pdf/2510.04800. Code and detailed results will be released later.

Keywords: self-attention mechanisms, structured state space models, Mamba, hybrid architectures, inter-layer fusion, intra-layer fusion, long-context capabilities, scaling analysis

Papers for October 7, 2025
