{"papers":[{"id":"arxiv_2606.12385v1","arxiv_id":"2606.12385v1","title":"Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs","abstract":"Modern LLM training pipelines increasingly rely on other models to generate data, filter corpora, judge outputs, and guide development decisions. These dependencies are recursive: a model may depend on an upstream artifact whose own dependencies are documented only in separate releases and artifacts. As a result, the full dependency structure is fragmented across heterogeneous public artifacts, with complexity and recursive depth far outpacing humans' ability to trace. We introduce ModSleuth, an agentic system that recursively reconstructs LLM dependency graphs from public artifacts with source-grounded evidence. We find that the primary challenge is no longer information extraction, but defining what constitutes a dependency and reconciling artifact references across inconsistent documentation. We address these challenges through a formalization that distinguishes direct and indirect dependencies, represents heterogeneous pipeline roles through operation-centered relationships, and resolves artifact identities across names, versions, and repositories. Applying ModSleuth to four public-artifact-rich LLM releases, we recover 1,060 source-verified dependencies and construct large-scale dependency graphs of modern LLM development. These graphs reveal multi-hop license obligations, train-evaluation coupling, discrepancies between released and training-time artifacts, and documentation inconsistencies that would otherwise be difficult to uncover. 
We release ModSleuth and the resulting dependency graphs to support transparent analysis of the increasingly complex ecosystems underlying modern LLMs.","authors":["Sanjay Adhikesaven","Haoxiang Sun","Sewon Min"],"published":"2026-06-10T17:47:59Z","updated":"2026-06-10T17:47:59Z","category":"","source":"arxiv","original_source":"arxiv","url":"http://arxiv.org/abs/2606.12385v1","pdf_url":"http://arxiv.org/pdf/2606.12385v1.pdf","scraped_at":"2026-06-11T08:00:09.117Z","images":["https://arxiv.org/html/2606.12385v1/x1.png"],"analysis":{"introduction":"🚀 Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: Modern LLM training pipelines increasingly rely on other models to generate data, filter corpora, judge outputs, and guide development decisions. These dependencies are recursive: a model may depend on an upstream artifact whose own dependencies are documented only in separate releases and artifacts...","keywords":[],"category":"reinforcement_learning","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:01:12.822Z","model":"fallback","error":true,"title_only_analysis":false},"views":0},{"id":"hf_2606.11894_tv2luwoz3","title":"Wild3R: Feed-Forward 3D Gaussian Splatting from Unconstrained Sparse Photo Collection","abstract":"Feed-forward 3D Gaussian Splatting (3DGS) removes the need for time-consuming per-scene optimization required by traditional 3DGS. However, existing feed-forward approaches struggle with real-world photo collections that include diverse lighting conditions and transient objects. 
In this paper, we present Wild3R, a feed-forward approach for unconstrained sparse photo collections. The main bottleneck is the lack of training data that provides multiple viewpoints, a variety of illuminations, and transient variations necessary for learning robust scene representations. To address this, we introduce the WildCity dataset, which comprises 200 scenes, 170 lighting conditions, and transient objects, resulting in 337,500 images in total. By leveraging the dataset, our model learns appearance consistency across viewpoints conditioned on reference views, while removing transient content. Extensive experiments demonstrate that our method outperforms existing feed-forward approaches and achieves results competitive with prior per-scene optimization-based methods.","authors":[{"_id":"6a2a23fc80a9c7c6830c0f31","name":"Yuto Furutani","hidden":false},{"_id":"6a2a23fc80a9c7c6830c0f32","name":"Takashi Otonari","hidden":false},{"_id":"6a2a23fc80a9c7c6830c0f33","name":"Kaede Shiohara","hidden":false},{"_id":"6a2a23fc80a9c7c6830c0f34","name":"Toshihiko Yamasaki","hidden":false}],"published":"2026-06-10T00:00:00.000Z","updated":"2026-06-11T08:00:08.323Z","category":"computer_vision","source":"huggingface","original_source":"huggingface_api","url":"https://huggingface.co/papers/2606.11894","pdf_url":"","scraped_at":"2026-06-11T08:00:08.323Z","images":["https://arxiv.org/html/2606.11894/x1.png"],"analysis":{"introduction":"🚀 Wild3R: Feed-Forward 3D Gaussian Splatting from Unconstrained Sparse Photo Collection - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: Feed-forward 3D Gaussian Splatting (3DGS) removes the need for 
time-consuming per-scene optimization required by traditional 3DGS. However, existing feed-forward approaches struggle with real-world photo collections that include diverse lighting conditions and transient objects. In this paper, we pr...","keywords":[],"category":"computer_vision","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:02:16.609Z","model":"fallback","error":true,"title_only_analysis":false},"views":0},{"id":"arxiv_2606.12402v1","arxiv_id":"2606.12402v1","title":"DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?","abstract":"Vision-Language Models (VLMs) are increasingly deployed as high-level planners for embodied agents, with an emerging strategy of scaling test-time compute to improve capability. However, we observe that doing so increases latency, token usage, and FLOPs while yielding uneven, often diminishing gains in downstream success, limiting where embodied agents can be deployed. We argue that choosing when and where to spend test-time compute is central to bringing frontier performance to the real world. We introduce DIRECT, a routing framework that uses multimodal scene context to allocate compute per prompt, improving the success-cost Pareto frontier over fixed model selection. Across three dominant scaling axes, namely chain-of-thought depth, model size, and memory history, our experiments on VLABench and RoboMME show that test-time compute is not a uniform lever: different axes yield qualitatively distinct capability gains. We validate these insights on a physical Franka arm in a DROID setup spanning zero-shot manipulation and long-horizon chaining, where our router matches or exceeds a stronger model's success rate at up to 65% lower average latency. Ultimately, our results show that naively scaling test-time compute is wasteful, and that DIRECT can provide frontier-level embodied planning in robotic systems at a fraction of the cost. 
Project page can be found at jadee-dao.github.io/direct/.","authors":["Jadelynn Dao","Milan Ganai","Yasmina Abukhadra","Ajay Sridhar","Mozhgan Nasr Azadani","Katie Luo","Clark Barrett","Jiajun Wu","Chelsea Finn","Marco Pavone"],"published":"2026-06-10T17:58:49Z","updated":"2026-06-10T17:58:49Z","category":"","source":"arxiv","original_source":"arxiv","url":"http://arxiv.org/abs/2606.12402v1","pdf_url":"http://arxiv.org/pdf/2606.12402v1.pdf","scraped_at":"2026-06-11T08:00:09.117Z","images":["https://arxiv.org/html/2606.12402v1/x1.png"],"analysis":{"introduction":"🚀 DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners? - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: Vision-Language Models (VLMs) are increasingly deployed as high-level planners for embodied agents, with an emerging strategy of scaling test-time compute to improve capability. However, we observe that doing so increases latency, token usage, and FLOPs while yielding uneven, often diminishing gains...","keywords":[],"category":"computer_vision","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:00:51.100Z","model":"fallback","error":true,"title_only_analysis":false},"views":0},{"id":"arxiv_2606.12396v1","arxiv_id":"2606.12396v1","title":"VLGA: Vision-Language-Geometry-Action Models for Autonomous Driving","abstract":"Vision-language-action (VLA) models can describe scenes and reason about them in language, yet still struggle to ground their actions in the dense 3D world around them. 
Existing approaches either inject features from a frozen 3D foundation model without an objective that ensures the policy uses them, or constrain geometry with sparse box and map losses that provide no dense spatial signal. We introduce VLGA, the first vision-language-action model supervised to reconstruct the dense 3D world it drives through. VLGA introduces geometry as a fourth modality alongside vision, language, and action through a dedicated expert supervised by a per-pixel pointmap regression loss against LiDAR. Extensive experiments conducted on challenging nuScenes and Bench2Drive datasets for open-loop and closed-loop evaluations, respectively, show the superiority of VLGA over counterpart VLA methods. In particular, on open-loop nuScenes, VLGA sets a new state of the art among VLA methods without ego status, with the lowest L2 (0.50 m average) and 3-second collision rate (0.18%). On closed-loop Bench2Drive, VLGA attains the state-of-the-art driving score of 79.08, +0.71 over the strongest prior VLA, at comparable efficiency and comfort.","authors":["Jin Yao","Dhruva Dixith Kurra","Tom Lampo","Zezhou Cheng","Danhua Guo","Burhan Yaman"],"published":"2026-06-10T17:57:06Z","updated":"2026-06-10T17:57:06Z","category":"","source":"arxiv","original_source":"arxiv","url":"http://arxiv.org/abs/2606.12396v1","pdf_url":"http://arxiv.org/pdf/2606.12396v1.pdf","scraped_at":"2026-06-11T08:00:09.117Z","images":["https://arxiv.org/html/2606.12396v1/x1.png"],"analysis":{"introduction":"🚀 VLGA: Vision-Language-Geometry-Action Models for Autonomous Driving - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: Vision-language-action 
(VLA) models can describe scenes and reason about them in language, yet still struggle to ground their actions in the dense 3D world around them. Existing approaches either inject features from a frozen 3D foundation model without an objective that ensures the policy uses them...","keywords":["regression"],"category":"computer_vision","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:00:51.096Z","model":"fallback","error":true,"title_only_analysis":false},"views":0},{"id":"hf_daily_2606.07514_j424ty","arxiv_id":"2606.07514","title":"UniSHARP: Universal Sharp Monocular View Synthesis","abstract":"In this work, we focus on extending SHARP, the popular photorealistic view synthesis method, for universal monocular rendering across a continuum of camera systems, from conventional perspective cameras to wide-field-of-view, fisheye and omnidirectional panoramic settings. To overcome the pinhole-specific assumptions of SHARP, our key idea is to align various images in a unified omnidirectional latent space. Thus, we propose UniSHARP, which performs implicit alignment in both feature and Gaussian spaces. Specifically, Gaussian primitives are arranged along rays and radial distances in a ray-based universal representation, while 2D semantic and 3D spatial features extracted from UniK3D-inspired encoders are jointly decoded to generate the complete Gaussian cloud. To comprehensively evaluate our method, we construct a benchmark covering diverse imaging systems across various scenes. The benchmark is further stratified by field of view (FoV) to enable fine-grained assessment of the universal monocular rendering task. Extensive experiments on the proposed benchmark demonstrate the effectiveness of UniSHARP, outperforming alternative methods by a large margin. 
The project page can be found at: https://insta360-research-team.github.io/Unisharp-website/","authors":["Meixi Song","Dizhe Zhang","Hao Ren","Ruiyang Zhang","Bo Du","Ming-Hsuan Yang","Lu Qi"],"published":"2026-06-05T00:00:00.000Z","updated":"2026-06-11T08:00:08.383Z","category":"computer_vision","source":"huggingface","original_source":"huggingface_daily_papers","url":"https://huggingface.co/papers/2606.07514","pdf_url":"https://arxiv.org/pdf/2606.07514","scraped_at":"2026-06-11T08:00:08.383Z","images":["https://arxiv.org/html/2606.07514/x1.png"],"upvotes":0,"ai_summary":"","ai_keywords":[],"github_repo":"","github_stars":0,"analysis":{"introduction":"🚀 UniSHARP: Universal Sharp Monocular View Synthesis - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: In this work, we focus on extending SHARP, the popular photorealistic view synthesis method, for universal monocular rendering across a continuum of camera systems, from conventional perspective cameras to wide-field-of-view, fisheye and omnidirectional panoramic settings. To overcome the pinhole-sp...","keywords":[],"category":"computer_vision","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:07:50.492Z","model":"fallback","error":true,"title_only_analysis":false},"views":0},{"id":"hf_daily_2512.10971_qo564y","arxiv_id":"2512.10971","title":"AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets","abstract":"Large Language Models (LLMs) have demonstrated remarkable potential as autonomous agents, approaching human-expert performance through advanced reasoning and tool orchestration. 
However, decision-making in fully dynamic and live environments remains highly challenging, requiring real-time information integration and adaptive responses. While existing efforts have explored live evaluation mechanisms in structured tasks, a critical gap remains in systematic benchmarking for real-world applications, particularly in finance where stringent requirements exist for live strategic responsiveness. To address this gap, we introduce AI-Trader, the first fully-automated, live, and data-uncontaminated evaluation benchmark for LLM agents in financial decision-making. AI-Trader spans three major financial markets: U.S. stocks, A-shares, and cryptocurrencies, with multiple trading granularities to simulate live financial environments. Our benchmark implements a revolutionary fully autonomous minimal information paradigm where agents receive only essential context and must independently search, verify, and synthesize live market information without human intervention. We evaluate six mainstream LLMs across three markets and multiple trading frequencies. Our analysis reveals striking findings: general intelligence does not automatically translate to effective trading capability, with most agents exhibiting poor returns and weak risk management. We demonstrate that risk control capability determines cross-market robustness, and that AI trading strategies achieve excess returns more readily in highly liquid markets than policy-driven environments. These findings expose critical limitations in current autonomous agents and provide clear directions for future improvements. 
The code and evaluation data are open-sourced to foster community research: https://github.com/HKUDS/AI-Trader.","authors":["Tianyu Fan","Yuhao Yang","Yangqin Jiang","Yifei Zhang","Yuxuan Chen","Chao Huang"],"published":"2025-12-01T04:25:36.000Z","updated":"2026-06-11T08:00:08.383Z","category":"natural_language_processing","source":"huggingface","original_source":"huggingface_daily_papers","url":"https://huggingface.co/papers/2512.10971","pdf_url":"https://arxiv.org/pdf/2512.10971","scraped_at":"2026-06-11T08:00:08.383Z","images":["https://arxiv.org/html/2512.10971/x1.png"],"upvotes":0,"ai_summary":"","ai_keywords":[],"github_repo":"","github_stars":0,"analysis":{"introduction":"🚀 AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: Large Language Models (LLMs) have demonstrated remarkable potential as autonomous agents, approaching human-expert performance through advanced reasoning and tool orchestration. 
However, decision-making in fully dynamic and live environments remains highly challenging, requiring real-time informatio...","keywords":[],"category":"natural_language_processing","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:07:05.621Z","model":"fallback","error":true,"title_only_analysis":false},"views":0},{"id":"hf_daily_2508.19205_xzf6ww","arxiv_id":"2508.19205","title":"VibeVoice Technical Report","abstract":"This report presents VibeVoice, a novel model designed to synthesize long-form speech with multiple speakers by employing next-token diffusion, which is a unified method for modeling continuous data by autoregressively generating latent vectors via diffusion. To enable this, we introduce a novel continuous speech tokenizer that, when compared to the popular Encodec model, improves data compression by 80 times while maintaining comparable performance. The tokenizer effectively preserves audio fidelity while significantly boosting computational efficiency for processing long sequences. 
Thus, VibeVoice can synthesize long-form speech for up to 90 minutes (in a 64K context window length) with a maximum of 4 speakers, capturing the authentic conversational \"vibe\" and surpassing open-source and proprietary dialogue models.","authors":["Zhiliang Peng","Jianwei Yu","Wenhui Wang","Yaoyao Chang","Yutao Sun","Li Dong","Yi Zhu","Weijiang Xu","Hangbo Bao","Zehua Wang","Shaohan Huang","Yan Xia","Furu Wei"],"published":"2025-08-26T17:09:12.000Z","updated":"2026-06-11T08:00:08.383Z","category":"natural_language_processing","source":"huggingface","original_source":"huggingface_daily_papers","url":"https://huggingface.co/papers/2508.19205","pdf_url":"https://arxiv.org/pdf/2508.19205","scraped_at":"2026-06-11T08:00:08.383Z","images":["https://arxiv.org/html/2508.19205/x1.png"],"upvotes":0,"ai_summary":"","ai_keywords":[],"github_repo":"","github_stars":0,"analysis":{"introduction":"🚀 VibeVoice Technical Report - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: This report presents VibeVoice, a novel model designed to synthesize long-form speech with multiple speakers by employing next-token diffusion, which is a unified method for modeling continuous data by autoregressively generating latent vectors via diffusion. 
To enable this, we introduce a novel con...","keywords":[],"category":"natural_language_processing","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:05:13.303Z","model":"fallback","error":true,"title_only_analysis":false},"views":0},{"id":"hf_2606.11075_z1779jiuc","title":"Exploring the Design Space of Reward Backpropagation for Flow Matching","abstract":"Aligning text-to-image flow matching models with human preferences via direct reward backpropagation is sample-efficient but hampered by two well-known pathologies: activations cannot be stored across the full sampling trajectory at modern model scale, and chained Jacobian products across steps inflate the reward gradient as it travels back to early indices. Connector-based methods, such as LeapAlign, address these issues by replacing the full backward trajectory with a short pinned path, highlighting a useful decoupling between sampling and optimization. However, the quality of the resulting gradient depends on how accurately this short path approximates the full rollout, especially over long intervals. We propose FlowBP, a unified surrogate-trajectory framework that treats the backward trajectory itself as the design object. FlowBP keeps a no-gradient cached rollout for sampling, then builds a lightweight backward surrogate from cached and selectively re-forwarded velocities. This view separates four choices: the reward-model input, active set, integration weights, and bridge coupling, and recovers prior direct-gradient methods as particular settings. Within this framework, we instantiate three variants: FlowBP-Sparse uses sparse Euler reconstruction, FlowBP-Bridge adds controlled bridge coupling, and FlowBP-Lagrange raises the order of leap quadrature. All three bound memory by the active-set size and limit gradient chaining to at most one Jacobian factor. 
Across SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base on preference, quality, and compositional metrics, the three variants improve over direct-gradient baselines on most metrics.","authors":[{"_id":"6a2913b7e7d78ea7587e5681","name":"Ruoyu Wang","hidden":false},{"_id":"6a2913b7e7d78ea7587e5682","name":"Boye Niu","hidden":false},{"_id":"6a2913b7e7d78ea7587e5683","name":"Xiangxin Zhou","hidden":false},{"_id":"6a2913b7e7d78ea7587e5684","name":"Yushi Huang","hidden":false},{"_id":"6a2913b7e7d78ea7587e5685","name":"Tongliang Liu","hidden":false},{"_id":"6a2913b7e7d78ea7587e5686","name":"Chi Zhang","hidden":false}],"published":"2026-06-09T16:36:54.000Z","updated":"2026-06-11T08:00:08.323Z","category":"computer_vision","source":"huggingface","original_source":"huggingface_api","url":"https://huggingface.co/papers/2606.11075","pdf_url":"","scraped_at":"2026-06-11T08:00:08.323Z","images":["https://arxiv.org/html/2606.11075/x1.png"],"analysis":{"introduction":"🚀 Exploring the Design Space of Reward Backpropagation for Flow Matching - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: Aligning text-to-image flow matching models with human preferences via direct reward backpropagation is sample-efficient but hampered by two well-known pathologies: activations cannot be stored across the full sampling trajectory at modern model scale, and chained Jacobian products across steps 
infl...","keywords":["backpropagation"],"category":"computer_vision","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:03:21.947Z","model":"fallback","error":true,"title_only_analysis":false},"views":0},{"id":"hf_2606.11087_m3g2xgou6","title":"Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning","abstract":"Expressive continuous control policies, such as diffusion and flow models, form the backbone of recent advances in scaling imitation learning for simulated and real robot control. While they are known to scale stably in the supervised imitation learning setting, incorporating them into reinforcement learning (RL) pipelines for policy improvement has proven more difficult. It often requires specialized training objectives or backpropagating through denoising processes, which cause well-known issues with stability and affect scalability. In this paper we study the question of whether simple policy improvement schemes at test time alone, leaving stable supervised policy training intact, can be a competitive alternative which sidesteps these issues. To this end, we propose QGF (Q-Guided Flow), an RL algorithm that performs policy optimization entirely at test time. QGF works by pre-training both a reference flow policy (via a standard behavioral cloning objective) and a value function critic and, at test time, using the value gradient to guide the reference policy to generate higher-value actions without any additional policy learning. Empirically, QGF outperforms prior test-time RL methods on single-task and goal-conditioned offline RL benchmarks with high-dimensional action spaces, and is competitive with state-of-the-art training-time algorithms while being much cheaper to run. 
Moreover, it exhibits favorable scaling with model size by avoiding the instability of actor-critic training, offering a practical and effective alternative RL algorithm with expressive policies.","authors":[{"_id":"6a28c6e0e7d78ea7587e534c","name":"Zhiyuan Zhou","hidden":false},{"_id":"6a28c6e0e7d78ea7587e534d","name":"Andy Peng","hidden":false},{"_id":"6a28c6e0e7d78ea7587e534e","name":"Charles Xu","hidden":false},{"_id":"6a28c6e0e7d78ea7587e534f","name":"Qiyang Li","hidden":false},{"_id":"6a28c6e0e7d78ea7587e5350","name":"Tobias Springenberg","hidden":false},{"_id":"6a28c6e0e7d78ea7587e5351","name":"Kevin Frans","hidden":false},{"_id":"6a28c6e0e7d78ea7587e5352","name":"Sergey Levine","hidden":false}],"published":"2026-06-09T00:00:00.000Z","updated":"2026-06-11T08:00:08.323Z","category":"reinforcement_learning","source":"huggingface","original_source":"huggingface_api","url":"https://huggingface.co/papers/2606.11087","pdf_url":"","scraped_at":"2026-06-11T08:00:08.323Z","images":["https://arxiv.org/html/2606.11087/x1.png"],"analysis":{"introduction":"🚀 Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: Expressive continuous control policies, such as diffusion and flow models, form the backbone of recent advances in scaling imitation learning for simulated and real robot control. 
While they are known to scale stably in the supervised imitation learning setting, incorporating them into reinforcement...","keywords":["reinforcement learning"],"category":"reinforcement_learning","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:04:28.756Z","model":"fallback","error":true,"title_only_analysis":false},"views":0},{"id":"hf_2606.10646_ukoy10afe","title":"How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs","abstract":"Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs), where RL recipes typically treat all tokens equally, failing to distinguish decisive reasoning steps from routine formatting or fluent filler. Recent attempts leverage model-internal signals to assign finer-grained credit, but these are often point-wise heuristics that ignore the global structure of information propagation. We propose FlowTracer, an RL framework that traces answer-targeted reasoning flow on an attention-induced directed acyclic graph in which nodes correspond to tokens and edge capacities come from aggregated attention weights and derives token credit from this global structure. The edge capacities are reweighted to retain only the influence that can reach the answer region, while enforcing local flow conservation so intermediate tokens neither lose nor gain effective mass due to path length or irrelevant branches. On this graph, FlowTracer extracts an information-flow backbone connecting the question to the answer and scores tokens by flow throughput, revealing high-impact hubs and aggregation checkpoints that mediate long-range dependencies. 
These derived importances are used to shape token-level rewards, enabling learning signals to focus precisely on the tokens that route information toward (or away from) correct answers and delivering consistent performance gains across a range of reasoning tasks.","authors":[{"_id":"6a28ce5ae7d78ea7587e53e8","name":"Zhichen Dong","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53e9","name":"Yang Li","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53ea","name":"Yuhan Sun","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53eb","name":"Weixun Wang","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53ec","name":"Yijia Luo","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53ed","name":"Zinian Peng","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53ee","name":"Taiheng Ye","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53ef","name":"Chao Yang","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53f0","name":"Wenbo Su","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53f1","name":"Yu Cheng","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53f2","name":"Bo Zheng","hidden":false},{"_id":"6a28ce5ae7d78ea7587e53f3","name":"Junchi Yan","hidden":false}],"published":"2026-06-09T09:56:51.000Z","updated":"2026-06-11T08:00:08.323Z","category":"natural_language_processing","source":"huggingface","original_source":"huggingface_api","url":"https://huggingface.co/papers/2606.10646","pdf_url":"","scraped_at":"2026-06-11T08:00:08.323Z","images":["https://arxiv.org/html/2606.10646/x1.png"],"analysis":{"introduction":"🚀 How Does Reasoning Flow? 
Tracing Attention-Induced Information Flow for Targeted RL in LLMs - Research analysis - Analysis temporarily unavailable due to processing error.","challenges":"🎯 Challenges information unavailable due to processing error.","innovations":"✨ Innovation details unavailable due to processing error.","experiments":"📊 Experimental results unavailable due to processing error.","insights":"🤔 Research insights unavailable due to processing error.","summary":"Abstract: Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs), where RL recipes typically treat all tokens equally, failing to distinguish decisive reasoning steps from routine formatting or fluent filler. Recent attempts leverage model-internal...","keywords":["attention","reinforcement learning"],"category":"natural_language_processing","relevance_score":5,"technical_depth":"unknown","analyzed_at":"2026-06-11T08:03:21.950Z","model":"fallback","error":true,"title_only_analysis":false},"views":0}],"pagination":{"current_page":1,"total_pages":11,"total_papers":109,"has_next":true,"has_prev":false},"filters":{"category":null,"search":null,"date":null}}