Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization
Authors
No author information available
Journal
No journal information available
Year
2026
Category
Country
-
arXiv
http://arxiv.org/abs/2604.00977v1
📝 Abstract
Reinforcement Learning (RL) has proven highly effective in addressing complex control and decision-making tasks. However, most traditional RL algorithms parameterize the policy as a diagonal Gaussian distribution, which prevents the policy from capturing multimodal distributions and thus from covering the full range of optimal solutions in multi-solution problems; moreover, the return is reduced to a mean value, losing its multimodal nature and providing insufficient guidance for policy updates. To address these problems, we propose an RL algorithm termed flow-based policy with distributional RL (FP-DRL). This algorithm models the policy with flow matching, which offers both computational efficiency and the capacity to fit complex distributions. In addition, it employs a distributional RL approach to model and optimize the entire return distribution, thereby more effectively guiding multimodal policy updates and improving agent performance. Experimental trials on MuJoCo benchmarks demonstrate that FP-DRL achieves state-of-the-art (SOTA) performance on most MuJoCo control tasks while exhibiting superior representational capability of the flow policy.
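The flow-matching policy objective the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation: it shows only the standard conditional flow-matching regression target (straight-line interpolant between Gaussian noise and an action, with the velocity field as the regression target), conditioned on the state. The linear `v_theta` model and all variable names are hypothetical stand-ins for a neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(v_theta, states, actions, rng):
    """Conditional flow-matching loss (sketch).

    Sample x0 ~ N(0, I) and t ~ U(0, 1), form the straight-line
    interpolant x_t = (1 - t) * x0 + t * a, and regress the model's
    velocity prediction onto the target velocity v* = a - x0.
    """
    n, d = actions.shape
    x0 = rng.standard_normal((n, d))      # base noise sample
    t = rng.uniform(size=(n, 1))          # interpolation time in [0, 1)
    xt = (1.0 - t) * x0 + t * actions     # straight-line interpolant
    v_star = actions - x0                 # target velocity field
    pred = v_theta(xt, states, t)         # model's predicted velocity
    return float(np.mean((pred - v_star) ** 2))

# Toy linear velocity model over [x_t, state, t] features
# (hypothetical; a real flow policy would use a neural network).
W = rng.standard_normal((5, 2)) * 0.1
def v_theta(xt, s, t):
    feats = np.concatenate([xt, s, t], axis=1)  # shape (n, 2 + 2 + 1)
    return feats @ W                            # shape (n, 2)

states = rng.standard_normal((16, 2))
actions = rng.standard_normal((16, 2))
loss = flow_matching_loss(v_theta, states, actions, rng)
print(loss)
```

At sampling time, a policy trained this way would generate an action by integrating the learned velocity field from noise to an action, conditioned on the current state; the distributional-RL critic described in the abstract would then score whole return distributions rather than mean returns.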