Tight Sample Complexity Bounds for Entropic Best Policy Identification
Authors
Amer Essakine | Claire Vernade
Journal
No journal information available
Year
2026
Category
Country
-
Abstract
We study best-policy identification for finite-horizon risk-sensitive reinforcement learning under the entropic risk measure. Recent work established a gap in the exponential horizon dependence between lower and upper bounds on the number of samples required to identify an approximately optimal policy. Precisely, known lower bounds scale as $\Omega(e^{|\beta| H})$, where $H$ is the horizon of the MDP, while the state-of-the-art upper bound achieves at best $O(e^{2|\beta| H})$ (arXiv:2506.00286v2) using a generative model. We show that this extra exponential factor can be traced to overly loose concentration control for exponential utilities. To close this open gap, we revisit the analysis of this problem with a forward-model-based algorithm that builds on KL-based exploration bonuses, which we adapt to the entropic criterion. Our improvement rests on two main technical innovations: we leverage the smoothness properties of the exponential utility to derive sharper concentration bounds, and we propose a new stopping rule that further exploits this tightness to obtain a sample complexity matching the lower bound.
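For readers unfamiliar with the entropic criterion: it is the exponential-utility functional $\rho_\beta(X) = \frac{1}{\beta} \log \mathbb{E}[e^{\beta X}]$, which is risk-averse for $\beta < 0$ and recovers the mean as $\beta \to 0$. As a rough illustration only (not the paper's algorithm), a minimal Monte Carlo estimate of this quantity might look like the following; the function name `entropic_risk` is our own, and the log-sum-exp trick is a standard stability measure for the large $|\beta| H$ regimes the abstract discusses:

```python
import math

def entropic_risk(returns, beta):
    """Sample estimate of (1/beta) * log E[exp(beta * X)].

    Risk-averse for beta < 0; approaches the sample mean as beta -> 0.
    Uses log-sum-exp for numerical stability when |beta| * X is large.
    """
    n = len(returns)
    m = max(beta * x for x in returns)  # shift to avoid overflow in exp
    log_mean_exp = m + math.log(sum(math.exp(beta * x - m) for x in returns) / n)
    return log_mean_exp / beta

returns = [1.0, 2.0, 3.0, 4.0]  # toy cumulative rewards; sample mean is 2.5
print(round(entropic_risk(returns, -1.0), 4))  # risk-averse: below the mean (~1.9461)
print(round(entropic_risk(returns, 1e-9), 4))  # near-zero beta: ~2.5, the mean
```

The certainty-equivalent reading is that a risk-averse agent ($\beta < 0$) values this return distribution as if it paid a sure amount below its mean, which is why the horizon enters the sample complexity through $e^{|\beta| H}$.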
Article Statistics
Basic Stats
Views: 24
Downloads: 0
Citations: 16