A Diffusion Analysis of Policy Gradient for Stochastic Bandits
Authors: Tor Lattimore
Journal: No journal information available
Year: 2026
Country: United States
Abstract
We study a continuous-time diffusion approximation of policy gradient for $k$-armed stochastic bandits. We prove that with a learning rate $\eta = O(\Delta^2/\log(n))$ the regret is $O(k \log(k) \log(n) / \eta)$, where $n$ is the horizon and $\Delta$ is the minimum gap. Moreover, we construct an instance with only logarithmically many arms for which the regret is linear unless $\eta = O(\Delta^2)$.
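For readers unfamiliar with the setting, the snippet below is a minimal sketch of the discrete-time process that a diffusion approximation of this kind describes: softmax policy gradient on a $k$-armed stochastic bandit with learning rate $\eta$ over horizon $n$. It is not the paper's analysis or its exact algorithm. It assumes a REINFORCE-style update without a baseline and unit-variance Gaussian rewards, and it reports pseudo-regret (the sum of gaps of the pulled arms). The function names `softmax` and `policy_gradient_bandit` are hypothetical, chosen for illustration.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of arm preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_bandit(means, eta, n, seed=0):
    """Hypothetical sketch: discrete-time softmax policy gradient (no baseline).

    Assumptions (not from the paper): rewards are Gaussian with the given
    means and unit variance; the update is the REINFORCE estimate
    r * (e_a - pi) of the gradient of the expected reward.
    Returns the cumulative pseudo-regret and the final action distribution.
    """
    rng = random.Random(seed)
    k = len(means)
    theta = [0.0] * k          # softmax preferences, start uniform
    best = max(means)
    regret = 0.0
    for _ in range(n):
        pi = softmax(theta)
        a = rng.choices(range(k), weights=pi)[0]   # sample an arm from pi
        r = rng.gauss(means[a], 1.0)               # noisy reward (assumed Gaussian)
        # Stochastic gradient step: theta_i += eta * r * (1[i == a] - pi_i).
        for i in range(k):
            indicator = 1.0 if i == a else 0.0
            theta[i] += eta * r * (indicator - pi[i])
        regret += best - means[a]                  # pseudo-regret of this pull
    return regret, softmax(theta)


if __name__ == "__main__":
    means = [0.5, 0.4, 0.2]
    regret, pi = policy_gradient_bandit(means, eta=0.01, n=10_000, seed=1)
    print(f"pseudo-regret={regret:.1f}, final policy={[round(p, 3) for p in pi]}")
```

Varying `eta` relative to the squared minimum gap in such a simulation is one informal way to explore the trade-off the abstract describes: smaller learning rates slow learning, while larger ones risk the policy committing to a suboptimal arm.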
Article Statistics
Views: 258
Downloads: 0
Citations: 9