A Diffusion Analysis of Policy Gradient for Stochastic Bandits

Authors: Tor Lattimore
Journal: arXiv
Year: 2026
Category: Statistics
Country: United States


Abstract

We study a continuous-time diffusion approximation of policy gradient for $k$-armed stochastic bandits. We prove that with a learning rate $\eta = O(\Delta^2/\log(n))$ the regret is $O(k \log(k) \log(n) / \eta)$, where $n$ is the horizon and $\Delta$ is the minimum gap. Moreover, we construct an instance with only logarithmically many arms for which the regret is linear unless $\eta = O(\Delta^2)$.
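To make the setting concrete, the following is a minimal sketch of discrete-time policy gradient on a $k$-armed bandit. It is not the continuous-time diffusion analysed in the paper, and several details are assumptions made purely for illustration: the softmax parametrization, the absence of a baseline, unit-variance Gaussian reward noise, and the hypothetical names `softmax` and `policy_gradient_bandit`.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_bandit(means, eta, n, seed=0):
    """Run softmax policy gradient for n rounds.

    Assumptions (not taken from the paper): softmax policy, no baseline,
    Gaussian rewards with unit variance around the given means.
    Returns (pseudo_regret, final_policy).
    """
    rng = random.Random(seed)
    k = len(means)
    theta = [0.0] * k  # logits; start from the uniform policy
    best = max(means)
    regret = 0.0
    for _ in range(n):
        pi = softmax(theta)
        arm = rng.choices(range(k), weights=pi)[0]
        reward = rng.gauss(means[arm], 1.0)
        regret += best - means[arm]  # pseudo-regret uses the true means
        # Stochastic gradient step: d/d theta_b of log pi(arm) = 1{b == arm} - pi_b
        for b in range(k):
            indicator = 1.0 if b == arm else 0.0
            theta[b] += eta * reward * (indicator - pi[b])
    return regret, softmax(theta)


if __name__ == "__main__":
    means = [0.5, 0.3, 0.1]            # minimum gap Delta = 0.2
    n = 10_000
    eta = 0.2 ** 2 / math.log(n)       # illustrative choice matching the Delta^2 / log(n) scaling
    regret, policy = policy_gradient_bandit(means, eta, n)
    print(f"pseudo-regret: {regret:.1f}, final policy: {policy}")
```

The learning rate in the usage example follows only the scaling $\eta = O(\Delta^2/\log(n))$ from the abstract; the constant is arbitrary, so the observed regret should not be read as a check of the stated bound.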

