Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs

Authors: Wenjian Zhang, Kongcheng Zhang, Jiaxin Qi, Baisheng Lai, Jianqiang Huang

Journal: arXiv

Year: 2026

Category: Artificial Intelligence

Country: China


📝 Abstract

Reinforcement Learning (RL) with rubric-based rewards has recently shown remarkable progress in enhancing the general reasoning capabilities of Large Language Models (LLMs), yet it still suffers from ineffective exploration confined to the current policy distribution. In fact, RL optimization can be viewed as steering the policy toward an ideal distribution that maximizes the rewards, and effective exploration should align its efforts with that desired target. Leveraging this insight, we propose HeRL, a Hindsight experience guided Reinforcement Learning framework that bootstraps effective exploration by explicitly telling LLMs the desired behaviors specified in the rewards. Concretely, HeRL treats failed trajectories along with their unmet rubrics as hindsight experience, which serves as in-context guidance for the policy to explore desired responses beyond its current distribution. Additionally, we introduce a bonus reward to incentivize responses with greater potential for improvement under such guidance. HeRL facilitates effective learning from desired high-quality samples without repeated trial-and-error from scratch, which in theory yields a more accurate estimate of the expected gradient. Extensive experiments across various benchmarks demonstrate that HeRL achieves superior performance gains over baselines and can further benefit from experience-guided self-improvement at test time. Our code is available at https://github.com/sikelifei/HeRL.
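The two ideas named in the abstract, hindsight guidance built from a failed response plus its unmet rubrics, and a bonus for responses that improve under that guidance, can be illustrated with a toy sketch. This is not the authors' implementation: the `Rubric` class, the prompt wording, the helper names, and the bonus formula with its `weight` parameter are all assumptions made for illustration. The actual method is in the linked repository.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rubric:
    """Hypothetical rubric: a description plus a rule-based checker."""
    description: str
    check: Callable[[str], bool]


def rubric_score(response: str, rubrics: List[Rubric]) -> float:
    """Fraction of rubrics that the response satisfies."""
    if not rubrics:
        return 0.0
    return sum(r.check(response) for r in rubrics) / len(rubrics)


def unmet_rubrics(response: str, rubrics: List[Rubric]) -> List[Rubric]:
    """Rubrics that the response fails to satisfy."""
    return [r for r in rubrics if not r.check(response)]


def build_hindsight_prompt(question: str, failed_response: str,
                           rubrics: List[Rubric]) -> str:
    """Assemble in-context guidance from a failed attempt and its unmet rubrics.

    The prompt wording is an assumption, not taken from the paper.
    """
    lines = [
        f"Question: {question}",
        "Previous attempt:",
        failed_response,
        "Unmet criteria:",
    ]
    lines += [f"- {r.description}" for r in unmet_rubrics(failed_response, rubrics)]
    lines.append("Write an improved answer that satisfies the unmet criteria.")
    return "\n".join(lines)


def reward_with_bonus(base_score: float, guided_score: float,
                      weight: float = 0.5) -> float:
    """Assumed bonus form: base score plus a weighted gain under guidance."""
    return base_score + weight * max(0.0, guided_score - base_score)


rubrics = [
    Rubric("Gives a final answer", lambda s: "Answer:" in s),
    Rubric("States the unit (km)", lambda s: "km" in s),
]
failed = "Answer: 5"
prompt = build_hindsight_prompt("How far is the station?", failed, rubrics)
guided = "Answer: 5 km"  # stand-in for what a policy might produce under guidance
total = reward_with_bonus(rubric_score(failed, rubrics),
                          rubric_score(guided, rubrics))
```

In this toy setup the failed response meets one of two rubrics, so only the unmet unit criterion is fed back in the prompt, and the reward for the original response grows with how much the guided retry improves on it.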

📊 Article Statistics

Basic Stats

Views: 208

Downloads: 0

Citations: 1

Impact Analysis

Overall Score: 7.50
