
Self-competitive Hindsight Experience Replay with Penalty Measures

Open access (OA); indexed in 北大核心 (Peking University Core Journals) and CSTPCD


Self-competitive hindsight experience replay (SCHER) is an improved strategy built on the hindsight experience replay (HER) algorithm. When environmental rewards are sparse, HER optimizes the model by replaying experiences to generate virtually labeled data. However, HER has two problems. First, it cannot handle the large amount of repetitive data that the agent generates because of sparse rewards, and this invalid data contaminates the experience pool. Second, virtual goals may be randomly selected from intermediate states that do not help complete the task, which leads to learning bias. To address these problems, SCHER proposes two improvements. First, it adds an adaptive reward signal that penalizes meaningless actions so that the agent quickly learns to avoid them. Second, it uses a self-competition strategy: competition produces two different sets of data for the same task, and comparing them identifies the key steps that enable the agent to succeed in different environments, which improves the accuracy of the generated virtual goals. Experimental results show that SCHER makes better use of experience replay, raising the average task success rate by 5.7 percentage points and achieving higher accuracy and generalization ability.
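The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it builds on: standard HER goal relabeling (the commonly used "future" strategy) and a penalty on meaningless actions. Everything named here is an assumption made for illustration: the `Transition` type, the `shaped_reward` and `her_relabel` functions, the fixed `penalty` value, and the choice to treat "meaningless" as "the action left the state unchanged". The paper's reward signal is adaptive and its self-competition step is not reproduced here.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

State = Tuple[int, int]  # hypothetical grid-world state, for illustration only


@dataclass(frozen=True)
class Transition:
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def shaped_reward(state: State, next_state: State, goal: State,
                  penalty: float = 0.5) -> float:
    """Sparse reward (0 if the goal is reached, else -1) minus an assumed
    fixed penalty when the action did not change the state."""
    reward = 0.0 if next_state == goal else -1.0
    if next_state == state:  # assumed definition of a "meaningless" action
        reward -= penalty
    return reward


def her_relabel(episode: List[Transition], k: int = 4, penalty: float = 0.5,
                rng: Optional[random.Random] = None) -> List[Transition]:
    """HER 'future' relabeling: for each transition, sample k virtual goals
    from states achieved at or after that step and recompute the reward."""
    rng = rng or random.Random(0)
    relabeled = []
    for t, tr in enumerate(episode):
        future = episode[t:]  # candidate steps whose outcome becomes a goal
        for _ in range(k):
            virtual_goal = rng.choice(future).next_state
            relabeled.append(Transition(
                tr.state, tr.action, tr.next_state, virtual_goal,
                shaped_reward(tr.state, tr.next_state, virtual_goal, penalty),
            ))
    return relabeled
```

In this sketch, the relabeled transitions would be appended to the replay buffer alongside the originals. A no-op transition always receives a lower reward than a state-changing one under the same goal, which is one simple way to discourage the repetitive data the abstract describes.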

WANG Zihao (王子豪); QIAN Xuezhong (钱雪忠); SONG Wei (宋威)

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China

Computer Science and Automation

deep reinforcement learning; sparse reward; experience replay; adaptive reward signal

Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2024, Issue 5

Pages 1223-1231 (9 pages)

This work was supported by the National Natural Science Foundation of China (62076110) and the Natural Science Foundation of Jiangsu Province (BK20181341).

DOI: 10.3778/j.issn.1673-9418.2303031
