Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue 5: 1223-1231, 9. DOI: 10.3778/j.issn.1673-9418.2303031
Self-competitive Hindsight Experience Replay with Penalty Measures
Abstract
Self-competitive hindsight experience replay (SCHER) is an improved strategy based on the hindsight experience replay (HER) algorithm. When environmental rewards are sparse, HER replays past experiences to generate virtually labeled data for optimizing the model. However, HER has two problems. First, it cannot handle the large amount of repetitive data that sparse rewards produce, and this data contaminates the experience pool. Second, virtual goals may be drawn at random from intermediate states that do not help complete the task, which biases learning. To address these issues, SCHER introduces two improvements. First, it adds an adaptive reward signal that penalizes meaningless actions so that the agent quickly learns to avoid them. Second, it uses a self-competition strategy that generates two sets of data for the same task, then analyzes and compares them to find the key steps that let the agent succeed in different environments, thereby improving the accuracy of the generated virtual goals. Experimental results show that SCHER makes better use of experience replay, raising the average task success rate by 5.7 percentage points, with higher accuracy and generalization ability.
Keywords
deep reinforcement learning / sparse reward / experience replay / adaptive reward signal
Classification
Information technology and security science
Cite this article
Wang Zihao, Qian Xuezhong, Song Wei. Self-competitive Hindsight Experience Replay with Penalty Measures [J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(5): 1223-1231, 9.
Funding
This work was supported by the National Natural Science Foundation of China (62076110) and the Natural Science Foundation of Jiangsu Province (BK20181341).