Computer Engineering and Applications, 2026, Vol. 62, Issue (9): 83-107, 25. DOI: 10.3778/j.issn.1002-8331.2509-0180
大模型辅助的强化学习奖励设计方法研究综述
Survey on Large Model-Assisted Reward Design Methods for Reinforcement Learning
Abstract
Reward design is a core challenge in reinforcement learning. Traditional methods, which rely on manual design by experts, suffer from several limitations: strong subjectivity, difficulty in expressing complex objectives, high susceptibility to reward hacking, and the sparse-reward problem. Leveraging their capabilities in semantic understanding, task decomposition, and code generation, large models enable an end-to-end automated mapping from "human intentions" to "executable reward functions". This significantly lowers the barrier to reward design and overcomes the bottlenecks of traditional methods, namely heavy reliance on expert knowledge and ambiguity in goal expression. This paper systematically reviews research progress on LLM-assisted reward design in reinforcement learning. It first expounds on the fundamental framework of reinforcement learning and the challenges of reward design, and analyzes the intrinsic mechanisms and value of large-model assistance. It then reviews the limitations of traditional reward design methods and constructs a "Reward Initializer-Reward Iterator" classification framework based on the roles LLMs play in reward design: reward function initializer, "human-in-the-loop" iterator, and "human-out-of-the-loop" iterator. For each category, it details the representative works, underlying mechanisms, advantages, disadvantages, and applicable scenarios. Finally, it summarizes the key challenges facing current research at the "Model-Methodology-System" level, in terms of model reliability, interaction efficiency, and engineering safety, and proposes corresponding technical pathways.
Keywords
reinforcement learning/large models/reward design
Classification
Information Technology and Security Science
Cite this article
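曹育箐, 陈希亮, 董浩洋, 周鑫, 孙鸣蔚. Survey on Large Model-Assisted Reward Design Methods for Reinforcement Learning [J]. Computer Engineering and Applications, 2026, 62(9): 83-107, 25.
Funding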
National Natural Science Foundation of China (62273356)
National Ministries and Commissions Fund