Survey on Large Model-Assisted Reward Design Methods for Reinforcement Learning

曹育箐 ¹, 陈希亮 ¹, 董浩洋 ¹, 周鑫 ¹, 孙鸣蔚 ¹

Computer Engineering and Applications (计算机工程与应用), 2026, Vol. 62, Issue 9: 83-107, 25. DOI: 10.3778/j.issn.1002-8331.2509-0180

Author information

1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China

Abstract

Reward design stands as a core challenge in reinforcement learning. Traditional methods, which rely on experts' manual design, are plagued by several limitations: strong subjectivity, difficulty in expressing complex objectives, high susceptibility to reward hacking, and the sparse rewards problem. Leveraging their robust capabilities in semantic understanding, task decomposition, and code generation, large models have enabled end-to-end automated mapping from "human intentions" to "executable reward functions". This significantly lowers the barrier to reward design and overcomes the bottlenecks of traditional methods, which rely heavily on expert knowledge and suffer from ambiguities in goal expression. This paper systematically reviews the research progress of LLM-assisted reward design in reinforcement learning. It analyzes the limitations of traditional reward design methods. It proposes a classification framework based on the roles of LLMs in reward design, namely reward function initializer, "human-in-the-loop" iterator, and "human-out-of-the-loop" iterator; it also elaborates on the representative works, underlying mechanisms, advantages, and disadvantages of each category of methods. It expounds on the fundamental framework of reinforcement learning and the challenges of reward design, analyzing the intrinsic mechanisms and value of large model assistance. It reviews the limitations of traditional reward design methods, constructs a "Reward Initializer-Reward Iterator" classification framework, and details the representative work, mechanisms, and applicable scenarios of various methods. Furthermore, it summarizes the key challenges faced by current research at the "Model-Methodology-System" level in terms of model reliability, interaction efficiency, and engineering safety, and proposes corresponding technical pathways.

Key words

reinforcement learning / large models / reward design

Classification

Information Technology and Security Science

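Cite this article

曹育箐, 陈希亮, 董浩洋, 周鑫, 孙鸣蔚. Survey on large model-assisted reward design methods for reinforcement learning[J]. Computer Engineering and Applications, 2026, 62(9): 83-107, 25.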

Funding

National Natural Science Foundation of China (62273356)

National ministry and commission fund

Computer Engineering and Applications

ISSN: 1002-8331
