计算机工程与应用 (Computer Engineering and Applications), 2025, Vol. 61, Issue (20): 75-104, 30. DOI: 10.3778/j.issn.1002-8331.2410-0452
基于反馈的大语言模型内容与行为对齐方法综述
Survey of Feedback-Based Content and Behavior Alignment Methods for Large Language Model
Abstract
In recent years, large language models have demonstrated exceptional capabilities in natural language understanding, generation, and reasoning across a range of tasks. However, ensuring that their outputs align with human-defined standards has become a critical challenge. This paper presents a systematic review of feedback-based alignment methods, focusing on the dual objectives of "content alignment" and "behavior alignment". The review spans conceptual frameworks, technical implementations, and evaluation methodologies. First, it clarifies the sources, formats, and intended purposes of feedback, establishing a conceptual framework for feedback-based alignment. Second, it summarizes existing feedback alignment methods in the order of model training, inference, and generation. It then reviews the fundamental technical metrics for evaluating large models, along with relevant datasets and benchmarks. Finally, it highlights the potential of feedback-based alignment methods to improve the performance of large language models, as well as the significant challenges and key issues these methods currently face.
Keywords
large language models (LLMs) / AI alignment / content security / evaluation benchmarks
Classification
Information Technology and Security Science
Cite this article
张钰莹, 云静, 刘雪颖, 史晓国. 基于反馈的大语言模型内容与行为对齐方法综述[J]. 计算机工程与应用, 2025, 61(20): 75-104, 30.
Funding
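National Natural Science Foundation of China (62062055)
Inner Mongolia Young Science and Technology Talents Program for Higher Education Institutions (NJYT24061)
Basic Scientific Research Funds for Universities Directly under the Inner Mongolia Autonomous Region (JY20230092)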