Survey of Feedback-Based Content and Behavior Alignment Methods for Large Language Model

Computer Engineering and Applications, 2025, Vol. 61, Issue 20: 75-104 (30 pages). DOI: 10.3778/j.issn.1002-8331.2410-0452

Zhang Yuying¹, Yun Jing¹, Liu Xueying², Shi Xiaoguo²

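Author information

1. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080; Inner Mongolia Autonomous Region Engineering Technology Research Center of Big Data-Based Software Service, Hohhot 010080; Inner Mongolia Key Laboratory of Beijiang Cyberspace Security, Hohhot 010080
2. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080; Inner Mongolia Autonomous Region Engineering Technology Research Center of Big Data-Based Software Service, Hohhot 010080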

Abstract

In recent years, large language models have demonstrated exceptional capabilities in natural language understanding, generation, and reasoning across a range of tasks. However, ensuring that their outputs align with human-defined standards has become a critical challenge. This paper presents a systematic review of feedback-based alignment methods, focusing on the dual objectives of "content alignment" and "behavior alignment". The review spans conceptual frameworks, technical implementations, and evaluation methodologies. First, it clarifies the sources, formats, and intended purposes of feedback, establishing a conceptual framework for feedback-based alignment. Second, it summarizes existing feedback alignment methods by stage: model training, inference, and generation. It then reviews the fundamental technical metrics for evaluating large models, along with relevant datasets and benchmarks. Finally, it highlights the potential of feedback-based alignment methods to improve the performance of large language models, as well as the significant challenges and key open issues they currently face.

Key words

large language models (LLMs) / AI alignment / content security / evaluation benchmarks

Classification

Information Technology and Security Science

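Cite this article

Zhang Yuying, Yun Jing, Liu Xueying, Shi Xiaoguo. Survey of Feedback-Based Content and Behavior Alignment Methods for Large Language Model[J]. Computer Engineering and Applications, 2025, 61(20): 75-104.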

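Funding

National Natural Science Foundation of China (62062055)

Program for Young Talents of Science and Technology in Universities of Inner Mongolia (NJYT24061)

Basic Scientific Research Funds for Universities Directly Under the Inner Mongolia Autonomous Region (JY20230092)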

Computer Engineering and Applications

Open access; Peking University Core Journal

ISSN: 1002-8331
