Computer Engineering and Applications, 2025, Vol. 61, Issue 20: 75-104, 30. DOI: 10.3778/j.issn.1002-8331.2410-0452

Survey of Feedback-Based Content and Behavior Alignment Methods for Large Language Model

Zhang Yuying (张钰莹)¹, Yun Jing (云静)¹, Liu Xueying (刘雪颖)², Shi Xiaoguo (史晓国)²

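Author information

  • 1. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080; Inner Mongolia Autonomous Region Engineering and Technology Research Center of Big Data Based Software Service, Hohhot 010080; Inner Mongolia Key Laboratory of Beijiang Cyberspace Security, Hohhot 010080
  • 2. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080; Inner Mongolia Autonomous Region Engineering and Technology Research Center of Big Data Based Software Service, Hohhot 010080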

Abstract

In recent years, large language models have demonstrated exceptional capabilities in natural language understanding, generation, and reasoning across a range of tasks. However, ensuring that their outputs align with human-defined standards has become a critical challenge. This paper presents a systematic review of feedback-based alignment methods, focusing on the dual objectives of "content alignment" and "behavior alignment". The review spans conceptual frameworks, technical implementations, and evaluation methodologies. First, it clarifies the sources, formats, and intended purposes of feedback, establishing a conceptual framework for feedback-based alignment. Second, it summarizes existing feedback alignment methods by stage: model training, inference, and generation. Third, it reviews the fundamental technical metrics for evaluating large models, along with relevant datasets and benchmarks. Finally, it highlights the potential of feedback-based alignment methods to improve the performance of large language models, as well as the significant challenges and key open issues these methods currently face.

Key words

large language models (LLMs) / AI alignment / content security / evaluation benchmarks

Classification

Information technology and safety science

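Cite this article

Zhang Yuying, Yun Jing, Liu Xueying, Shi Xiaoguo. Survey of Feedback-Based Content and Behavior Alignment Methods for Large Language Model[J]. Computer Engineering and Applications, 2025, 61(20): 75-104, 30.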

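Funding

National Natural Science Foundation of China (62062055)

Inner Mongolia Young Science and Technology Talents Program for Higher Education Institutions (NJYT24061)

Basic Scientific Research Funds for Universities Directly Under the Inner Mongolia Autonomous Region (JY20230092)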

Computer Engineering and Applications (open access; Peking University Core Journal; ISSN 1002-8331)
