Survey of NLP Data Augmentation Methods Based on Large Language Models

Xu Delong, Lin Min, Wang Yurong, Zhang Shujun

Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2025, Vol. 19, Issue 6: 1395-1413 (19 pages). DOI: 10.3778/j.issn.1673-9418.2410054

Xu Delong (1), Lin Min (1), Wang Yurong (2), Zhang Shujun (3)

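Author information

  • 1. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022
  • 2. College of Mathematical Sciences, Inner Mongolia Normal University, Hohhot 010022
  • 3. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022; College of Liberal Arts, Inner Mongolia Normal University, Hohhot 010022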

Abstract

Large language models currently show great potential in natural language processing (NLP), but their training relies on large numbers of high-quality samples. In low-resource scenarios, the available data can hardly support convergence of model training as model size keeps increasing, a problem that has prompted researchers to investigate data augmentation methods. However, in the context of large models in NLP, traditional data augmentation methods have a limited scope of application and suffer from data distortion. In contrast, data augmentation methods based on large language models can address this challenge more effectively. This paper comprehensively surveys data augmentation methods based on large language models in the current NLP field. First, the development history of traditional data augmentation methods and of large language models in NLP is reviewed. Then, the various large-language-model-based data augmentation methods currently used in NLP are summarized, and the scope of application, advantages, and limitations of each method are discussed in depth. Subsequently, evaluation methods for data augmentation in NLP are introduced. Finally, future research directions for large-language-model-based data augmentation in NLP are discussed through comparative experiments and analysis of the results of current methods, and forward-looking suggestions are made.

Key words

data augmentation/large language models/natural language processing/deep learning/artificial intelligence

Classification

Computer Science and Automation

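Cite this article

Xu Delong, Lin Min, Wang Yurong, Zhang Shujun. Survey of NLP Data Augmentation Methods Based on Large Language Models[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(6): 1395-1413, 19.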

Funding

This work was supported by the National Natural Science Foundation of China (62266033), the Natural Science Foundation of Inner Mongolia (2021LHMS06010), the Project of Key Laboratory of Infinite-Dimensional Hamiltonian System and Its Algorithm Application (Inner Mongolia Normal University), Ministry of Education (2023KFZD03), and the Innovation Fund for Postgraduates of Inner Mongolia Normal University (CXJJB23011).

Journal of Frontiers of Computer Science and Technology

Open access; Peking University Core Journal

ISSN 1673-9418
