Computer & Digital Engineering (计算机与数字工程), 2024, Vol. 52, Issue 3: 791-794, 4. DOI: 10.3969/j.issn.1672-9722.2024.03.027
基于在线翻译的中文文本数据增强技术
Chinese Text Data Augmentation Technology Based on Online Translation
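Wang Xiaotian (王小天) 1, Xi Caiping (奚彩萍) 1
Author information
- 1. School of Electronics and Information, Jiangsu University of Science and Technology, Zhenjiang 212000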
Abstract
Data augmentation is a common method in few-shot learning. For text data, a common augmentation approach is back translation: a neural translator translates the data into an intermediate language and then back into the original language. However, limited by the quantity and quality of open parallel corpora, it is difficult for individual researchers to train adequate neural translators. To remove the dependence of back translation on parallel corpora, this paper proposes a text data augmentation technique based on online translation. Taking Baidu Translate as an example, the paper studies the benefits of different intermediate languages and the most suitable augmentation multiple under different data scenarios, and examines the label validity of the augmented data through visualization. Experiments show that Chinese text data augmentation based on online translation achieves consistent improvement across four Chinese classification tasks, with more pronounced gains on small datasets. On average, augmentation increases F1 by more than 5%. The paper also points out what is unreasonable in previous evaluations of data augmentation benefits and proposes an improved evaluation setup.
Keywords
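data augmentation / natural language processing / back translation / text classification
Classification
Computer and Automation
Cite this article
Wang Xiaotian, Xi Caiping. Chinese Text Data Augmentation Technology Based on Online Translation [J]. Computer & Digital Engineering, 2024, 52(3): 791-794, 4.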
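The back-translation procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `translate(text, src, dst)` callable, the language codes, and the `toy_translate` stub are all hypothetical stand-ins for a call to an online translation service. The sketch also assumes each augmented sentence keeps the label of its source sentence, which is the property the paper examines through visualization.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical translator signature: translate(text, src_lang, dst_lang) -> str.
# In practice this would wrap a request to an online translation service.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, translate: Translator,
                   pivot: str, source: str = "zh") -> str:
    """Translate text into a pivot language, then back into the source language."""
    intermediate = translate(text, source, pivot)
    return translate(intermediate, pivot, source)


def augment(samples: Sequence[Tuple[str, str]], translate: Translator,
            pivots: Sequence[str], source: str = "zh") -> List[Tuple[str, str]]:
    """Return the original samples plus back-translated paraphrases.

    Assumption: a paraphrase inherits the label of its source sentence.
    Paraphrases identical to an existing sentence are skipped.
    """
    augmented = list(samples)
    seen = {text for text, _ in augmented}
    for text, label in samples:
        for pivot in pivots:
            candidate = back_translate(text, translate, pivot, source)
            if candidate and candidate not in seen:
                seen.add(candidate)
                augmented.append((candidate, label))
    return augmented


def toy_translate(text: str, src: str, dst: str) -> str:
    """Stub translator for illustration only; unknown input is returned unchanged."""
    table = {
        ("zh", "en"): {"这部电影很好看": "This movie is great"},
        ("en", "zh"): {"This movie is great": "这部电影非常棒"},
    }
    return table.get((src, dst), {}).get(text, text)


samples = [("这部电影很好看", "positive")]
result = augment(samples, toy_translate, pivots=["en"])
```

Using several pivot languages per sentence corresponds to a larger augmentation multiple. Dropping paraphrases that duplicate an existing sentence keeps the augmented set from containing exact repeats.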