Journal of Zhengzhou University (Natural Science Edition), 2024, Vol. 56, Issue 2: 43-50, 8. DOI: 10.13705/j.issn.1671-6841.2022168
News Topic Text Classification Based on RoBERTa-RCNN and Attention Pooling
Abstract
To address the semantic ambiguity and poorly standardized wording caused by the lack of context information in Chinese news topics, a news topic text classification method based on RoBERTa-RCNN and a multi-head attention pooling mechanism was proposed. A data augmentation technique was used to back-translate part of the training data. The autoencoding pre-trained model and RCNN were used to extract preliminary and deep features of the text, and the idea of multi-head attention was incorporated to improve the max-pooling layer. This fusion mechanism remedied the weakness that the max-pooling strategy in RCNN was a single fixed strategy and could not be dynamically optimized. Experiments were conducted on three news topic datasets, with the Mish function, which was more suitable for news topic classification, used in place of the ReLU function. Label smoothing was used to address the overfitting problem. The results showed that the proposed method was more effective than traditional classification methods, and ablation experiments verified the feasibility of the model for the classification task.
Keywords
pre-trained language model/text classification/recurrent convolutional neural network/attention mechanism/label smoothing/data augmentation
Classification
Information Technology and Security Science
Cite this article
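王乾, 曾诚, 何鹏, 张海丰, 余新言. News Topic Text Classification Based on RoBERTa-RCNN and Attention Pooling [J]. Journal of Zhengzhou University (Natural Science Edition), 2024, 56(2): 43-50, 8.
Funding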
National Natural Science Foundation of China (61977021, 61902114)
Key Research and Development Program of Hubei Province (2021BAA184)