
Multi-label Text Classification Based on Relationship Mining and Adversarial Training


Traditional multi-label text classification methods ignore label semantics and do not fully exploit the relationships between text and labels or among the labels themselves. To address these problems, this paper proposes a multi-label text classification model based on relationship mining and adversarial training. The model uses BERT to extract semantic information from the text and a Graph Attention Network (GAT) to mine the relationships between labels. First, the text is encoded with BERT to obtain its semantic representation. Then, GAT is applied to mine the relationships between labels so that label dependencies are better captured. To further mine the relationship between the text and the learnable label embeddings, the model employs a multi-head self-attention mechanism. In addition, to improve robustness, the model is trained with the R-drop strategy. Experimental results on the AAPD and RCV1 datasets show that, compared with some current mainstream multi-label text classification models, the proposed model not only attends to textual information but also effectively captures the dependencies between text and labels and the relationships among labels, thereby achieving better performance.
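The abstract names R-drop as the training strategy but does not give the loss. As an illustration only, the following minimal sketch shows one common way R-drop can be adapted to a multi-label setting: the same input is passed through the model twice with dropout active, each pass is scored with binary cross-entropy, and a symmetric KL term penalizes disagreement between the two per-label Bernoulli predictions. The function names, the weight `alpha`, and the per-label Bernoulli formulation are assumptions made for this example, not details taken from the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over all labels."""
    total = 0.0
    for p, t in zip(probs, targets):
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(probs)


def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bernoulli(p) || Bernoulli(q)) for a single label."""
    return (p * math.log((p + eps) / (q + eps))
            + (1 - p) * math.log((1 - p + eps) / (1 - q + eps)))


def r_drop_loss(logits_a, logits_b, targets, alpha=1.0):
    """R-drop style loss for one example (hypothetical multi-label variant).

    logits_a, logits_b: per-label logits from two forward passes of the
    same input with different dropout masks.
    targets: 0/1 indicator per label.
    alpha: assumed weight of the consistency term.
    """
    probs_a = [sigmoid(z) for z in logits_a]
    probs_b = [sigmoid(z) for z in logits_b]
    # Task loss: sum of the classification losses of both passes.
    task = bce(probs_a, targets) + bce(probs_b, targets)
    # Consistency loss: symmetric KL between the two passes, averaged over labels.
    sym_kl = sum(
        0.5 * (bernoulli_kl(p, q) + bernoulli_kl(q, p))
        for p, q in zip(probs_a, probs_b)
    ) / len(targets)
    return task + alpha * sym_kl


if __name__ == "__main__":
    pass_1 = [2.0, -1.0, 0.5]   # logits from the first dropout pass
    pass_2 = [1.5, -0.5, 0.0]   # logits from the second dropout pass
    labels = [1, 0, 1]
    print(r_drop_loss(pass_1, pass_2, labels, alpha=1.0))
```

When the two passes agree exactly, the consistency term vanishes and the loss reduces to the plain classification loss; the more the dropout-perturbed predictions diverge, the larger the penalty, which is the regularizing effect R-drop relies on.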

YANG Dongju (杨冬菊); CHENG Weifei (程伟飞)

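School of Information, North China University of Technology, Beijing 100144 || Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data (North China University of Technology), Beijing 100144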

Computer and Automation

BERT; attention mechanism; R-drop; graph attention network; normalization

Computer & Digital Engineering (《计算机与数字工程》), 2024 (001)

18-22,42 / 6

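Supported by the Key Program of the National Natural Science Foundation of China (No. 61832004) and the Guangzhou Science and Technology Program, Key R&D Program (No. 202206030009).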

10.3969/j.issn.1672-9722.2024.01.003
