高师理科学刊, 2025, Vol. 45, Issue 8: 18-24, 7. DOI: 10.3969/j.issn.1007-9831.2025.08.004
Multi-label text classification based on BERT model
Abstract
To address the loss of accuracy in multi-label text classification caused by imbalanced original data, a method based on a cosine-similarity tree structure, the BCL-tree model, is proposed. While preserving the original data distribution, it introduces inter-label correlation to improve classification accuracy. First, based on the sample sizes of the labels, the label with the larger sample size is selected as the root node of the tree. Second, cosine similarity is used to calculate the correlation between each of the other small-sample labels and the root node, and the labels are stored in the tree structure according to their degree of correlation. Finally, the label tree embedding is integrated into the BCL-tree model to help the classifier better capture the relationships between labels: during forward propagation, the label tree embedding is combined with the BCL output, which improves the model's classification performance. The BCL model is compared with four baseline models on two distinct datasets. The experimental results indicate that the F1 scores of the BCL-tree model on the two datasets are 4.5% and 8.8% higher, respectively, than those of CNN, the best-performing baseline method, validating the advantage of this method in classification accuracy and label correlation modeling.
Keywords
multi-label text classification / imbalanced sample distribution / BERT model / cosine similarity
Classification
Information Technology and Security Science
Cite this article
Cui Haiyue (崔海越), Li Yanling (李延玲). Multi-label text classification based on BERT model [J]. 高师理科学刊, 2025, 45(8): 18-24, 7.
Funding
Graduate Innovation Project of Qinghai Minzu University (07M2024013)
Natural Science Foundation of Qinghai Province (2023-ZJ-949Q)
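
The first two steps described in the abstract (choosing the root label by sample size, then ranking the remaining labels by cosine similarity to the root) can be illustrated with a minimal Python sketch. The abstract does not specify how label vectors are obtained or the exact shape of the tree, so everything here is an assumption for illustration: the function names `cosine_similarity` and `build_label_tree`, the toy label vectors, and the choice to store the other labels as children of the root ordered by similarity. It is not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_label_tree(sample_counts, label_vectors):
    """Hypothetical helper: root = most frequent label, children ranked by similarity.

    sample_counts: dict mapping label -> number of training samples
    label_vectors: dict mapping label -> vector representation (assumed given)
    """
    # Step 1: the label with the largest sample size becomes the root.
    root = max(sample_counts, key=sample_counts.get)

    # Step 2: score every other label by cosine similarity to the root.
    children = [
        (label, cosine_similarity(label_vectors[label], label_vectors[root]))
        for label in sample_counts
        if label != root
    ]

    # Assumption: "stored according to the degree of correlation" is modeled
    # here as children sorted from most to least similar.
    children.sort(key=lambda item: item[1], reverse=True)
    return {"root": root, "children": children}


# Toy data, made up purely for illustration.
sample_counts = {"sports": 900, "football": 120, "finance": 60}
label_vectors = {
    "sports": [1.0, 0.0],
    "football": [1.0, 1.0],
    "finance": [0.0, 1.0],
}

tree = build_label_tree(sample_counts, label_vectors)
print(tree["root"])
print([label for label, _ in tree["children"]])
```

In this toy setup the frequent label becomes the root and the rarer labels are attached in order of similarity to it. How such a tree is turned into an embedding and combined with the BCL output during forward propagation is not described in the abstract, so that step is left out of the sketch.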