软件导刊, 2025, Vol. 24, Issue (5): 79-86, 8. DOI: 10.11907/rjdk.241141
Research on Text Classification Model Based on Improved BERT Ensemble Structure
刘振宗 1, 王超群 2, 陈乐 2, 陶永辉 3, 王丹 3
Author Information
- 1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201300
- 2. Aerospace Smart Energy Research Institute
- 3. Shanghai Aerospace Energy Co., Ltd., Shanghai 201100
Abstract
To improve the extraction of contextual text features, capture semantic relationships between texts, and effectively fuse global and local information, the BERT-Transformer-TextCNN parallel text classification model is proposed. The model preprocesses the input text with BERT to obtain text feature vectors. A Transformer encoding layer extracts the global information of these feature vectors, with L2 regularization, residual connections, and cosine similarity introduced in the encoding layer to overcome overfitting, vanishing gradients, and the influence of vector length. TextCNN extracts the local information of the feature vectors, with residual connections, He initialization, and average pooling layers introduced to address vanishing gradients and insufficient information utilization. Finally, the global and local information are combined and classified by a Softmax classifier to obtain the final result. Experimental results show that, compared with traditional models on the THUCNews dataset, the improved model's accuracy increases by 12% and its F1 score by 8%; on the IMDB dataset, accuracy and F1 increase by 13% and 8%, respectively, demonstrating the model's effectiveness in extracting global and local information and fusing semantic relationships.
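The abstract names the components of the parallel structure but not how they are wired together. The following is a minimal sketch of that structure, assuming PyTorch and Hugging Face transformers; the class name BertTransformerTextCNN, the kernel sizes, the number of encoder layers, and the exact placement of the cosine-similarity weighting and extra residual connections are illustrative assumptions, not the authors' released code.

```python
# Sketch of the BERT-Transformer-TextCNN parallel structure described in the abstract.
# Hyperparameters and the wiring of cosine similarity / residual connections are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel


class BertTransformerTextCNN(nn.Module):
    def __init__(self, num_classes, bert_name="bert-base-chinese",
                 num_encoder_layers=2, kernel_sizes=(2, 3, 4), num_filters=128):
        super().__init__()
        # 1) BERT preprocessing: token ids -> contextual text feature vectors.
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size

        # 2) Global branch: Transformer encoder over the BERT features.
        #    (The L2 regularization mentioned in the abstract is applied through
        #     the optimizer's weight_decay; see the note below.)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8,
                                           dim_feedforward=2 * hidden,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_encoder_layers)

        # 3) Local branch: TextCNN with several kernel widths, He (Kaiming)
        #    initialization, and average pooling instead of max pooling.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k, padding=k // 2) for k in kernel_sizes])
        for conv in self.convs:
            nn.init.kaiming_normal_(conv.weight, nonlinearity="relu")  # He init
            nn.init.zeros_(conv.bias)
        self.local_proj = nn.Linear(hidden, num_filters * len(kernel_sizes))

        # 4) Fusion of global and local information + classifier.
        self.classifier = nn.Linear(hidden + num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        feats = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state

        # ----- global branch -----
        pad_mask = attention_mask == 0
        enc = self.encoder(feats, src_key_padding_mask=pad_mask)
        enc = enc + feats                                  # extra residual connection
        cls = enc[:, 0]                                    # [CLS]-position vector
        # Cosine-similarity weighting (assumed placement): tokens aligned with the
        # sentence vector contribute more, independent of vector length.
        sim = F.cosine_similarity(enc, cls.unsqueeze(1).expand_as(enc), dim=-1)
        weights = sim.masked_fill(pad_mask, -1e4).softmax(dim=-1)
        global_vec = torch.bmm(weights.unsqueeze(1), enc).squeeze(1)

        # ----- local branch -----
        x = feats.transpose(1, 2)                          # (B, hidden, seq_len)
        pooled = [F.adaptive_avg_pool1d(F.relu(conv(x)), 1).squeeze(-1)
                  for conv in self.convs]                  # average pooling per filter
        local_vec = torch.cat(pooled, dim=-1)
        local_vec = local_vec + self.local_proj(feats.mean(dim=1))  # residual-style skip

        # ----- fusion + classification -----
        return self.classifier(torch.cat([global_vec, local_vec], dim=-1))
```

In training, the L2 penalty can be realized via the optimizer's weight_decay (e.g. torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)), and nn.CrossEntropyLoss applies the Softmax step to the returned logits.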
Keywords
text classification / feature extraction / Transformer / TextCNN
Classification
Information Technology and Security Science
Citation
刘振宗, 王超群, 陈乐, 陶永辉, 王丹. Research on Text Classification Model Based on Improved BERT Ensemble Structure [J]. 软件导刊, 2025, 24(5): 79-86, 8.