现代电子技术 (Modern Electronics Technique), 2024, Vol. 47, Issue 23: 105-112, 8. DOI: 10.16652/j.issn.1004-373x.2024.23.016
基于BERT模型的网站敏感信息识别及其变体还原技术研究
Research on website sensitive information identification and variant restoration technology based on BERT model
符泽凡 (FU Zefan) 1, 姚竟发 (YAO Jingfa) 2, 滕桂法 (TENG Guifa) 3
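Author information
- 1. College of Information Science and Technology, Hebei Agricultural University, Baoding, Hebei 071001
- 2. Department of Software Engineering, Hebei Software Institute, Baoding, Hebei 071000; Hebei College Intelligent Interconnection Equipment and Multi-modal Big Data Application Technology Research and Development Center, Baoding, Hebei 071000
- 3. College of Information Science and Technology, Hebei Agricultural University, Baoding, Hebei 071001; Hebei Digital Agriculture Industry Technology Research Institute, Shijiazhuang, Hebei 050021; Hebei Key Laboratory of Agricultural Big Data, Baoding, Hebei 071001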
Abstract
With the rapid development of the Internet and the falling cost of setting up websites, texts on many types of websites frequently use variant words to evade sensitive-word databases and so avoid detection of sensitive information. This study therefore proposes a method for identifying sensitive information on websites that combines a BERT (bidirectional encoder representations from transformers) model with a variant word restoration algorithm. In this method, the variant words in the text are first restored; the text content is then vectorized by the BERT model and fed into a model composed of a BiLSTM (bidirectional long short-term memory) layer and a CNN (convolutional neural network) layer for training, so as to identify sensitive information and its variant words on websites. Experimental results show high accuracy in variant word restoration, and the text vectors obtained from the BERT model perform well in text classification tasks. Compared with the other models, the BERT-BiLSTM-CNN model achieves higher accuracy, recall, and F1 score in identifying sensitive information on websites, a significant improvement. The proposed model provides a reference and support for variant word restoration and sensitive information identification, and has practical application value.
Keywords
website / sensitive information / variant word / BERT / BiLSTM / CNN
Classification
Information technology and security science
Cite this article
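FU Zefan, YAO Jingfa, TENG Guifa. Research on website sensitive information identification and variant restoration technology based on BERT model [J]. Modern Electronics Technique, 2024, 47(23): 105-112, 8.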
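The abstract describes restoring variant words before the text is vectorized and classified. The sketch below illustrates only that restoration step, in a deliberately simple form. It is not the authors' algorithm, which this page does not specify. The lookup table `VARIANT_TABLE`, the separator set `SEPARATORS`, and the function names `restore_variants` and `contains_sensitive` are all hypothetical and chosen for illustration. The example assumes that variants can be undone by removing inserted separator characters and then applying a dictionary lookup.

```python
import re
from typing import Iterable

# Illustrative variant table (an assumption, not taken from the paper):
# each key is an obfuscated spelling, each value the canonical word.
VARIANT_TABLE = {
    "v信": "微信",      # Latin letter standing in for a similar-sounding character
    "薇信": "微信",     # homophone substitution
    "weixin": "微信",   # pinyin spelling
}

# Characters sometimes inserted to break up a word (assumed set).
# Removing them everywhere is a simplification that also strips ordinary spaces.
SEPARATORS = re.compile(r"[\s*_\-.|]+")


def restore_variants(text: str) -> str:
    """Lowercase the text, drop separators, and replace known variants."""
    cleaned = SEPARATORS.sub("", text.lower())
    # Apply longer keys first so a short key cannot split a longer one.
    for variant in sorted(VARIANT_TABLE, key=len, reverse=True):
        cleaned = cleaned.replace(variant, VARIANT_TABLE[variant])
    return cleaned


def contains_sensitive(text: str, sensitive_words: Iterable[str]) -> bool:
    """Check the restored text against a plain sensitive-word list."""
    restored = restore_variants(text)
    return any(word in restored for word in sensitive_words)
```

In the method summarized in the abstract, the restored text would then be vectorized by BERT and passed to the BiLSTM and CNN layers for classification. That stage is not shown here, and a plain word-list check like `contains_sensitive` stands in for it only to make the sketch self-contained.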