Research on website sensitive information identification and variant restoration technology based on BERT model

符泽凡 (Fu Zefan), 姚竟发 (Yao Jingfa), 滕桂法 (Teng Guifa)

现代电子技术 (Modern Electronics Technique), 2024, Vol. 47, Issue 23: 105-112, 8. DOI: 10.16652/j.issn.1004-373x.2024.23.016

Research on website sensitive information identification and variant restoration technology based on BERT model

符泽凡 (Fu Zefan) 1, 姚竟发 (Yao Jingfa) 2, 滕桂法 (Teng Guifa) 3

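Author information

  • 1. College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, Hebei
  • 2. Department of Software Engineering, Hebei Software Institute, Baoding 071000, Hebei; Hebei Provincial University Research and Development Center for Intelligent Interconnected Equipment and Multimodal Big Data Application Technology, Baoding 071000, Hebei
  • 3. College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, Hebei; Hebei Digital Agriculture Industry Technology Research Institute, Shijiazhuang 050021, Hebei; Hebei Key Laboratory of Agricultural Big Data, Baoding 071001, Hebei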

Abstract

As networks develop rapidly and the cost of setting up a website falls, variant words are frequently used in the text of many kinds of websites so that sensitive information evades detection by sensitive-word databases. This study therefore proposes a method for identifying sensitive information on websites that combines a BERT (Bidirectional Encoder Representations from Transformers) model with a variant-word restoration algorithm. The method first restores the variant words in the text, then vectorizes the text with the BERT model and feeds the vectors into a model composed of a BiLSTM (bidirectional long short-term memory) layer and a CNN (convolutional neural network) layer for training, so that sensitive information and its variant words on websites can be identified. Experimental results show that variant-word restoration achieves high accuracy and that the text vectors produced by the BERT model perform well in text classification. Compared with the other models tested, the BERT-BiLSTM-CNN model achieves a higher accuracy rate, recall rate, and F1 score in identifying sensitive information on websites, which indicates a significant improvement. The proposed model provides a reference for variant-word restoration and for sensitive-information identification, and has practical application value.
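
The restoration step that precedes vectorization can be illustrated with a minimal Python sketch. The abstract does not describe the authors' restoration algorithm or lexicon, so everything here is an assumption made for illustration: the `VARIANT_TABLE` entries, the separator set, and the function name `restore_variants` are hypothetical, and a simple lookup table stands in for whatever technique the paper actually uses.

```python
import re

# Hypothetical variant table (illustrative only; not the paper's lexicon).
# Each key is a disguised spelling, each value the word it stands for.
VARIANT_TABLE = {
    "v信": "微信",   # Latin letter substituted for a character
    "威信": "微信",  # homophone substitution
    "薇信": "微信",  # visually or phonetically similar character
}

_CJK = r"\u4e00-\u9fff"
# Assumed separator set: whitespace and a few symbols inserted between
# two Chinese characters to split a keyword.
_SEP_BETWEEN_CJK = re.compile(rf"(?<=[{_CJK}])[\s*\-_.|/]+(?=[{_CJK}])")


def restore_variants(text, table=VARIANT_TABLE):
    """Return text with separators removed and known variants replaced."""
    # Step 1: drop separators that sit between two Chinese characters.
    cleaned = _SEP_BETWEEN_CJK.sub("", text)
    # Step 2: replace longer variants first so that a short key cannot
    # break up a longer one.
    for variant in sorted(table, key=len, reverse=True):
        cleaned = cleaned.replace(variant, table[variant])
    return cleaned
```

Under these assumptions, `restore_variants("加我威*信")` first removes the inserted `*` and then maps the homophone back to the base word. In the pipeline the abstract describes, the restored text would then be vectorized by BERT and classified by the BiLSTM and CNN layers; that part is not sketched here.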

Key words

website/sensitive information/variant word/BERT/BiLSTM/CNN

Classification

Information technology and security science

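Cite this article

Fu Zefan, Yao Jingfa, Teng Guifa. Research on website sensitive information identification and variant restoration technology based on BERT model [J]. 现代电子技术 (Modern Electronics Technique), 2024, 47(23): 105-112, 8.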

现代电子技术 (Modern Electronics Technique)

OA / 北大核心 (Peking University Core Journal) / CSTPCD

ISSN 1004-373X
