A Chinese Spam Detection Method Based on Content and ERNIE3.0-CapsNet
To address the inadequate word vector representation and limited feature extraction richness of current deep-learning-based Chinese spam recognition methods, this paper proposes ERNIE3.0-CapsNet, an improved recognition model that integrates the ERNIE3.0 pre-trained model with a capsule neural network. For the content text of Chinese spam, ERNIE3.0 is used to generate a semantically rich word vector matrix that benefits from the model's strong knowledge memorization and reasoning capabilities; the capsule neural network then performs feature extraction and classification. The structure of the capsule neural network is improved, and GELU is adopted as the activation function of its dynamic routing. Comparative experiments are conducted on five groups of similar models and four groups of activation functions. On the open-source TREC06C Chinese email dataset, the proposed ERNIE3.0-CapsNet model performs well overall, reaching an accuracy of 99.45%. The experimental results show that ERNIE3.0-CapsNet outperforms methods such as ERNIE3.0-TextCNN and ERNIE3.0-RNN, confirming the model's effectiveness for Chinese spam recognition.
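The routing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say where in dynamic routing GELU is applied, so the code below assumes it replaces the usual squash nonlinearity on each output capsule's weighted sum. The function names, the three routing iterations, and the toy input are all hypothetical, and the ERNIE3.0 encoder that would produce the prediction vectors is omitted.

```python
import math


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iters=3, activation=gelu):
    """Routing-by-agreement over prediction vectors u_hat[i][j][d].

    i indexes input capsules, j output capsules, d vector components.
    ASSUMPTION: `activation` is applied element-wise in place of squash;
    the paper's actual placement of GELU may differ.
    Returns (v, c): output capsule vectors and coupling coefficients.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax over output capsules per input capsule.
        c = [softmax(row) for row in b]
        # Weighted sum of predictions for each output capsule.
        s = [
            [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            for j in range(n_out)
        ]
        # Nonlinearity (GELU here, by assumption).
        v = [[activation(x) for x in s_j] for s_j in s]
        # Agreement update: dot product between prediction and output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


# Toy input: 2 input capsules, 2 output capsules (e.g. spam / non-spam), dim 2.
u_hat = [
    [[1.0, 0.5], [0.0, 0.1]],
    [[0.9, 0.6], [0.1, 0.0]],
]
v, c = dynamic_routing(u_hat)
```

In this toy input both input capsules predict a much larger vector for the first output capsule, so the agreement update shifts their coupling coefficients toward it over the iterations.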
Shan Chenleng (单晨棱); Zhang Xinyou (张新有); Xing Huanlai (邢焕来); Feng Li (冯力)
Tangshan Institute, Southwest Jiaotong University, Tangshan, Hebei 063000 || School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756
Chinese spam; ERNIE3.0; capsule neural network; activation function; text classification
Journal of Information Security Research (《信息安全研究》), 2024 (003)
233-240 / 8
National Natural Science Foundation of China (62172342)