Journal on Communications (通信学报), 2018, Vol. 39, Issue 12: 151-159 (9 pages). DOI: 10.11959/j.issn.1000-436x.2018281
Study on Chinese spam filtering system based on Bayes algorithm (基于贝叶斯算法的中文垃圾邮件过滤系统研究)
Abstract
To address the high feature dimensionality in Chinese spam filtering systems, a TF-IDF feature extraction algorithm based on central word extension was proposed. The algorithm improves the expressive capacity of the nodes in the network and reduces the feature dimension. Further, a three-layer structure model based on the GWO_GA structure learning algorithm was proposed to broaden the limited range of text features and improve their diversity. The new structure learning algorithm relaxes the conditional independence assumption on feature attributes. A fine classification layer was added between the class layer and the feature layer to increase feature coverage. Experiments demonstrate that the three-layer Bayesian network algorithm, with TF-IDF feature extraction based on central word extension and GWO_GA structure learning, improves the performance of Chinese spam filtering.
Keywords
Bayesian network / TF-IDF / genetic algorithm / short text classification / Chinese spam filtering
Classification
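Information Technology and Security Science
Cite this article
LIU Haoran, DING Pan, GUO Changjiang, CHANG Jinfeng, CUI Jingchuang (刘浩然, 丁攀, 郭长江, 常金凤, 崔静闯). Study on Chinese spam filtering system based on Bayes algorithm [J]. Journal on Communications, 2018, 39(12): 151-159 (9 pages).
Funding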
Project supported by the National Natural Science Foundation of China (No. 51641609)
Project supported by the Natural Science Foundation of Hebei Province (No. F2016203354)