
Study on Chinese spam filtering system based on Bayes algorithm

Liu Haoran, Ding Pan, Guo Changjiang, Chang Jinfeng, Cui Jingchuang

Journal on Communications, 2018, Vol. 39, Issue 12: 151-159,9. DOI: 10.11959/j.issn.1000-436x.2018281


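Liu Haoran 1, Ding Pan 2, Guo Changjiang 1, Chang Jinfeng 2, Cui Jingchuang 3

Author information

  • 1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004
  • 2. Key Laboratory for Special Fiber and Fiber Sensor of Hebei Province, Qinhuangdao, Hebei 066004
  • 3. Liren College, Yanshan University, Qinhuangdao, Hebei 066004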

Abstract

To address the high feature dimensionality in Chinese spam filtering systems, a TF-IDF feature extraction algorithm based on central word extension was proposed. The algorithm improves the expressive capacity of the nodes in the network and reduces the feature dimension. Further, a three-layer structure model based on the GWO_GA structure learning algorithm was proposed to extend the limits of text features and improve their diversity. The new structure learning algorithm relaxes the conditional independence assumption on feature attributes. A fine classification layer was added between the class layer and the feature layer to increase feature coverage. Experiments demonstrate that the three-layer Bayesian network algorithm, which combines TF-IDF feature extraction based on central word extension with GWO_GA structure learning, improves the performance of Chinese spam filtering.
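
For orientation, the baseline that the proposed feature extraction starts from can be sketched as below. This is plain TF-IDF followed by top-k term selection to cut the feature dimension; it is not the central word extension variant described in the abstract, whose details are not given here. The function names, the toy English tokens (real Chinese mail would first need a word segmenter), and the summed-weight selection rule are all illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weight = (term count / doc length)
    * ln(number of docs / number of docs containing the term).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_k_features(weights, k):
    """Keep the k terms with the highest TF-IDF weight summed over all documents.

    Assumption: summing per-document weights is one simple ranking rule,
    chosen here only for illustration.
    """
    score = Counter()
    for doc_weights in weights:
        score.update(doc_weights)  # adds the float weights term by term
    # Sort by descending score, then by term, so ties resolve deterministically.
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical, already-tokenized messages.
docs = [
    ["free", "prize", "click", "free"],
    ["meeting", "notice", "click"],
    ["meeting", "minutes", "notice"],
]
weights = tf_idf(docs)
selected = top_k_features(weights, 3)
print(selected)
```

Terms that appear in every document receive a weight of zero under this formula, so they drop out of the ranking naturally; the selected terms would then serve as feature nodes for whatever classifier follows.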

Key words

Bayesian network / TF-IDF / genetic algorithm / short text classification / Chinese spam filtering

Classification

Information Technology and Security Science

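Cite this article

Liu Haoran, Ding Pan, Guo Changjiang, Chang Jinfeng, Cui Jingchuang. Study on Chinese spam filtering system based on Bayes algorithm[J]. Journal on Communications, 2018, 39(12): 151-159,9.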

Funding

National Natural Science Foundation of China (No.51641609)

Natural Science Foundation of Hebei Province (No.F2016203354)

Journal on Communications

OA / Peking University Core / CSCD / CSTPCD

ISSN 1000-436X
