Computer Engineering and Applications, 2016, Vol. 52, Issue (1): 133-140, 8. DOI: 10.3778/j.issn.1002-8331.1401-0232
Naive Bayes based on data filling and continuous attribute
Abstract
When dealing with classification problems, Naive Bayes (NB) usually assumes that numerical continuous attributes follow a normal distribution, and its classification accuracy also depends on the completeness of the training data. In practice, however, sampled data rarely meet these requirements. For missing data, the Naive Bayes classifier uses the existing incomplete data to perform parameter learning based on the Expectation-Maximization (EM) algorithm. For non-normal numerical continuous attributes, distribution densities based on kernel density estimation, together with a new method, are used to calculate the maximum posterior probability. Classification experiments on standard data sets verify the effectiveness of these improvements. Finally, the improved algorithm (EM-DNB) is applied to predicting protein purification technologies in bioengineering, and the experimental results show improved accuracy.
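The abstract does not specify the kernel, the bandwidth, or the "new method" used for the posterior, so the following is only a minimal sketch of the general idea of replacing the normal-distribution assumption with kernel density estimates inside naive Bayes. It assumes complete data (no EM step), a Gaussian kernel, and Silverman's rule-of-thumb bandwidth h = 1.06 * sigma * n^(-1/5); the names `KDENaiveBayes`, `silverman_bandwidth`, and `kde_log_density` are hypothetical and are not the authors' EM-DNB.

```python
import math


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth h = 1.06 * sigma * n^(-1/5) (an assumed choice)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / max(n - 1, 1)
    sigma = math.sqrt(var)
    if sigma == 0.0:
        sigma = 1e-3  # hypothetical floor for constant or single-sample attributes
    return 1.06 * sigma * n ** (-0.2)


def kde_log_density(x, samples, h):
    """log of f(x) = 1/(n h) * sum_i K((x - x_i) / h) with a Gaussian kernel K."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
    return math.log(max(norm * total, 1e-300))  # floor avoids log(0) on underflow


class KDENaiveBayes:
    """Naive Bayes with one kernel density estimate per (class, attribute) pair."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n = len(y)
        self.log_prior = {}
        self.samples = {}
        self.bandwidth = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / n)
            # Transpose the class's rows into one value list per attribute.
            self.samples[c] = [list(col) for col in zip(*rows)]
            self.bandwidth[c] = [silverman_bandwidth(col) for col in self.samples[c]]
        return self

    def predict(self, x):
        """Return the class maximizing log P(c) + sum_j log f_c,j(x_j)."""
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for j, xj in enumerate(x):
                score += kde_log_density(xj, self.samples[c][j], self.bandwidth[c][j])
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    X = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [5.0, 7.0], [5.3, 6.8], [4.9, 7.2]]
    y = ["a", "a", "a", "b", "b", "b"]
    model = KDENaiveBayes().fit(X, y)
    print(model.predict([1.1, 2.0]), model.predict([5.1, 7.0]))
```

Working in log space keeps the product of per-attribute densities from underflowing when there are many attributes; the conditional-independence assumption of naive Bayes is unchanged, only the shape of each class-conditional density is relaxed.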
Key words
Naive Bayes (NB) / Expectation-Maximization (EM) algorithm / continuous attributes / kernel density estimation / protein purification
Classification
Information Technology and Security Science
Cite this article
Li Zhongbo, Yang Jianhua, Liu Wenqi. Naive Bayes based on data filling and continuous attribute[J]. Computer Engineering and Applications, 2016, 52(1): 133-140, 8.
Funding
National Science and Technology Major Project (No. 2009ZX09306-004, No. 2011ZX09101-008-09).