Computer Technology and Development (计算机技术与发展), Issue (3): 23-26, 4. DOI: 10.3969/j.issn.1673-629X.2013.03.006
Analysis and Implementation of MapReduce Parallelization of Naïve Bayes Algorithm
Abstract
Naïve Bayes is an efficient classification algorithm. In mass data processing, however, its efficiency is severely limited by memory and I/O resources. This paper proposes a Naïve Bayes algorithm based on the MapReduce programming model. The training set is split before being processed, and the core processing is carried out by the MapReduce model: the Map function extracts and parses the training set, and the Reduce function builds the knowledge base of class and feature attributes. The experiments compare the performance of the traditional algorithm with that of the improved parallel algorithm. The results show that the parallel Naïve Bayes algorithm achieves good efficiency and high scalability in mass data processing.
Key words
Naïve Bayes algorithm / parallel computing / MapReduce
Classification
Information Technology and Security Science
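Cite this article
Zhang Yiyang (张依杨), Xiang Yang (向阳), Jiang Ruiquan (蒋锐权), Zhang Bo (张波), Zhang Junying (张君瑛). Analysis and Implementation of MapReduce Parallelization of Naïve Bayes Algorithm [J]. Computer Technology and Development, 2013, (3): 23-26, 4.
Funding
National Natural Science Foundation of China (61103069, 71170148)
National Science and Technology Program project (2012BAD35B01)
Shanghai Science and Technology Innovation Program (11DZ 1501703)
Shanghai Special Fund for Informatization Development (20091015)
Shanghai Science and Technology Innovation Program (Chenjia Town) (11DZ1210600)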
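Illustrative sketch
The division of labour described in the abstract (parsing training records in the Map function, building the knowledge base of class and feature counts in the Reduce function) can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the record format `label,f1,f2,...`, the function names `map_record`, `reduce_counts`, `train` and `classify`, the sample data, and the add-one smoothing in the classifier are all hypothetical, and no Hadoop cluster is involved.

```python
from collections import defaultdict
from math import log

def map_record(record):
    """Map: parse one line 'label,f1,f2,...' (assumed format) and emit count keys."""
    label, *features = record.strip().split(",")
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feature", label, index, value), 1

def reduce_counts(pairs):
    """Reduce: sum the values emitted for each key into a knowledge base."""
    knowledge = defaultdict(int)
    for key, count in pairs:
        knowledge[key] += count
    return dict(knowledge)

def train(splits):
    """Apply the map step to every record of every split, then reduce once."""
    emitted = (pair for split in splits for record in split
               for pair in map_record(record))
    return reduce_counts(emitted)

def classify(knowledge, features):
    """Return the class with the highest log-posterior (add-one smoothing, an assumption)."""
    class_counts = {k[1]: v for k, v in knowledge.items() if k[0] == "class"}
    total = sum(class_counts.values())
    # Distinct values observed per feature index, used as the smoothing denominator.
    values = defaultdict(set)
    for key in knowledge:
        if key[0] == "feature":
            values[key[2]].add(key[3])
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = log(n / total)
        for index, value in enumerate(features):
            hits = knowledge.get(("feature", label, index, value), 0)
            score += log((hits + 1) / (n + len(values[index])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set, already cut into two splits.
splits = [
    ["spam,free,yes", "ham,meeting,no"],
    ["spam,free,no", "ham,meeting,yes", "ham,free,no"],
]
knowledge = train(splits)
```

Because the Reduce step only sums counts, each split can be mapped independently and the partial results merged in any order, which is the property that lets this kind of training run in parallel.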