Computer Engineering & Science (计算机工程与科学), 2017, Vol. 39, Issue 5: 841-848, 8. DOI: 10.3969/j.issn.1007-130X.2017.05.004
An optimized Bagging decision tree algorithm based on MapReduce (基于MapReduce的Bagging决策树优化算法)
Abstract
To address the overfitting and poor scalability of the C4.5 decision tree algorithm, we propose an optimized C4.5 algorithm that uses the Bagging technique, and then parallelize it according to the MapReduce model. The optimized algorithm draws multiple new training sets, each the same size as the initial training set, by sampling with replacement. Training the algorithm on these new training sets yields multiple classifiers, and a final classifier is generated by a majority voting rule that integrates their results. The optimized algorithm is then parallelized in three respects: parallel processing of the training sets, parallel selection of the optimal splitting attribute and split point, and parallel generation of child nodes. A parallel algorithm based on a job workflow is implemented to improve big data analysis capability. Experimental results show that the parallel, optimized decision tree algorithm achieves higher accuracy, higher sensitivity, better scalability and higher performance.
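The Bagging steps named in the abstract (sampling with replacement, training one classifier per sample, majority voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The base learner is a simple one-feature decision stump standing in for C4.5, everything runs in a single process rather than as MapReduce jobs, and the function names (`bootstrap_sample`, `train_stump`, `bagging_fit`, `bagging_predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) records with replacement (one Bagging training set)."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Stand-in base learner (NOT C4.5): a one-feature threshold classifier.

    Tries every (feature, threshold) pair seen in the sample and keeps the
    first one with the fewest training errors.
    """
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    best = None  # (errors, feature index, threshold, left label, right label)
    n_features = len(sample[0][0])
    for f in range(n_features):
        for threshold in sorted({x[f] for x, _ in sample}):
            left = [y for x, y in sample if x[f] <= threshold]
            right = [y for x, y in sample if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else majority
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label


def bagging_fit(data, n_classifiers=11, seed=0):
    """Train one classifier per bootstrap sample.

    Each sample is trained independently of the others, which is what makes
    this step a natural candidate for parallel (map-style) execution.
    """
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_classifiers)]


def bagging_predict(classifiers, x):
    """Combine the individual predictions by majority vote (reduce-style)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy data (assumed for illustration): one numeric feature, two classes.
data = [((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
        ((7.0,), "b"), ((8.0,), "b"), ((9.0,), "b")]
model = bagging_fit(data)
print(bagging_predict(model, (0.5,)), bagging_predict(model, (9.5,)))
```

An odd number of classifiers is used here so that a two-class vote cannot tie. How the paper itself breaks ties is not stated in the abstract.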
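Key words
decision tree/Bagging/MapReduce model/big data analysis/accuracy
Classification
Information technology and security science
Cite this article
Zhang Yuanming (张元鸣), Chen Miao (陈苗), Lu Jiawei (陆佳炜), Xu Jun (徐俊), Xiao Gang (肖刚). An optimized Bagging decision tree algorithm based on MapReduce [J]. Computer Engineering & Science, 2017, 39(5): 841-848, 8.
Funding
Zhejiang Province Major Science and Technology Special Project (2014C01408)
Zhejiang Province Public Welfare Technology Project (2017C31014)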