An Optimized Bagging Decision Tree Algorithm Based on MapReduce

张元鸣 陈苗 陆佳炜 徐俊 肖刚

计算机工程与科学 (Computer Engineering & Science), 2017, Vol. 39, Issue 5: 841-848 (8 pages). DOI: 10.3969/j.issn.1007-130X.2017.05.004

An optimized bagging decision tree algorithm based on MapReduce

张元鸣¹ 陈苗¹ 陆佳炜¹ 徐俊¹ 肖刚¹

Author information

  • 1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023, China

Abstract

To address the overfitting and poor scalability of the C4.5 decision tree algorithm, we propose an optimized C4.5 algorithm based on the Bagging technique and parallelize it under the MapReduce model. The optimized algorithm draws multiple new training sets, each the same size as the initial training set, by sampling with replacement. Training on each of these sets yields a separate classifier, and the final classifier combines their outputs by majority voting. The optimized algorithm is then parallelized in three respects: processing the training sets in parallel, selecting the optimal splitting attribute and split point in parallel, and generating child nodes in parallel. The parallel algorithm is implemented as a job workflow to improve its capacity for big data analysis. Experimental results show that the parallel, optimized decision tree algorithm achieves higher accuracy, higher sensitivity, better scalability, and higher performance.
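The Bagging procedure summarized in the abstract (bootstrap sampling, one classifier per sample, majority vote) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the base learner here is a one-level threshold stump standing in for C4.5, and all function names (`bootstrap_sample`, `train_stump`, `map_train`, `reduce_vote`, `bagging_fit`) are hypothetical. The "map" and "reduce" labels only mirror the roles such steps might play in a MapReduce job; no Hadoop API is used.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so the new training set
    has the same size as the initial one."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows):
    """Stand-in base learner (NOT C4.5): a one-level split on the single
    feature/threshold pair that makes the fewest training errors."""
    best = None
    n_features = len(rows[0][0])
    for f in range(n_features):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label


def map_train(seed, rows):
    """'Map'-style task: train one classifier on its own bootstrap sample.
    Tasks are independent of each other, so they could run in parallel."""
    rng = random.Random(seed)
    return train_stump(bootstrap_sample(rows, rng))


def reduce_vote(classifiers, x):
    """'Reduce'-style step: combine the individual predictions for x
    by majority vote."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def bagging_fit(rows, n_estimators=11):
    """Train n_estimators classifiers, one per bootstrap sample."""
    return [map_train(seed, rows) for seed in range(n_estimators)]


if __name__ == "__main__":
    # Toy data: (feature tuple, label) pairs, invented for illustration.
    data = [((0.1,), "a"), ((0.2,), "a"), ((0.3,), "a"),
            ((0.9,), "b"), ((1.0,), "b"), ((1.1,), "b")]
    ensemble = bagging_fit(data)
    print(reduce_vote(ensemble, (0.0,)), reduce_vote(ensemble, (2.0,)))
```

Because each call to `map_train` depends only on its seed and the shared training data, the training step is the natural candidate for distribution across workers, while the vote is a cheap aggregation over their outputs. An odd `n_estimators` avoids ties in two-class problems.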

Key words

decision tree / Bagging / MapReduce model / big data analysis / accuracy

Classification

Information Technology and Security Science

Cite this article

张元鸣, 陈苗, 陆佳炜, 徐俊, 肖刚. An optimized bagging decision tree algorithm based on MapReduce [J]. 计算机工程与科学 (Computer Engineering & Science), 2017, 39(5): 841-848.

Funding

Major Science and Technology Project of Zhejiang Province (2014C01408)

Public Welfare Technology Project of Zhejiang Province (2017C31014)

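计算机工程与科学 (Computer Engineering & Science)

Open access (OA); indexed in 北大核心 (Peking University Core Journals), CSCD, CSTPCD

ISSN 1007-130X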
