Computer Technology and Development (计算机技术与发展), Issue (11): 83-86, 90 (5 pages). DOI: 10.3969/j.issn.1673-629X.2014.11.021
Analysis and Study of C4.5 Algorithm Based on Hadoop Platform
Abstract
How to extract valuable information from vast amounts of data more quickly, efficiently, and cheaply has become a new challenge for data mining technology. Building on a study of the characteristics of the Hadoop platform and of the procedure of the C4.5 decision tree algorithm, this paper introduces cloud computing ideas into the field of decision tree algorithms, parallelizes C4.5 on the Hadoop platform, and uses the MapReduce model to address the problem of mining massive data. The new algorithm is then verified on a golf data set. The experimental results show that, for huge amounts of data, the decision tree algorithm based on the Hadoop platform significantly improves the efficiency of data mining and scales well. To a certain extent, it also alleviates two problems that C4.5 faces under heavy computational load: the cost of processing huge amounts of data and the long time needed to build the decision tree.
Keywords
Hadoop/MapReduce/data mining/C4.5 algorithm
Classification
Information Technology and Security Science
Cite this article
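Sun Yuan (孙媛), Huang Gang (黄刚). Analysis and Study of C4.5 Algorithm Based on Hadoop Platform [J]. Computer Technology and Development, 2014, (11): 83-86, 90 (5 pages).
Funding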
Supported by the National Natural Science Foundation of China (61171053)