Computer Technology and Development, 2016, Vol. 26, Issue 9: 192-196, 5. DOI: 10.3969/j.issn.1673-629X.2016.09.043
Implementation Scheme of VFDT Algorithm on Storm Platform
Abstract
To improve the classification efficiency of stream data, this paper studies how to implement the VFDT algorithm on Storm, a stream data processing platform. A distributed, parallel implementation scheme for the VFDT algorithm on the Storm platform is designed. The algorithm is divided into three modules: a tree-building module, a classification module, and an evaluation module. The tree-building module is responsible for initializing and incrementally building the decision tree, the classification module classifies the samples, and the evaluation module evaluates the VFDT decision tree using labeled samples. The functions of each module are realized through the design of the Spouts and Bolts of the Storm Topology, and the classification module is parallelized by deploying multiple tasks for the Classification Bolt. The in-memory database Redis is used to connect the three modules and to persist the decision tree. The message middleware Kafka is used to improve tolerance of bursts in the stream data. Implementation and testing of the VFDT algorithm under the proposed scheme show that its classification efficiency in a Storm cluster environment is significantly higher than in a single-machine environment, and that classification efficiency can be improved further by setting the number of tasks in the Classification Bolt appropriately.
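As background for the incremental tree building mentioned in the abstract: VFDT decides when a leaf has seen enough samples to split by using the Hoeffding bound. The following is a minimal sketch of that split test in plain Python, not the paper's Storm implementation. The function names, the default `delta`, and the `tie_threshold` value are illustrative assumptions, and information gain with a range of log2(number of classes) is assumed as the split measure.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split measure, delta is the allowed
    probability of choosing the wrong attribute, n is the number of samples
    seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 num_classes: int = 2, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Return True if a leaf should be split on its best attribute.

    Assumes information gain as the split measure, whose range is
    log2(num_classes). delta and tie_threshold are illustrative defaults.
    """
    value_range = math.log2(num_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split when the best attribute is better than the runner-up by more
    # than eps, or when eps is so small that the two are effectively tied.
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # A clear winner after 200 samples versus a near tie after 200 samples.
    print(should_split(0.30, 0.05, n=200))
    print(should_split(0.30, 0.25, n=200))
```

In a scheme like the one described, a check of this kind would belong to the tree-building module, while the classification tasks would only read the current tree.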
Keywords
stream data / Very Fast Decision Tree (VFDT) / distributed / parallelization / Storm
Classification
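Information technology and security science
Cite this article
Zhang Fayang, Li Lingjuan, Chen Yu. Implementation scheme of VFDT algorithm on Storm platform [J]. Computer Technology and Development, 2016, 26(9): 192-196, 5.
Funding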
National Natural Science Foundation of China (61302158, 61571238)
ZTE Industry-University-Research Project