Computer & Digital Engineering, 2017, Vol. 45, Issue (9): 1802-1808, 7. DOI: 10.3969/j.issn.1672-9722.2017.09.024
Distributed Data Mining Based on Improved Random Decision Tree Algorithm
Abstract
Building on the random decision tree algorithm, this paper analyzes the probabilities for a single tree and for multiple trees, and uses an unregulated locality-sensitive hash (LSH) function to handle classification of large-scale sensitive data. During distributed data mining, hyperplane hashing is used to shrink the space of candidate hyperplanes and to add coefficients for processing dense data types; SimHash is combined to generate random vectors indirectly, and FastHash maps integers to a bitmap for processing sparse data types. Finally, simulations that run eight small data sets and six large data sets on the Spark platform show that the improved algorithm does not need to construct many deep trees, and verify that it runs on clusters configured with different numbers of nodes.
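The abstract does not spell out the paper's hash construction. As a hedged illustration of the general idea behind hyperplane hashing and SimHash for dense vectors, the sketch below uses standard random-hyperplane LSH: each bit of a signature is the sign of the dot product with a random Gaussian vector, so vectors pointing in similar directions share most bits (the probability that one bit differs is the angle between the vectors divided by π). The function names `random_hyperplanes`, `simhash_bits`, and `hamming` are hypothetical and are not taken from the paper; this is not the author's algorithm.

```python
import random
from typing import List, Sequence


def random_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian vectors; each defines a hyperplane through the origin.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash_bits(x: Sequence[float], planes: List[List[float]]) -> int:
    """Pack the sign of <x, r> for every hyperplane r into an integer signature.

    Bit i is 1 when x lies on the non-negative side of hyperplane i.
    """
    sig = 0
    for i, r in enumerate(planes):
        dot = sum(xi * ri for xi, ri in zip(x, r))
        if dot >= 0:
            sig |= 1 << i
    return sig


def hamming(a: int, b: int) -> int:
    """Count the bits on which two signatures differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    planes = random_hyperplanes(dim=4, n_bits=16, seed=42)
    a = simhash_bits([1.0, 2.0, 3.0, 4.0], planes)
    b = simhash_bits([1.1, 2.0, 2.9, 4.2], planes)      # nearly the same direction
    c = simhash_bits([-1.0, -2.0, -3.0, -4.0], planes)  # opposite direction
    print(hamming(a, b), hamming(a, c))
```

Because only signs are kept, the signature ignores vector length, and comparing two items reduces to an XOR plus a bit count, which is cheap to do across partitions in a distributed setting.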
Key words
distributed data/data mining/decision tree algorithm/hash function
Classification
Information Technology and Security Science
Cite this article
Shi Hongjiao (石红姣). Distributed Data Mining Based on Improved Random Decision Tree Algorithm[J]. Computer & Digital Engineering, 2017, 45(9): 1802-1808, 7.
Funding
Supported by the National Natural Science Foundation of China (No. 61372003).