Journal of Nanjing University of Aeronautics & Astronautics, 2013, Vol. 45, Issue 4: 550-555, 6.
Parallel Implementation of the KNN Classification Algorithm with MapReduce
Parallel Implementing KNN Classification Algorithm Using MapReduce Programming Mode
Abstract
To improve the ability of the KNN algorithm to process massive data, a new technique based on the Hadoop platform is used. Considering the characteristics of the KNN algorithm itself, a parallel version of KNN is implemented on the MapReduce programming model. Three functions are designed for the parallel implementation: Map, Combine and Reduce. The Map function evaluates the similarity between each test instance and the training dataset. To reduce computational complexity and save network bandwidth, the Combine function is used as a local Reduce operation. The Reduce function derives the KNN classification from the intermediate results. Experiments on the Hadoop platform show that the method achieves excellent linear speedup as the number of compute nodes increases, as well as good scalability.
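The Map, Combine and Reduce division described in the abstract can be illustrated with a minimal in-process sketch. It is not the authors' Hadoop implementation: Euclidean distance as the similarity measure, majority voting, the value of K, the way the training data is split, and all function names (`knn_map`, `knn_combine`, `knn_reduce`, `run_job`) are assumptions made for illustration only.

```python
import heapq
import math
from collections import Counter, defaultdict

K = 3  # number of neighbours; an assumed value, not taken from the paper


def knn_map(train_record, test_set):
    """Map: emit (test_id, (distance, label)) for one training record."""
    features, label = train_record
    for test_id, test_features in test_set.items():
        # Euclidean distance is an assumed similarity measure.
        yield test_id, (math.dist(features, test_features), label)


def knn_combine(test_id, candidates, k=K):
    """Combine: keep only the k nearest candidates seen on this split."""
    return test_id, heapq.nsmallest(k, candidates)


def knn_reduce(test_id, candidates, k=K):
    """Reduce: merge per-split candidates, keep the global k nearest, vote."""
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return test_id, votes.most_common(1)[0][0]


def run_job(train_splits, test_set, k=K):
    """Simulate the job locally: map and combine per split, then reduce."""
    shuffled = defaultdict(list)
    for split in train_splits:
        local = defaultdict(list)
        for record in split:
            for test_id, value in knn_map(record, test_set):
                local[test_id].append(value)
        for test_id, values in local.items():
            _, top = knn_combine(test_id, values, k)
            shuffled[test_id].extend(top)  # only k values per split cross the "network"
    return dict(knn_reduce(t, v, k) for t, v in shuffled.items())


# Toy data (hypothetical): two training splits and two test instances.
train_splits = [
    [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((5.0, 5.0), "b")],
    [((0.9, 1.1), "a"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")],
]
test_set = {"t1": (1.0, 0.9), "t2": (5.0, 5.1)}
predictions = run_job(train_splits, test_set)
```

The combine step is safe because the global k nearest neighbours of a test instance are always contained in the union of the k nearest neighbours found on each split, so discarding the rest locally reduces the data sent to the reducer without changing the result.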
Keywords
KNN classification / parallel computing / MapReduce programming model / Hadoop
Classification
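Information Technology and Security Science
Cite this article
YAN Yonggang, MA Tinghuai, WANG Jian. Parallel Implementing KNN Classification Algorithm Using MapReduce Programming Mode [J]. Journal of Nanjing University of Aeronautics & Astronautics, 2013, 45(4): 550-555, 6.
Funding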
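National Natural Science Foundation of China (61173143)
Natural Science Foundation of Jiangsu Province (BK2010380)
China Postdoctoral Science Foundation (2012M511303)
Priority Academic Program Development of Jiangsu Higher Education Institutions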