K-means Algorithm Optimization and Parallel Computing Strategy Based on the Flink Framework

Computer & Digital Engineering (计算机与数字工程), 2023, Vol. 51, Issue 10: 2231-2235, 5. DOI: 10.3969/j.issn.1672-9722.2023.10.003

李召鑫 (Li Zhaoxin)¹, 孟祥印 (Meng Xiangyin)¹, 肖世德 (Xiao Shide)¹, 胡锴沣 (Hu Kaifeng)¹, 赖焕杰 (Lai Huanjie)¹

Author information

1. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031

Abstract

The K-means algorithm is widely used in machine learning and data mining because its principle is simple and it clusters well, but it has two shortcomings. First, the number of clusters K must be specified in advance. Second, the initial cluster centers are selected at random, which can affect both the accuracy of the final clustering result and the computation speed. Both shortcomings limit the computational efficiency of the algorithm. To address them, this paper proposes an optimized K-means algorithm parallelized on Flink. The algorithm first applies the Canopy algorithm to complete an initial clustering and obtain the number of clusters K, then uses the maximum distance algorithm to compute the initial cluster centers, and finally uses the parallel computing capability of the Flink framework to run clustering experiments on multiple data sets. The experimental results show that the proposed algorithm reduces the number of iterations of the clustering process, improves clustering accuracy to some extent, and retains good computational efficiency on large-scale data sets.
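The pipeline outlined in the abstract (Canopy to obtain K, maximum-distance seeding, then K-means iterations) can be sketched on a single machine as follows. This is only an illustrative sketch, not the paper's implementation: the Flink parallelization is not shown, and the function names, the thresholds `t1` and `t2`, and the choice of the first data point as the first seed are all assumptions made for the example.

```python
import math


def canopy(points, t1, t2):
    """Single-pass Canopy clustering; returns a list of (center, members).

    t1 is the loose threshold (canopy membership) and t2 the tight one
    (points this close to a center cannot seed a new canopy).
    Both thresholds are assumed inputs; t1 must exceed t2.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        members = [p for p in points if math.dist(p, center) <= t1]
        canopies.append((center, members))
        # Drop every candidate inside the tight radius, including the center.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return canopies


def max_distance_centers(points, k):
    """Pick k seeds: each new seed is the point farthest from all chosen seeds.

    Assumption: the first seed is simply the first data point.
    """
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations; returns (centers, clusters, iterations used)."""
    clusters = []
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            return centers, clusters, iteration
        centers = new_centers
    return centers, clusters, max_iter


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
    k = len(canopy(data, t1=5.0, t2=3.0))   # K taken from the Canopy pass
    seeds = max_distance_centers(data, k)    # spread-out initial centers
    final_centers, clusters, iterations = kmeans(data, seeds)
    print(k, final_centers, iterations)
```

In a distributed setting, the per-point nearest-center assignment and the per-cluster averaging are the steps that would typically be parallelized; how the paper maps them onto Flink operators is not stated in the abstract.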

Keywords

Flink; K-means algorithm; Canopy algorithm; parallelization

Classification

Information Technology and Security Science

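Cite this article

Li Zhaoxin, Meng Xiangyin, Xiao Shide, Hu Kaifeng, Lai Huanjie. K-means Algorithm Optimization and Parallel Computing Strategy Based on Flink Framework [J]. Computer & Digital Engineering (计算机与数字工程), 2023, 51(10): 2231-2235, 5.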

ISSN: 1672-9722
