Computer & Digital Engineering (计算机与数字工程), 2024, Vol. 52, Issue (3): 688-691, 704, 5. DOI: 10.3969/j.issn.1672-9722.2024.03.009
Performance Comparison and Analysis of Classification Algorithms Based on Spark Platform
Abstract
In view of the rapid development of big data and machine learning technology, the MLlib machine learning library on the Spark platform is used to implement three machine learning algorithms: a feedforward artificial neural network, a support vector machine (SVM), and a random forest. The running and classification performance of the three algorithms on the big data platform is analyzed and evaluated. The experimental results show that the time consumed by all three algorithms gradually decreases as the number of nodes increases. When the dataset is smaller than 100 MB, the acceleration ratio (speedup) of the neural network and SVM algorithms is higher; when the dataset is larger than 1 GB, the acceleration ratio of the random forest algorithm is better than that of the other two. The neural network algorithm has the lowest scalability when the dataset is 100 MB, and the SVM algorithm has the lowest scalability when the dataset is 500 MB. The random forest algorithm shows better scale growth than the other two algorithms when the dataset is larger than 1 GB. Comparing the time efficiency and accuracy of the three classifiers, the SVM algorithm consumes the least time but has the lowest classification accuracy. The neural network algorithm takes the longest time, and its classification accuracy is lower than that of the random forest algorithm. The random forest algorithm has the highest classification accuracy, but its running time is longer than that of the SVM algorithm. Overall, the ensemble classification algorithm shows better time performance and classification accuracy on the big data platform.
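The abstract does not define the acceleration ratio. A minimal sketch, assuming it is the conventional speedup S(n) = T(1) / T(n), where T(n) is the wall-clock time on n nodes. The function names and the timings below are hypothetical placeholders for illustration, not measurements from the paper.

```python
def speedup(runtimes):
    """Map node count -> T(1) / T(n), given node count -> wall-clock seconds."""
    if 1 not in runtimes:
        raise ValueError("need the single-node runtime T(1) as a baseline")
    base = runtimes[1]
    return {n: base / t for n, t in sorted(runtimes.items())}


def efficiency(runtimes):
    """Speedup divided by node count; 1.0 would be ideal linear scaling."""
    return {n: s / n for n, s in speedup(runtimes).items()}


# Hypothetical timings in seconds (NOT results from the paper).
example_runtimes = {1: 120.0, 2: 80.0, 4: 60.0}

print(speedup(example_runtimes))     # {1: 1.0, 2: 1.5, 4: 2.0}
print(efficiency(example_runtimes))  # {1: 1.0, 2: 0.75, 4: 0.5}
```

Under this assumed definition, a speedup that grows more slowly than the node count (efficiency below 1.0) would reflect parallelization overhead such as scheduling and data shuffling.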
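Keywords
big data / Hadoop framework / Spark framework / machine learning / performance evaluation
Classification
Computer and Automation
Cite this article
Zhao Lei (赵蕾), Xia Ji'an (夏吉安), Wu Yang (吴洋), Cui Hui (崔辉). Performance Comparison and Analysis of Classification Algorithms Based on Spark Platform [J]. Computer & Digital Engineering (计算机与数字工程), 2024, 52(3): 688-691, 704, 5.
Funding
Supported by:
2020 China University Industry-University-Research Innovation Fund project (No. 2020HYB02005)
2022 Jiangsu Province Industry-University-Research Cooperation project (No. BY2022560)
2020 Jiangsu Province Industrial Software Engineering Technology Research project (No. ZK20-04-12)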