Application Research of Computers, 2024, Vol. 41, Issue 1: 126-133 (8 pages). DOI: 10.19734/j.issn.1001-3695.2023.05.0196
Parallel deep forest algorithm based on Spark and NRSCA strategy
Abstract
Parallel deep forest algorithms in big data environments face several problems: too many redundant and irrelevant features, low utilization of the features at both ends of the feature vector, slow model convergence, and low parallel efficiency of the cascade forest. To address these problems, this paper proposed a parallel deep forest algorithm based on Spark and the NRSCA strategy (PDF-SNRSCA). Firstly, the algorithm proposed a feature selection strategy based on neighborhood rough sets and Fisher score (FS-NRS), which measured the relevance and redundancy of features to effectively reduce the number of redundant and irrelevant features. Secondly, it proposed a scanning strategy based on random selection and equidistant extraction (S-RSEE), which ensured that all features were used with the same probability and thereby solved the problem of low utilization of the features at both ends in multi-grained scanning. Finally, combined with the Spark framework, the algorithm realized parallel training of the cascade forest, and it proposed a feature filtering mechanism based on the importance index (FFM-Ⅱ) to balance the dimensions of the enhanced class vectors and the original class vectors, thereby accelerating model convergence. Meanwhile, the algorithm designed a task scheduling mechanism based on the sine cosine algorithm (SCA), called TSM-SCA, to redistribute tasks and ensure load balancing in the cluster, which solved the problem of low parallel efficiency of the cascade forest. Experiments show that the PDF-SNRSCA algorithm can effectively improve the classification performance of deep forests and greatly enhance the efficiency of their parallel training.
Keywords
parallel deep forest algorithm/Spark framework/neighborhood rough sets/sine cosine algorithm/multi-grained scanning
Classification
Information Technology and Security Science
Cite this article
MAO Yimin (毛伊敏), LIU Shaofen (刘绍芬). Parallel deep forest algorithm based on Spark and NRSCA strategy[J]. Application Research of Computers, 2024, 41(1): 126-133.
Funding
Guangdong Province Key Enhancement Project (2022ZDJS048)
Shaoguan Science and Technology Project (220607154531533)
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0109605)
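The FS-NRS strategy described in the abstract scores features partly with the Fisher score. As an illustration only, here is a minimal pure-Python sketch of the textbook Fisher score, F_j = Σ_k n_k(μ_kj − μ_j)² / Σ_k n_k σ_kj². It divides the between-class scatter of feature j by its within-class scatter. The sketch omits the neighborhood-rough-set redundancy measure entirely, so it is not the authors' FS-NRS. The function names `fisher_scores` and `top_k_features` are hypothetical.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Fisher score of each column of X (a list of equal-length rows) given labels y.

    F_j = sum_k n_k * (mean_kj - mean_j)**2 / sum_k n_k * var_kj
    """
    n_samples, n_features = len(X), len(X[0])
    # Group rows by class label.
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    scores = []
    for j in range(n_features):
        overall_mean = sum(row[j] for row in X) / n_samples
        between = 0.0  # between-class scatter of feature j
        within = 0.0   # within-class scatter of feature j
        for rows in groups.values():
            n_k = len(rows)
            mean_k = sum(r[j] for r in rows) / n_k
            var_k = sum((r[j] - mean_k) ** 2 for r in rows) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: infinitely separating if the class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
    y = [0, 0, 1, 1]
    print(fisher_scores(X, y))
    print(top_k_features(X, y, 1))
```

On this toy input, feature 0 separates the two classes with little spread inside each class, so it scores about 100. Feature 1 overlaps heavily across classes and scores 0.25. As a result, `top_k_features(X, y, 1)` keeps only feature 0.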
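TSM-SCA builds on the sine cosine algorithm. This record does not give the paper's encoding or fitness function, so the following is a hypothetical illustration, not the authors' scheduler. It applies the standard SCA position update to a toy task-to-node assignment, using an amplitude r1 = a − t·a/T that shrinks over iterations, r2 ∈ [0, 2π], r3 ∈ [0, 2], and r4 ∈ [0, 1] to choose between the sine and cosine branches. The floor-based decoding, the "heaviest node load" fitness, the names `sca_schedule` and `max_node_load`, and all parameter defaults are assumptions.

```python
import math
import random


def max_node_load(position, task_costs, n_nodes):
    """Hypothetical fitness: decode each coordinate to a node index (floor)
    and return the load of the busiest node (lower is better balanced)."""
    loads = [0.0] * n_nodes
    for x, cost in zip(position, task_costs):
        loads[min(int(x), n_nodes - 1)] += cost
    return max(loads)


def sca_schedule(task_costs, n_nodes, n_agents=20, n_iter=200, a=2.0, seed=0):
    """Minimize max_node_load with standard sine cosine algorithm updates."""
    rng = random.Random(seed)
    dim, lo, hi = len(task_costs), 0.0, float(n_nodes)
    agents = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    best = list(min(agents, key=lambda p: max_node_load(p, task_costs, n_nodes)))
    best_fit = max_node_load(best, task_costs, n_nodes)
    for t in range(n_iter):
        r1 = a - t * a / n_iter  # amplitude decreases linearly from a toward 0
        for agent in agents:
            for d in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                step = abs(r3 * best[d] - agent[d])
                if r4 < 0.5:
                    agent[d] += r1 * math.sin(r2) * step
                else:
                    agent[d] += r1 * math.cos(r2) * step
                agent[d] = min(max(agent[d], lo), hi - 1e-9)  # stay in [0, n_nodes)
            fit = max_node_load(agent, task_costs, n_nodes)
            if fit < best_fit:  # keep the best assignment found so far
                best, best_fit = list(agent), fit
    assignment = [min(int(x), n_nodes - 1) for x in best]
    return assignment, best_fit


if __name__ == "__main__":
    costs = [4, 3, 3, 2, 2, 2]
    print(sca_schedule(costs, n_nodes=2))
```

For the toy costs, the balanced optimum on two nodes is a maximum load of 8, for example {4, 2, 2} and {3, 3, 2}. A stochastic search like this sketch is not guaranteed to reach it. The continuous-to-discrete decoding is only one possible choice.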