Parallel deep forest algorithm based on Spark and NRSCA strategy (基于Spark和NRSCA策略的并行深度森林算法)

Application Research of Computers (计算机应用研究), 2024, Vol. 41, Issue 1: 126-133, 8. DOI: 10.19734/j.issn.1001-3695.2023.05.0196

Parallel deep forest algorithm based on Spark and NRSCA strategy

Mao Yimin¹, Liu Shaofen²

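Author information

1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000; School of Information Engineering, Shaoguan University, Shaoguan, Guangdong 512026
2. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000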

Abstract

To address several problems that parallel deep forest algorithms face in big data environments, namely excessive redundant and irrelevant features, low utilization of features at both ends, slow model convergence, and low parallel efficiency of cascade forests, this paper proposes a parallel deep forest algorithm based on Spark and the NRSCA strategy (PDF-SNRSCA). First, the algorithm introduces a feature selection strategy based on neighborhood rough sets and the Fisher score (FS-NRS), which measures the relevance and redundancy of features to reduce the number of redundant and irrelevant features. Second, it introduces a scanning strategy based on random selection and equidistant extraction (S-RSEE), which ensures that all features are used with equal probability and resolves the low utilization of features at both ends in multi-grained scanning. Finally, building on the Spark framework, the algorithm trains the cascade forests in parallel and applies a feature filtering mechanism based on the importance index (FFM-II) to balance the dimensions of the enhanced class vectors and the original class vectors, which accelerates model convergence. In addition, the algorithm uses a task scheduling mechanism based on the sine cosine algorithm (TSM-SCA) to redistribute tasks and balance the load across the cluster, which addresses the low parallel efficiency of cascade forests. Experiments show that PDF-SNRSCA improves the classification performance of deep forests and markedly improves the efficiency of their parallel training.
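The abstract names the Fisher score as one ingredient of the FS-NRS feature selection strategy but does not give its formula. As a rough illustration only, the sketch below computes the textbook Fisher score of each feature (between-class scatter divided by within-class scatter) and ranks features by it. It does not reproduce the paper's method: the neighborhood rough set redundancy measure and the way FS-NRS combines the two are not described on this page, and the function names `fisher_scores` and `top_k_features`, the toy data, and the handling of zero within-class variance are assumptions of this sketch.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(rows, labels):
    """Return the textbook Fisher score of every feature column.

    score_j = sum_k n_k * (mean_jk - mean_j)^2 / sum_k n_k * var_jk
    where k ranges over the classes. A larger score means the feature
    separates the classes better relative to its spread inside each class.
    """
    if not rows or len(rows) != len(labels):
        raise ValueError("rows and labels must be non-empty and equally long")

    # Group the samples by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    scores = []
    for j in range(len(rows[0])):
        overall_mean = fmean(row[j] for row in rows)
        between = 0.0  # between-class scatter of feature j
        within = 0.0   # within-class scatter of feature j
        for members in by_class.values():
            column = [row[j] for row in members]
            n_k = len(column)
            between += n_k * (fmean(column) - overall_mean) ** 2
            within += n_k * pvariance(column)
        if within == 0.0:
            # Assumption of this sketch: a feature with no within-class
            # spread scores infinity if it separates classes, else zero.
            scores.append(float("inf") if between > 0.0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def top_k_features(rows, labels, k):
    """Return the indices of the k features with the highest Fisher score."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 barely does.
    rows = [[1.0, 5.0], [2.0, 5.1], [9.0, 4.9], [10.0, 5.0]]
    labels = [0, 0, 1, 1]
    print(fisher_scores(rows, labels))
    print(top_k_features(rows, labels, 1))
```

A relevance ranking of this kind only addresses irrelevant features. Removing redundant features, which the abstract attributes to the neighborhood rough set part of FS-NRS, needs a separate measure that this sketch leaves out.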

Key words

parallel deep forest algorithm/Spark framework/neighborhood rough sets/sine cosine algorithm/multi-grained scanning

Classification

Information technology and security science

Cite this article

Mao Yimin, Liu Shaofen. Parallel deep forest algorithm based on Spark and NRSCA strategy [J]. Application Research of Computers, 2024, 41(1): 126-133, 8.

Funding

Guangdong Province Key Improvement Project (2022ZDJS048)

Shaoguan Science and Technology Project (220607154531533)

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0109605)

Application Research of Computers

OA / Peking University Core Journal / CSTPCD

ISSN: 1001-3695
