Semi-Random Forest Classification Algorithm for Data Streams Based on Closed Frequent Patterns
To address noise and concept drift in data streams, this paper proposes a semi-random forest classification algorithm for data streams based on closed frequent patterns (Semi-Random Forest based on Closed Frequent Pattern, SRFCFP). SRFCFP represents the input data stream with closed frequent patterns. This removes redundant information and noise and highlights the characteristics of the data. A semi-random forest is then built on this representation as the classification model. A pattern set updating mechanism based on a time decay model lets the model cope with the unbounded, continuous nature of the stream. To detect concept drift and adapt to it promptly, the paper introduces a difference measure between pattern sets, which uses the mined patterns to quantify changes in the data distribution. Experiments on real-world and synthetic datasets on the MOA platform show that SRFCFP outperforms the comparison algorithms in average accuracy and handles concept drift and noise in data streams effectively.
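The abstract names the mechanisms of SRFCFP but gives no formulas. The following Python sketch illustrates them under assumed definitions and is not the authors' implementation. Each instance is assumed to be discretized into a set of items. Support is assumed to use exponential time decay with a hypothetical factor `decay`. Closed frequent patterns are mined from a window with a hypothetical threshold `min_sup`. Instances are encoded as binary vectors over the patterns they contain. The pattern-set difference is a stand-in measure: the sum of absolute support differences, normalized to [0, 1]. All function names are hypothetical, and the semi-random forest itself is not shown.

```python
from collections import defaultdict
from itertools import combinations


def decayed_supports(transactions, decay=0.9):
    """Time-decayed relative support of every itemset in a window.

    transactions: list of item sets, oldest first. Assumed exponential decay:
    a transaction that is `age` steps old contributes decay**age.
    Enumerates all subsets, so it is only suitable for small toy inputs.
    """
    n = len(transactions)
    support = defaultdict(float)
    total = 0.0
    for t, items in enumerate(transactions):
        w = decay ** (n - 1 - t)  # the newest transaction has weight 1
        total += w
        items = list(items)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                support[frozenset(combo)] += w
    return {p: s / total for p, s in support.items()}


def closed_frequent(supports, min_sup=0.4, tol=1e-12):
    """Keep frequent itemsets that have no proper superset with equal support."""
    frequent = {p: s for p, s in supports.items() if s >= min_sup}
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q and abs(frequent[q] - s) <= tol for q in frequent)
    }


def encode(items, patterns):
    """Binary feature vector: 1 if the instance contains the pattern (assumed representation)."""
    ordered = sorted(patterns, key=lambda p: sorted(p))
    return [1 if p <= set(items) else 0 for p in ordered]


def pattern_set_difference(old, new):
    """Assumed measure: sum of |support difference| over the union of both
    pattern sets, divided by the total support mass, so the result lies in [0, 1]."""
    keys = set(old) | set(new)
    diff = sum(abs(old.get(p, 0.0) - new.get(p, 0.0)) for p in keys)
    mass = sum(old.values()) + sum(new.values())
    return diff / mass if mass else 0.0
```

A drift check could then compare the closed pattern sets of two consecutive windows, for example `pattern_set_difference(old, new) > 0.5`. The threshold is an arbitrary placeholder. A score of 0 means identical pattern sets, and 1 means the windows share no patterns.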
Sun Yange; Shao Han; Jiang Mingyi
College of Computer and Information Technology, Xinyang Normal University, Xinyang, Henan 464000, China
Computer and Automation
Keywords: data stream; closed frequent pattern; semi-random forest; concept drift; noise
Journal of Xinyang Normal University (Natural Science Edition), 2024, (4)
pp. 442-448 (7 pages)
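Supported by the National Natural Science Foundation of China (61702550) and the Henan Province Graduate Education Quality Engineering Project (YJS2023SZ23)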