Hebei Journal of Industrial Science and Technology, 2024, Vol. 41, Issue 4: 291-298, 8. DOI: 10.7535/hbgykj.2024yx04007
Hybrid imbalanced data processing based on ADASYN and WGAN
Abstract
To address the low classification accuracy for minority class samples in imbalanced datasets, an ADASYN-WGAN method for processing imbalanced datasets was proposed. First, minority class samples were generated with the ADASYN algorithm and used in place of the random noise input of the WGAN. Second, the WGAN was used to generate minority class samples that follow the distribution of the original dataset, producing a balanced dataset. Then, on six public datasets, a random forest classifier was used to compare the datasets produced by the proposed method and by four oversampling algorithms against the original dataset. Finally, the effectiveness of the proposed method was verified with the classification metrics F1-Score, G-mean and AUC. The results show that, under ten-fold cross-validation with the random forest classifier, the balanced datasets obtained by the ADASYN-WGAN method achieve the best values of all classification metrics on four of the public datasets; on the other two, they achieve the highest F1-Score and G-mean, although their AUC values are slightly lower than the best results. The proposed ADASYN-WGAN method can generate high-quality data samples and offers a reference for addressing prediction bias against minority class samples in imbalanced datasets.
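The abstract names its building blocks without giving formulas. As a hedged illustration only, and not the authors' code, the sketch below implements the standard ADASYN oversampling step (He et al., 2008) in NumPy for a binary problem, together with the F1-Score and G-mean metrics used in the evaluation. The function names `adasyn` and `f1_gmean` and the parameters `beta`, `k` and `seed` are hypothetical choices. In the proposed method, the ADASYN output would then replace the random noise fed to the WGAN generator; that stage needs a deep-learning framework and is not shown.

```python
import numpy as np

# Hypothetical helpers: a sketch of standard binary ADASYN plus two metrics,
# not the ADASYN-WGAN authors' implementation.


def adasyn(X, y, minority_label, beta=1.0, k=5, seed=None):
    """Return synthetic minority samples following the standard ADASYN scheme."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_min, n_maj = len(X_min), len(X) - len(X_min)  # assumes two classes
    # G: total synthetic samples; beta = 1 aims for a fully balanced set.
    G = int(round((n_maj - n_min) * beta))
    if G <= 0 or n_min < 2:
        return np.empty((0, X.shape[1]))

    # k nearest neighbours of each minority point in the whole dataset;
    # the first sorted column is normally the point itself (distance 0).
    d_all = ((X_min[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    nn_all = np.argsort(d_all, axis=1)[:, 1:k + 1]
    # r_i: share of majority points among those neighbours ("difficulty").
    r = (y[nn_all] != minority_label).mean(axis=1)
    r = r / r.sum() if r.sum() > 0 else np.full(n_min, 1.0 / n_min)
    g = np.rint(r * G).astype(int)  # synthetic samples to create near point i

    # Interpolate only towards minority-class neighbours.
    k_min = min(k, n_min - 1)
    d_min = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k_min + 1]

    samples = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            j = nn_min[i, rng.integers(k_min)]
            # New point on the segment between X_min[i] and its neighbour.
            samples.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.asarray(samples, dtype=float).reshape(-1, X.shape[1])


def f1_gmean(y_true, y_pred, positive):
    """Binary F1-Score and G-mean, treating `positive` as the minority class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # G-mean: geometric mean of sensitivity and specificity.
    return f1, float(np.sqrt(recall * specificity))
```

For example, with 90 majority and 10 minority points and `beta=1.0`, `adasyn` returns close to 80 synthetic rows (per-point rounding can shift the total slightly), each lying on a segment between two minority points. Points whose neighbourhoods are dominated by the majority class receive more synthetic samples, which is the adaptive property that distinguishes ADASYN from plain SMOTE.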
Keywords
data processing/imbalanced data/WGAN/ADASYN/oversampling method/random forest
Classification
Information Technology and Security Science
Cite this article
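ZHOU Wanzhen, SHENG Yuanyuan, ZHANG Yongqiang, MA Jinlong. Hybrid imbalanced data processing based on ADASYN and WGAN[J]. Hebei Journal of Industrial Science and Technology, 2024, 41(4): 291-298, 8.
Funding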
Natural Science Foundation of Hebei Province (F2022208002)
Key Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2021048)