Computer Applications and Software, 2023, Vol. 40, Issue 12: 305-311, 7. DOI: 10.3969/j.issn.1000-386x.2023.12.045
UNBALANCED DATA PROCESSING METHOD BASED ON GAUSSIAN MIXTURE CLUSTERING
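Yan Tao (严涛) 1, Jiang Kaizhong (江开忠) 1, Jiang Xinying (姜新盈) 1, Wang Shufan (王舒梵) 1
Author information
- 1. School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620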
Abstract
To effectively eliminate redundant information in the majority samples and synthesize valuable minority samples when processing unbalanced data, we propose a sampling algorithm based on the Gaussian mixture model (MSGMM). MSGMM clusters the majority samples and the minority samples separately, and the optimal number of clusters is determined by iteration. In the iteration, an initial number of clusters is selected and the Gaussian mixture model is used for clustering. For the majority samples, the rejection ratio of each cluster C is determined by the distance from its cluster center to the hyperplane generated by the SVM and by the weight of its sample count. For the minority samples, the sampling proportion is allocated according to the distance between each cluster center and the hyperplane. The Random-SMOTE algorithm is then used to synthesize new samples until the classes are balanced. Experimental results show that MSGMM improves accuracy by 1%-16% compared with traditional algorithms, which supports the feasibility of the proposed algorithm.
Key words
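Unbalanced data set/Classification/Gaussian mixture model/Mixed sampling
Classification
Information technology and security science
Cite this article
Yan Tao, Jiang Kaizhong, Jiang Xinying, Wang Shufan. Unbalanced data processing method based on Gaussian mixture clustering [J]. Computer Applications and Software, 2023, 40(12): 305-311, 7.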
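The abstract describes dividing the minority-class sampling proportion by each cluster center's distance to the SVM hyperplane and then synthesizing samples with Random-SMOTE, but it does not give the formulas. The following is a minimal sketch of those two steps only, under stated assumptions: the cluster centers and the hyperplane parameters `w`, `b` are taken as already computed (no GMM or SVM is fitted here), the inverse-distance weighting is an illustrative choice rather than the paper's rule, `random_smote_point` follows Random-SMOTE as it is commonly described, and all function names and example values are hypothetical.

```python
import math
import random


def hyperplane_distance(center, w, b):
    """Euclidean distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * ci for wi, ci in zip(w, center)) + b) / norm


def allocate_by_inverse_distance(distances, total, eps=1e-12):
    """Split `total` synthetic samples across clusters.

    Assumption (not stated in the abstract): the weight of a cluster is
    proportional to 1/distance, so clusters nearer the hyperplane get more.
    """
    weights = [1.0 / (d + eps) for d in distances]
    scale = total / sum(weights)
    counts = [int(wt * scale) for wt in weights]
    # Hand out the samples lost to truncation, largest remainder first.
    remainders = [wt * scale - c for wt, c in zip(weights, counts)]
    order = sorted(range(len(counts)), key=lambda i: remainders[i], reverse=True)
    for i in order[: total - sum(counts)]:
        counts[i] += 1
    return counts


def random_smote_point(x, y1, y2, rng):
    """One Random-SMOTE-style sample inside the triangle (x, y1, y2).

    First interpolate between y1 and y2, then between x and that point.
    """
    a, g = rng.random(), rng.random()
    temp = [p + a * (q - p) for p, q in zip(y1, y2)]
    return [xi + g * (ti - xi) for xi, ti in zip(x, temp)]


# Hypothetical inputs: hyperplane x1 = 0 and three minority cluster centers.
w, b = [1.0, 0.0], 0.0
centers = [[1.0, 0.0], [2.0, 5.0], [4.0, -1.0]]
distances = [hyperplane_distance(c, w, b) for c in centers]
counts = allocate_by_inverse_distance(distances, total=70)

rng = random.Random(0)  # seeded for reproducibility
new_point = random_smote_point([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], rng)
print(counts, new_point)
```

In this sketch the cluster closest to the hyperplane receives the largest share of synthetic samples, on the assumption that samples near the decision boundary are the most informative. Whether the paper weights distance in this direction, or in this functional form, should be checked against the full text.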