Unbalanced Data Processing Method Based on Gaussian Mixture Clustering Sampling

严涛 江开忠 姜新盈 王舒梵

计算机应用与软件 (Computer Applications and Software), 2023, Vol. 40, Issue 12: 305-311, 7. DOI: 10.3969/j.issn.1000-386x.2023.12.045

UNBALANCED DATA PROCESSING METHOD BASED ON GAUSSIAN MIXTURE CLUSTERING

严涛 1, 江开忠 1, 姜新盈 1, 王舒梵 1

Author information

  • 1. School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620

Abstract

To remove redundant information from majority-class samples and synthesize valuable minority-class samples when processing unbalanced data, we propose a sampling algorithm based on the Gaussian mixture model (MSGMM). MSGMM clusters the majority samples and the minority samples separately and determines the optimal number of clusters by iteration. During the iteration, an initial number of clusters is selected and a Gaussian mixture model is used for clustering. For the majority samples, the rejection ratio of each cluster C is determined by a weighting of the distance from its cluster center to the hyperplane generated by an SVM and the number of samples in the cluster. For the minority samples, the sampling proportion is divided among the clusters according to the distance between each cluster center and the hyperplane, and the Random-SMOTE algorithm is used to synthesize new samples so that the classes are balanced. Experimental results show that the accuracy of MSGMM is 1%-16% higher than that of traditional algorithms, which verifies the feasibility of the proposed algorithm.
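
The minority-class step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Gaussian mixture clustering and the SVM fit have already been done elsewhere, so the cluster lists, the cluster centers, and the hyperplane parameters `w` and `b` are hypothetical inputs. The abstract does not give the allocation formula, so the inverse-distance weighting in `allocate_quota` (clusters nearer the hyperplane receive more synthetic samples) is an assumption, and `random_smote` follows the commonly described Random-SMOTE idea of interpolating inside a triangle of minority points.

```python
import math
import random


def hyperplane_distance(center, w, b):
    """Euclidean distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * ci for wi, ci in zip(w, center)) + b) / norm


def allocate_quota(centers, w, b, total):
    """Split `total` synthetic samples across minority clusters.

    Assumption (not stated in the abstract): the share of each cluster is
    proportional to the inverse of its center's distance to the hyperplane.
    """
    inv = [1.0 / (hyperplane_distance(c, w, b) + 1e-12) for c in centers]
    scale = sum(inv)
    quotas = [int(total * v / scale) for v in inv]
    # Give the rounding remainder to the cluster closest to the hyperplane.
    quotas[inv.index(max(inv))] += total - sum(quotas)
    return quotas


def random_smote(cluster, n_new, rng):
    """Synthesize n_new points inside one cluster, Random-SMOTE style.

    For each new point: pick a seed x and two other points y1, y2, build a
    temporary point t on the segment y1-y2, then interpolate between x and t.
    """
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(cluster)
        y1, y2 = rng.choice(cluster), rng.choice(cluster)
        alpha, gamma = rng.random(), rng.random()
        t = [p + alpha * (q - p) for p, q in zip(y1, y2)]
        synthetic.append([xi + gamma * (ti - xi) for xi, ti in zip(x, t)])
    return synthetic


# Hypothetical toy data: two minority clusters and the hyperplane x1 = 0.
rng = random.Random(0)
w, b = [1.0, 0.0], 0.0
clusters = [
    [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
    [[3.0, 0.0], [3.0, 1.0], [3.0, 2.0]],
]
centers = [[1.0, 1.0], [3.0, 1.0]]
quotas = allocate_quota(centers, w, b, total=8)
new_points = [random_smote(c, q, rng) for c, q in zip(clusters, quotas)]
```

Because every synthetic point is an interpolation between points of a single cluster, it stays inside that cluster's convex hull, which is the motivation for sampling per cluster rather than over the whole minority class.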

Key words

Unbalanced data set/Classification/Gaussian mixture model/Mixed sampling

Classification

Information Technology and Security Science

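Cite this article

严涛, 江开忠, 姜新盈, 王舒梵. Unbalanced data processing method based on Gaussian mixture clustering sampling [J]. 计算机应用与软件 (Computer Applications and Software), 2023, 40(12): 305-311, 7.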

计算机应用与软件 (Computer Applications and Software)

OA / Peking University Core Journal (北大核心) / CSTPCD

ISSN 1000-386X
