Journal of Beihua University (Natural Science), 2024, Vol. 25, Issue 5: 694-700 (7 pages). DOI: 10.11713/j.issn.1009-4822.2024.05.023
Classification Method for Imbalanced Data Based on Novel Sampling Technique
Abstract
In many real-world scenarios, data imbalance is a common problem that significantly affects the predictions of models. The Synthetic Minority Over-sampling Technique (SMOTE) is a method for addressing imbalanced classification, but it has limitations. To address class imbalance in data, an improved random forest classification algorithm, DCSMOTE-RF, is proposed, which uses SMOTE based on data distribution and cluster weighting. The algorithm acquires distribution information from the samples, divides the minority class samples into clusters, and assigns a different synthetic share to each region according to the information ratios of the clusters. Minority class samples are combined with their weights to generate target samples of the corresponding scales. A random forest is then trained and evaluated on the resulting data. Simulation tests on ten imbalanced datasets demonstrate that DCSMOTE-RF achieves better prediction performance on imbalanced data.
Keywords
imbalanced classification/synthetic minority over-sampling technique/random forest/clustering
Classification
Information Technology and Security Science
Cite this article
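LIU Zitong, LIU Zhenyuan, PANG Na, MA Ming. Classification Method for Imbalanced Data Based on Novel Sampling Technique [J]. Journal of Beihua University (Natural Science), 2024, 25(5): 694-700.
Funding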
National Natural Science Foundation of China (42004153)
Beihua University Graduate Innovation Program (2022007)