Application Research of Computers, 2016, Vol. 33, Issue (10): 2997-3000, 4. DOI: 10.3969/j.issn.1001-3695.2016.10.029
Under-sampling technique based on data density distribution
Abstract
Traditional classifiers identify minority-class samples with low precision, and traditional under-sampling methods tend to lose information. To address these problems, this paper proposes US-DD, an under-sampling method based on data density distribution that divides the data into high-density and low-density data clusters. The two kinds of clusters differ not only in sample quantity but also in their influence on classification. The method therefore partitions the data set by data density and applies a different re-sampling strategy to clusters of different density, thereby improving the balance of the data. Experiments on six UCI data sets show that US-DD is effective for imbalanced data classification and, compared with random under-sampling and KNN-NearMiss, improves the classifier's recognition performance on the minority class.
Key words
imbalanced data / data density / under-sampling / distribution
Classification
Information technology and security science
Cite this article
杨杰明, 闫欣, 曲朝阳, 宋晨晨, 乔媛媛. Under-sampling technique based on data density distribution [J]. Application Research of Computers, 2016, 33(10): 2997-3000, 4.
Funding
Jilin Province Science and Technology Development Plan Project (20140204071GX); National Natural Science Foundation of China ()