Under-sampling technique based on data density distribution

Yang Jieming¹, Yan Xin¹, Qu Zhaoyang¹, Song Chenchen¹, Qiao Yuanyuan¹

Application Research of Computers, 2016, Vol. 33, Issue 10: 2997-3000, 4. DOI: 10.3969/j.issn.1001-3695.2016.10.029

Author information

1. School of Information Engineering, Northeast Electric Power University, Jilin, Jilin 132012

Abstract

Traditional classifiers identify minority-class samples with low precision, and traditional under-sampling methods tend to discard useful information. To address both problems, this paper proposes US-DD, an under-sampling method based on data density distribution that divides the data into high-density and low-density clusters. The two kinds of clusters differ not only in sample count but also in their influence on classification. US-DD therefore partitions the data set by data density and applies a different re-sampling strategy to clusters of different density, which improves the balance of the data. Experiments on six UCI data sets show that US-DD is effective for imbalanced data classification and that, compared with random under-sampling and KNN-NearMiss, it improves the classifier's recognition performance on the minority class.
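The abstract does not say how US-DD estimates density or which sampling rate it applies to each cluster, so the following is only a minimal sketch of the general idea (split the majority class by density, then sample each part separately), not the authors' algorithm. The k-nearest-neighbour density estimate, the median threshold, and the names `knn_density`, `density_undersample`, and `low_share` are all assumptions made for illustration.

```python
import numpy as np


def knn_density(X, k=5):
    """Assumed density estimate: inverse mean distance to the k nearest neighbours."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nearest = np.sort(dist, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def density_undersample(X_maj, n_keep, k=5, low_share=0.5, seed=0):
    """Return indices of majority samples to keep.

    Hypothetical strategy: split at the median density, then draw
    `low_share` of the kept samples from the low-density part and the
    rest from the high-density part.
    """
    rng = np.random.default_rng(seed)
    density = knn_density(X_maj, k)
    threshold = np.median(density)
    low_idx = np.flatnonzero(density <= threshold)
    high_idx = np.flatnonzero(density > threshold)
    n_low = min(len(low_idx), int(round(n_keep * low_share)))
    n_high = min(len(high_idx), n_keep - n_low)
    keep = np.concatenate([
        rng.choice(low_idx, size=n_low, replace=False),
        rng.choice(high_idx, size=n_high, replace=False),
    ])
    return np.sort(keep)


# Usage on synthetic data: shrink a 40-sample majority class to match
# a 10-sample minority class.
data_rng = np.random.default_rng(42)
X_maj = data_rng.normal(size=(40, 2))
X_min = data_rng.normal(loc=3.0, size=(10, 2))
keep = density_undersample(X_maj, n_keep=len(X_min))
X_balanced = np.vstack([X_maj[keep], X_min])
```

Raising `low_share` keeps more of the sparse region, which is one plausible way to limit the information loss the abstract attributes to random under-sampling; the actual rates used by US-DD would have to be taken from the full paper.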

Key words

imbalanced data / data density / under-sampling / distribution

Classification

Information technology and security science

Cite this article

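Yang Jieming, Yan Xin, Qu Zhaoyang, Song Chenchen, Qiao Yuanyuan. Under-sampling technique based on data density distribution [J]. Application Research of Computers, 2016, 33(10): 2997-3000, 4.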

Funding

Science and Technology Development Plan of Jilin Province (20140204071GX); National Natural Science Foundation of China ()

Application Research of Computers

OA / PKU Core / CSCD / CSTPCD

ISSN: 1001-3695
