Research on an Under-Sampling Method Based on Data Density Distribution

杨杰明 闫欣 曲朝阳 宋晨晨 乔媛媛

计算机应用研究 (Application Research of Computers), 2016, Vol. 33, Issue 10: 2997-3000, 4. DOI: 10.3969/j.issn.1001-3695.2016.10.029

Under-sampling technique based on data density distribution

杨杰明¹, 闫欣¹, 曲朝阳¹, 宋晨晨¹, 乔媛媛¹

Author information

  • 1. School of Information Engineering, Northeast Electric Power University (东北电力大学), Jilin City, Jilin 132012

Abstract

Traditional classifiers achieve low recognition accuracy on minority-class samples, and traditional under-sampling methods tend to lose information. To address both problems, this paper proposes US-DD, an under-sampling method based on data density distribution that divides the data into high-density and low-density clusters. The two kinds of cluster differ not only in the number of samples they contain but also in their influence on classification. The method therefore partitions the data set by data density and applies a different re-sampling strategy to clusters of different density, with the aim of improving the balance of the data. Experiments on six UCI data sets show that US-DD is effective for imbalanced data classification and that, compared with random under-sampling and KNN-NearMiss, it improves the classifier's recognition performance on the minority class.
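
The abstract does not state how US-DD estimates density or which re-sampling strategy it applies to each cluster, so the following is only a minimal sketch of the general idea, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the density proxy (inverse mean distance to the `k` nearest neighbours), the median split into high-density and low-density groups, the retention fractions `keep_high` and `keep_low`, and the hypothetical function names `knn_density` and `density_undersample`.

```python
import numpy as np


def knn_density(X, k=5):
    """Assumed density proxy: inverse of the mean distance to the k nearest neighbours."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nearest = np.sort(dist, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def density_undersample(X, y, majority_label, k=5,
                        keep_high=0.3, keep_low=1.0, seed=0):
    """Under-sample the majority class with a density-dependent retention rate.

    Assumption (not from the paper): majority samples at or above the median
    density form the high-density group and are thinned to `keep_high`;
    the remaining low-density samples are retained at rate `keep_low`.
    Minority samples are always kept.
    """
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)

    dens = knn_density(X[maj_idx], k)
    threshold = np.median(dens)
    high = maj_idx[dens >= threshold]
    low = maj_idx[dens < threshold]

    # Shuffle each group and keep the requested fraction of it.
    kept_high = rng.permutation(high)[: int(round(keep_high * len(high)))]
    kept_low = rng.permutation(low)[: int(round(keep_low * len(low)))]

    keep = np.sort(np.concatenate([min_idx, kept_high, kept_low]))
    return X[keep], y[keep]
```

Thinning the dense region more heavily is one plausible design choice, on the reasoning that closely packed majority samples are largely redundant while sparse ones may carry boundary information; the paper may well use a different rule.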

Keywords

imbalanced data; data density; under-sampling; distribution

Classification

Information technology and security science

Cite this article

杨杰明, 闫欣, 曲朝阳, 宋晨晨, 乔媛媛. 基于数据密度分布的欠采样方法研究 [J]. 计算机应用研究, 2016, 33(10): 2997-3000, 4.

Funding

Jilin Province Science and Technology Development Plan project (20140204071GX); National Natural Science Foundation of China project ()

Journal: 计算机应用研究 (Application Research of Computers)

Indexing: OA, 北大核心 (Peking University Core), CSCD, CSTPCD

ISSN: 1001-3695
