Spectral Clustering Based Oversampling Taking Within-Class Imbalance into Consideration

骆自超 金隼 邱雪峰

Computer Engineering and Applications, Issue (11): 120-125, 138, 7. DOI: 10.3778/j.issn.1002-8331.1312-0148

Spectral clustering based oversampling: oversampling taking within-class imbalance into consideration

骆自超 (1), 金隼 (1), 邱雪峰 (1)

Author information

  • 1. School of Mechanical Engineering (机械与动力工程学院), Shanghai Jiao Tong University, Shanghai 200000

Abstract

Imbalanced datasets are among the most crucial challenges facing data mining techniques. Oversampling has proven very effective for handling imbalanced datasets. However, traditional oversampling methods ignore within-class imbalance, which is pervasive in real-world datasets. To resolve this problem, this paper proposes an oversampling method based on modified spectral clustering. The method first automatically determines the best number of clusters, then applies modified spectral clustering to the minority samples. Based on the number of samples in each cluster, it determines how many samples to generate inside each cluster, yielding a dataset that is balanced both between and within classes. The method is tested on four real-world datasets and one simulated dataset and is shown to be effective. Moreover, it is compared with traditional k-means clustering based oversampling, and the results are analyzed and explained.
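
The per-cluster generation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact allocation rule or its modified spectral clustering, so everything below is an assumption: cluster assignments are taken as already computed, the hypothetical helper `allocation` raises every minority cluster to an equal target size whose sum matches the majority class, and the hypothetical helper `oversample_cluster` creates new points by linear interpolation between random pairs inside one cluster.

```python
import random


def allocation(cluster_sizes, majority_size):
    """Return how many synthetic samples to generate in each minority cluster.

    Assumed rule (not taken from the paper): every cluster is raised to the
    same target size, and the targets sum to majority_size.
    """
    k = len(cluster_sizes)
    base, extra = divmod(majority_size, k)
    # The first `extra` clusters get one more sample so targets sum exactly.
    targets = [base + (1 if i < extra else 0) for i in range(k)]
    # A cluster already at or above its target receives nothing.
    return [max(t - n, 0) for t, n in zip(targets, cluster_sizes)]


def oversample_cluster(points, n_new, rng):
    """Create n_new points by interpolating between random pairs in a cluster."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(points)
        b = rng.choice(points)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Toy minority class with two clusters of unequal size (illustrative data).
clusters = {
    0: [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)],
    1: [(5.0, 5.0), (5.2, 5.1)],
}
rng = random.Random(0)
sizes = [len(clusters[c]) for c in sorted(clusters)]
counts = allocation(sizes, majority_size=12)
balanced = {
    c: clusters[c] + oversample_cluster(clusters[c], n, rng)
    for c, n in zip(sorted(clusters), counts)
}
```

With a majority class of 12 samples, the smaller cluster receives more synthetic points than the larger one, so both clusters end at 6 samples and the minority class as a whole matches the majority class. If a cluster already exceeds its target, this sketch leaves it untouched, so the minority total can then exceed the majority size.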

Key words

spectral clustering/imbalanced dataset/oversampling

Classification

Information Technology and Security Science

Cite this article

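骆自超, 金隼, 邱雪峰. Spectral clustering based oversampling: oversampling taking within-class imbalance into consideration [J]. Computer Engineering and Applications, 2014, (11): 120-125, 138, 7.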

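Funding

National Science and Technology Support Program of the Twelfth Five-Year Plan (No. 2012BAF06B03); National Natural Science Foundation of China (No. 51175340).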

Computer Engineering and Applications

Indexing: OA, PKU Core (北大核心), CSCD, CSTPCD

ISSN 1002-8331
