Research and Application of Unbalanced Data Classification

Ye Feng (叶枫), Ding Feng (丁锋)

Computer Applications and Software, 2018, Vol. 35, Issue 1: 132-136, 205, 6. DOI: 10.3969/j.issn.1000-386x.2018.01.023

RESEARCH AND APPLICATION OF UNBALANCED DATA CLASSIFICATION

Ye Feng 1, Ding Feng 1

Author information

  • 1. College of Economics and Management, Zhejiang University of Technology, Hangzhou, Zhejiang 310012

Abstract

Traditional machine learning algorithms have low classification accuracy for the minority classes of unbalanced data. In this paper, we analyze the causes of this problem and propose an undersampling method to improve the classification accuracy of minority classes. The method uses the k-means algorithm to cluster the samples multiple times and removes noise in the majority classes, as well as the samples with the highest degree of overlap. A deletion factor λ is introduced to avoid losing important majority-class information. In experiments on UCI datasets, the method improved the recall and F-measure that traditional classification algorithms achieved on minority classes, which suggests that it can improve minority-class classification accuracy. Finally, the method was applied to a medical task: predicting post-operative life expectancy in lung cancer patients. In that experiment, the recall and F-measure for one-year mortality of lung cancer patients increased by 42% and 23%, respectively.
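The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way cluster-based undersampling with a deletion factor could look, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `kmeans` and `undersample`, the parameters `k`, `rounds`, and `seed`, the definition of a majority sample's "overlap" as the average minority share of the clusters it falls into, and the reading of λ (`lam`) as an upper bound on the fraction of majority samples that may be deleted.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def undersample(X, y, k=3, rounds=5, lam=0.3, seed=0):
    """Return the indices of the samples kept after undersampling.

    y uses 1 for the minority class and 0 for the majority class.
    lam (hypothetical reading of the deletion factor) caps the fraction
    of majority samples that may be removed.
    """
    rng = random.Random(seed)
    overlap = [0.0] * len(X)
    for _ in range(rounds):
        assign = kmeans(X, k, rng)
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:
                continue
            # Share of minority samples in this cluster.
            minority_share = sum(y[i] for i in members) / len(members)
            # Majority samples in minority-heavy clusters count as overlapping.
            for i in members:
                if y[i] == 0:
                    overlap[i] += minority_share / rounds
    majority = [i for i in range(len(X)) if y[i] == 0]
    budget = int(lam * len(majority))  # at most this many deletions
    ranked = sorted(majority, key=lambda i: overlap[i], reverse=True)
    removed = {i for i in ranked[:budget] if overlap[i] > 0}
    return [i for i in range(len(X)) if i not in removed]
```

Calling `undersample(X, y, lam=0.3)` on a list of feature tuples `X` and 0/1 labels `y` returns the indices to keep. Minority samples are never removed, and setting `lam=0` disables deletion entirely, which is the sense in which the cap protects majority-class information in this sketch.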

Key words

Unbalanced data set/K-means clustering/Recall

Classification

Information Technology and Security Science

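Cite this article

Ye Feng, Ding Feng. Research and Application of Unbalanced Data Classification [J]. Computer Applications and Software, 2018, 35(1): 132-136, 205, 6.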

Computer Applications and Software

OA / Peking University Core Journal / CSTPCD

ISSN 1000-386X
