Computer Applications and Software, 2018, Vol. 35, Issue 1: 132-136, 205, 6. DOI: 10.3969/j.issn.1000-386x.2018.01.023
Research and Application of Unbalanced Data Classification (不平衡数据分类研究及其应用)
Ye Feng (叶枫) 1, Ding Feng (丁锋) 1
Author information
- 1. College of Economics and Management, Zhejiang University of Technology, Hangzhou 310012, Zhejiang, China
Abstract
Traditional machine learning algorithms have low classification accuracy for the minority classes of unbalanced data. In this paper, we analyze the causes of this problem and propose an undersampling method to improve the classification accuracy of minority classes. The method uses the k-means algorithm to cluster the samples multiple times and removes the noise samples of the majority classes, as well as the samples with the highest degree of overlap. At the same time, we introduce a deletion factor λ to avoid losing important information from the majority classes. In experiments on UCI datasets, traditional classification algorithms achieved a higher recall and F-measure on minority classes after the method was applied, which suggests that the method can improve the classification accuracy of minority classes. Finally, the method was applied to a medical task: predicting post-operative life expectancy in lung cancer patients. In that experiment, the recall and F-measure for the one-year mortality of lung cancer patients increased by 42% and 23%, respectively.
Keywords
Unbalanced data set / k-means clustering / Recall
Classification
Information technology and security science
Cite this article
Ye Feng, Ding Feng. Research and Application of Unbalanced Data Classification [J]. Computer Applications and Software, 2018, 35(1): 132-136, 205, 6.
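
As a rough illustration of the undersampling idea summarized in the abstract, the sketch below clusters all samples with k-means several times, counts how often each majority-class sample lands in a cluster dominated by the minority class, and removes the most frequently overlapping ones. The abstract does not say how noise or overlap is measured or how λ enters the procedure, so everything here is an assumption: the overlap rule, the reading of λ (`lam`) as the maximum fraction of majority samples that may be deleted, and the hypothetical names `kmeans` and `undersample`. It is not the authors' algorithm.

```python
import random
from collections import Counter

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on tuples of floats; returns a cluster index per point."""
    centers = rng.sample(points, k)  # initial centers are k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def undersample(X, y, majority, lam=0.2, k=2, runs=10, seed=0):
    """Drop majority samples that repeatedly fall into minority-dominated clusters.

    ASSUMPTION: lam caps the share of majority samples that may be removed.
    """
    rng = random.Random(seed)
    overlap = Counter()  # majority sample index -> number of runs flagged
    for _ in range(runs):
        labels = kmeans(X, k, rng)
        for c in range(k):
            idx = [i for i, l in enumerate(labels) if l == c]
            n_major = sum(1 for i in idx if y[i] == majority)
            # ASSUMPTION: a cluster where the majority class holds fewer than
            # half of the members marks its majority samples as overlapping.
            if idx and n_major * 2 < len(idx):
                for i in idx:
                    if y[i] == majority:
                        overlap[i] += 1
    n_major_total = sum(1 for label in y if label == majority)
    budget = int(lam * n_major_total)  # deletion cap derived from lam
    drop = {i for i, _ in overlap.most_common(budget)}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy 1-D data: 20 majority samples near 0, 5 minority samples near 10,
# and 2 majority samples sitting inside the minority region.
X = ([(i * 0.02,) for i in range(20)]
     + [(10.0 + 0.1 * i,) for i in range(5)]
     + [(10.05,), (10.25,)])
y = [0] * 20 + [1] * 5 + [0] * 2

X_res, y_res = undersample(X, y, majority=0, lam=0.2)
print(len(y), "->", len(y_res))
```

With a smaller `lam`, fewer majority samples can be removed even if more are flagged, which is one plausible way a deletion factor could protect majority-class information.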