Journal of Guilin University of Technology, 2017, Vol. 37, Issue (4): 587-593, 7. DOI: 10.3969/j.issn.1674-9057.2017.04.005
Classification of imbalance geological data based on PCA-SMOTE algorithm and random forest: a case study of geochemical data from the eastern Tianshan of China
Abstract
This paper proposes a PCA-based variant of the SMOTE resampling algorithm for balancing the classes of a dataset, and applies it to the classification and prediction of geological data with a random forest as the classifier. Because noise in the data can distort the distribution pattern after interpolation, the method combines PCA for denoising with SMOTE for interpolation to improve classification performance. Experiments on geochemical exploration data show that the new algorithm improves classification accuracy, offering a new approach to the classification and prediction of imbalanced geological data.
Keywords
principal component analysis (PCA)/SMOTE/random forest/imbalanced datasets/geochemical data/data denoising
Classification
Information Technology and Safety Science
Cite this article
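GUI Zhou, CHEN Jianguo, WANG Chengbin. Classification of imbalance geological data based on PCA-SMOTE algorithm and random forest: a case study of geochemical data from the eastern Tianshan of China [J]. Journal of Guilin University of Technology, 2017, 37(4): 587-593, 7.
Funding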
National Science and Technology Support Program project (2011BAB06B08-2)
National Natural Science Foundation of China project (41272361)
China Geological Survey project (1212011120986)
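Illustrative sketch
The workflow summarized in the abstract (PCA for denoising, SMOTE for interpolating minority-class samples, then a random forest) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the order of operations, the number of retained components, or the neighbor count, so denoising before interpolation, `n_components=3`, and `k=5` are all assumptions, as are the function names `pca_denoise`, `smote`, and `pca_smote`. The data are synthetic and stand in for real geochemical measurements.

```python
import numpy as np


def pca_denoise(X, n_components):
    """Project X onto its leading principal components and map it back."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components]
    # Low-variance directions (treated here as noise) are discarded.
    return Xc @ V.T @ V + mean


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbors.
    Requires at least two minority samples."""
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise squared distances; the diagonal is masked so that a sample
    # is never chosen as its own neighbor.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


def pca_smote(X, y, minority_label, n_components=3, k=5, seed=0):
    """Assumed ordering: denoise first, then oversample the minority class
    until both classes of a binary problem have the same size."""
    y = np.asarray(y)
    X_dn = pca_denoise(X, n_components)
    X_min = X_dn[y == minority_label]
    n_new = max(0, int((y != minority_label).sum()) - len(X_min))
    X_syn = smote(X_min, n_new, k=k, rng=seed)
    X_bal = np.vstack([X_dn, X_syn])
    y_bal = np.concatenate([y, np.full(n_new, minority_label)])
    return X_bal, y_bal


# Synthetic stand-in data: 90 majority and 10 minority samples, 6 features.
demo_rng = np.random.default_rng(0)
X = np.vstack([demo_rng.normal(0, 1, (90, 6)), demo_rng.normal(2, 1, (10, 6))])
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = pca_smote(X, y, minority_label=1)
print(X_bal.shape, np.bincount(y_bal))
```

The balanced arrays could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`. In practice the oversampling should be fitted on the training split only, so that synthetic samples do not leak into the evaluation data.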