计算机应用研究 (Application Research of Computers), Issue (2): 379-381, 418, 4. DOI: 10.3969/j.issn.1001-3695.2015.02.014
Classification research for unbalanced data based on mixed-sampling
Abstract
Traditional over-sampling algorithms can shrink the decision region and introduce additional noise points as they add samples. To address this problem, this paper presents a mixed-sampling algorithm based on misclassified samples. The approach uses a support vector machine (SVM) as the base classifier and identifies the misclassified samples in each iteration. Each misclassified sample is then handled according to its spatial relationship with its neighbors, using an improved mixed-sampling strategy: noise samples are removed directly; for dangerous samples, the positive-class samples among their neighbors are removed; and for safe samples, new samples are synthesized with the SMOTE algorithm and added to the original training set, after which the classification model is retrained. In experiments on real data sets, compared with the SMOTE-SVM and AdaBoost-SVM-OBMS algorithms, the proposed algorithm effectively improves the classification accuracy of the negative class.
Keywords
mixed-sampling / misclassified samples / unbalanced data / AdaBoost algorithm / SVM algorithm
Classification
Information technology and security science
Cite this article
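Gu Ping (古平), Ouyang Yuanyou (欧阳源遊). Classification research for unbalanced data based on mixed-sampling [J]. 计算机应用研究 (Application Research of Computers), 2015, (2): 379-381, 418, 4.
Funding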
Supported by the Natural Science Foundation of Chongqing ()
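Illustrative sketch

The mixed-sampling step described in the abstract can be sketched roughly as follows. This is not the authors' implementation: the abstract does not say how noise, dangerous and safe samples are told apart, so the thresholds below are borrowed from the Borderline-SMOTE convention (all k neighbors from the other class means noise, at least half means dangerous, otherwise safe). The function names, the parameter `k`, the `minority` label and the toy data are all hypothetical, and the sketch assumes that the misclassified samples being processed belong to the minority class. The SVM training loop that would supply the misclassified indices and retrain on the result is left out.

```python
import math
import random


def k_nearest(idx, X, k):
    """Indices of the k nearest neighbors of X[idx] (Euclidean), excluding idx."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def categorize(idx, X, y, k, minority):
    """Label one sample as 'noise', 'danger' or 'safe' (assumed thresholds)."""
    nbrs = k_nearest(idx, X, k)
    m = sum(1 for j in nbrs if y[j] != minority)  # neighbors of the other class
    if m == len(nbrs):
        return "noise", nbrs
    if 2 * m >= len(nbrs):
        return "danger", nbrs
    return "safe", nbrs


def mixed_sample(X, y, misclassified, k=3, minority=1, rng=None):
    """One mixed-sampling pass over the misclassified minority samples."""
    rng = rng or random.Random(0)
    drop, synthetic = set(), []
    for i in misclassified:
        if y[i] != minority:  # assumption: only minority samples are processed
            continue
        kind, nbrs = categorize(i, X, y, k, minority)
        if kind == "noise":
            drop.add(i)  # remove the noise sample itself
        elif kind == "danger":
            # remove the other-class samples among its neighbors
            drop.update(j for j in nbrs if y[j] != minority)
        else:
            # SMOTE-style interpolation toward a same-class neighbor
            mate = rng.choice([j for j in nbrs if y[j] == minority])
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[mate])])
    keep = [t for t in range(len(X)) if t not in drop]
    X_out = [X[t] for t in keep] + synthetic
    y_out = [y[t] for t in keep] + [minority] * len(synthetic)
    return X_out, y_out


# Toy data (hypothetical): label 1 is the minority class.
X = [[0, 0], [0, 1], [1, 0], [1, 1],                 # minority cluster
     [10, 10], [10, 11], [11, 10], [9, 10], [10, 9],  # one minority point among majority
     [5, 5], [5, 6], [6, 5], [5, 4]]                  # mixed border region
y = [1, 1, 1, 1,
     1, 0, 0, 0, 0,
     1, 0, 0, 1]

# Pretend a classifier misclassified samples 0, 4 and 9.
X_new, y_new = mixed_sample(X, y, misclassified=[0, 4, 9], k=3)
```

In a full pipeline, `X_new` and `y_new` would be used to retrain the base classifier, and the pass would be repeated with the newly misclassified samples in each iteration.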