Computer Applications and Software (计算机应用与软件), Issue (8): 215-219, 233, 6. DOI: 10.3969/j.issn.1000-386x.2015.08.052
基于统计抽样的非均衡分类方法在软件缺陷预测中的应用
APPLYING STATISTICAL SAMPLING-BASED IMBALANCED CLASSIFICATION IN SOFTWARE DEFECT PREDICTION
Abstract
Research on software defect prediction (SDP) currently focuses on two aspects: acquiring data from historical sources, and prediction methods. Unfortunately, the historical software defect data available are generally class-imbalanced, so traditional prediction methods misclassify defect data at a high rate. To address this problem, we propose using an imbalanced classification method based on statistical sampling for software defect prediction. By empirically comparing and analysing the prediction performance of 12 combined algorithms, each pairing an existing sampling method with a classification method, we find that SP-RF (SpreadSubsampling combined with random forest) shows the best overall performance but is somewhat weak in false positive rate (FPR). To further improve prediction performance, and to address the larger noise and the information loss that the original SP-RF method introduces into the original data, we propose an SP-RF-based adaptive random forest algorithm with inner-balanced sampling (IBSBA-RF). Experiments show that IBSBA-RF noticeably reduces the FPR of the prediction results and further increases their AUC and Balance measures.
Keywords
Software defect prediction / Imbalance / Sampling / Random forest / Cost-sensitive
Classification
Information Technology and Security Science
Cite this article
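Xu Kexin (徐可欣), Zhang Wen (张文), Wang Yongji (王永吉). Applying statistical sampling-based imbalanced classification in software defect prediction [J]. Computer Applications and Software, 2015, (8): 215-219, 233, 6.
Funding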
National Natural Science Foundation of China (71101138, 61379046, 91318301); Beijing Natural Science Foundation (4122087); National Science and Technology Major Project (2012ZX01039-004).
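Illustrative sketch
The abstract names a sampling step (SpreadSubsampling) and three evaluation measures (FPR, AUC, Balance) but gives no formulas. The following is a minimal sketch of two of those pieces, not the authors' implementation. It assumes that spread subsampling means capping every class at `max_spread` times the size of the rarest class, and that Balance follows the definition commonly used in the defect prediction literature, 1 - sqrt((0 - FPR)^2 + (1 - PD)^2) / sqrt(2), where PD is the recall on the defective class. The function names and the `max_spread` parameter are hypothetical. The random forest itself is omitted; in an SP-RF-style pipeline a classifier would be trained on the subsample returned here.

```python
import math
import random
from collections import defaultdict


def spread_subsample(rows, labels, max_spread=1.0, seed=0):
    """Randomly undersample so that no class keeps more than
    max_spread * (size of the rarest class) examples.
    Assumed behaviour; max_spread must be >= 1."""
    if max_spread < 1:
        raise ValueError("max_spread must be >= 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    cap = int(max_spread * min(len(m) for m in by_class.values()))
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Classes at or below the cap are kept whole.
        kept = members if len(members) <= cap else rng.sample(members, cap)
        out_rows.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_rows, out_labels


def fpr_pd_balance(y_true, y_pred, positive=1):
    """Return (FPR, PD, Balance) for binary predictions.
    Balance uses the assumed definition stated in the lead-in."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    balance = 1 - math.sqrt((0 - fpr) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return fpr, pd, balance
```

For example, calling `spread_subsample` on 10 defective and 90 non-defective modules with `max_spread=1.0` keeps 10 of each. Discarding 80 majority examples in this way is the kind of information loss the abstract attributes to the original SP-RF method.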