Applying Statistical Sampling-Based Imbalanced Classification in Software Defect Prediction

Xu Kexin, Zhang Wen, Wang Yongji

Computer Applications and Software, 2015, Issue 8: 215-219, 233 (6 pages). DOI: 10.3969/j.issn.1000-386x.2015.08.052

APPLYING STATISTICAL SAMPLING-BASED IMBALANCED CLASSIFICATION IN SOFTWARE DEFECT PREDICTION

Xu Kexin (1), Zhang Wen (2), Wang Yongji (1)

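Author information

  • 1. National Engineering Research Center for Fundamental Software, Institute of Software, Chinese Academy of Sciences, Beijing 100190
  • 2. University of Chinese Academy of Sciences, Beijing 100190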

Abstract

Current research on software defect prediction (SDP) focuses mainly on two aspects: acquiring data sources from historical data, and prediction methods. Unfortunately, the historical software defect data that can be obtained are mostly class-imbalanced, so traditional prediction methods misclassify a high proportion of the defect data. To address this problem, we propose using an imbalanced classification method based on statistical sampling for software defect prediction. By empirically comparing and analysing the prediction performance of 12 algorithm combinations formed from existing sampling and classification methods, we find that the SP-RF method (SpreadSubsampling combined with random forest) shows the best overall performance, although it is slightly weak in false positive rate (FPR). To further improve prediction performance, and to address the tendency of the original SP-RF method to introduce considerable noise into the original data and to lose information from it, we propose IBSBA-RF, an SP-RF-based adaptive random forest algorithm with inner-balanced sampling. Experiments show that the IBSBA-RF algorithm noticeably reduces the FPR of the prediction results and further increases their AUC and Balance measure.
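The abstract names spread subsampling and the FPR and Balance measures without defining them. The sketch below is an illustration only, not the authors' implementation: it assumes spread subsampling means randomly undersampling each class so that no class exceeds a chosen multiple of the rarest class, and it assumes the Balance definition commonly used in defect prediction, 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2), where pd is the detection rate and pf is the FPR. The function names `spread_subsample` and `fpr_and_balance` and the parameter `max_spread` are hypothetical.

```python
import math
import random
from collections import defaultdict


def spread_subsample(rows, labels, max_spread=1.0, seed=0):
    """Randomly undersample so that no class is larger than
    max_spread times the size of the rarest class (assumed behaviour)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    # Largest number of rows any single class may keep.
    cap = int(max_spread * min(len(members) for members in by_class.values()))
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        kept = members if len(members) <= cap else rng.sample(members, cap)
        out_rows.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_rows, out_labels


def fpr_and_balance(y_true, y_pred, positive=1):
    """Return (pf, balance), where pf is the false positive rate and
    balance uses the assumed definition given in the lead-in."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    pd = tp / (tp + fn) if (tp + fn) else 0.0  # detection rate (recall)
    pf = fp / (fp + tn) if (fp + tn) else 0.0  # false positive rate
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return pf, balance
```

In an SP-RF-style pipeline, the subsampled rows would then be passed to a random forest learner, and FPR, AUC and Balance would be computed on held-out data. The abstract does not describe how IBSBA-RF moves the sampling inside the forest, so that part is not sketched here.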

Key words

Software defect prediction/Imbalance/Sampling/Random forest/Cost-sensitive

Classification

Information Technology and Security Science

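Cite this article

Xu Kexin, Zhang Wen, Wang Yongji. Applying Statistical Sampling-Based Imbalanced Classification in Software Defect Prediction [J]. Computer Applications and Software, 2015, (8): 215-219, 233, 6.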

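Funding

National Natural Science Foundation of China (71101138, 61379046, 91318301); Beijing Natural Science Foundation (4122087); National Science and Technology Major Project (2012ZX01039-004).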

Computer Applications and Software

OA / CSCD / CSTPCD

ISSN 1000-386X
