统计与决策 (Statistics and Decision), 2024, Vol. 40, Issue 22: 53-58, 6. DOI: 10.13546/j.cnki.tjyjc.2024.22.009
Bagging Ensemble-based Feature Selection Method for High-dimensional Imbalanced Data
Abstract
With the development of big data, samples in many application domains are increasingly high-dimensional, and this high dimensionality degrades the classification performance of imbalanced learning. For the classification of high-dimensional imbalanced data, this paper proposes WAFS, an adaptive feature selection method based on SVM-RFE and Bagging ensemble. WAFS combines embedded and wrapper feature selection to adaptively select the optimal features that form the feature space. On 5 high-dimensional imbalanced public datasets whose dimensions range from 100 to 25,000, WAFS is compared with the filter-based CSS feature selection algorithm and the embedded ASG feature selection algorithm. The paper also explores the optimal sampling method for each dataset and the optimal feature-space ratio for datasets of different dimensionality. With AUC, Acc, Recall, F1-score and G-mean as evaluation metrics, the experiments show that WAFS performs well on datasets of different dimensions, especially on high-dimensional imbalanced datasets with small sample sizes, and that the model is stable and generalizes well while maintaining accuracy.
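To make the combination of SVM-RFE and Bagging named above more concrete, here is a minimal pure-Python sketch under stated assumptions. It is not the authors' WAFS implementation: the linear SVM is a Pegasos-style subgradient solver without a bias term, each bag is drawn by a stratified bootstrap, bag rankings are merged by mean rank, and the number of retained features is fixed by hand rather than chosen adaptively. All function names (`train_linear_svm`, `svm_rfe_order`, `bagging_svm_rfe`, `g_mean`) are hypothetical. G-mean, one of the listed evaluation metrics, is included as the geometric mean of the recall on the positive (minority) class and on the negative class.

```python
# Hypothetical sketch of Bagging + SVM-RFE feature ranking; NOT the WAFS algorithm from the paper.
import math
import random


def train_linear_svm(X, y, lam=0.1, n_iter=2000, rng=None):
    """Pegasos-style linear SVM (hinge loss, L2 penalty, no bias term; an assumption).

    Returns the average of the iterates from the second half of training,
    which is less noisy than the last iterate.
    """
    rng = rng or random.Random(0)
    d = len(X[0])
    w = [0.0] * d
    w_avg = [0.0] * d
    n_avg = 0
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        # shrink step from the L2 regulariser ...
        w = [(1.0 - eta * lam) * wj for wj in w]
        # ... plus the hinge-loss subgradient step when the margin is violated
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
        if t > n_iter // 2:
            w_avg = [a + wj for a, wj in zip(w_avg, w)]
            n_avg += 1
    return [a / n_avg for a in w_avg]


def svm_rfe_order(X, y, lam, n_iter, rng):
    """Rank features best-first by SVM-RFE, removing one feature per round."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > 1:
        Xa = [[row[j] for j in active] for row in X]
        w = train_linear_svm(Xa, y, lam, n_iter, rng)
        # drop the feature with the smallest squared weight
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(worst))
    eliminated.append(active[0])
    return eliminated[::-1]  # last survivor gets rank 0 (best)


def stratified_bootstrap(y, rng):
    """Resample each class with replacement at its own size so no bag loses the minority class."""
    idx = []
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        idx.extend(rng.choice(members) for _ in members)
    return idx


def bagging_svm_rfe(X, y, n_bags=5, lam=0.1, n_iter=2000, seed=0):
    """Aggregate per-bag SVM-RFE rankings by mean rank; returns feature indices best-first."""
    rng = random.Random(seed)
    d = len(X[0])
    rank_sum = [0.0] * d
    for _ in range(n_bags):
        idx = stratified_bootstrap(y, rng)
        order = svm_rfe_order([X[i] for i in idx], [y[i] for i in idx], lam, n_iter, rng)
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    return sorted(range(d), key=lambda j: rank_sum[j])


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the recall on the positive class and on the negative class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


if __name__ == "__main__":
    rng = random.Random(42)
    # toy imbalanced data: 20 minority (+1), 40 majority (-1); only feature 0 is informative
    y = [1] * 20 + [-1] * 40
    X = [[2.0 * label + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(5)] for label in y]
    ranking = bagging_svm_rfe(X, y)
    print("features ranked best-first:", ranking)
    top_k = ranking[:2]  # k fixed by hand here; WAFS is described as choosing it adaptively
    print("selected:", top_k)
```

Mean-rank aggregation is only one option; counting how often each feature appears in a bag's top-k is a common alternative. Stratified resampling is used because a plain bootstrap on a small, heavily imbalanced sample can produce bags with few or no minority examples.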
Key words
self-adaptation/feature selection/Bagging ensemble/high-dimensional imbalance
Classification
Management science
Cite this article
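Wang Jinbo, Liu Li. Bagging Ensemble-based Feature Selection Method for High-dimensional Imbalanced Data[J]. Statistics and Decision, 2024, 40(22): 53-58, 6.
Funding
General Program of the National Social Science Fund of China (22BTJ006)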