Journal of Nanjing University (Natural Science), 2023, Vol. 59, Issue 6: 928-936 (9 pages). DOI: 10.13232/j.cnki.jnju.2023.06.003
Hierarchical subspace ReliefF feature selection algorithm for high-dimensional small sample data
Abstract
High-dimensional small sample data has far more features than samples and usually contains a large number of redundant features. The ReliefF algorithm faces two challenges when dealing with such data. First, most existing improved ReliefF algorithms eliminate redundant features by calculating the mutual information between features, which is not applicable to high-dimensional data. Second, classifying with only the features most relevant to the label may not be the optimal choice, because it does not consider the impact of different feature combinations on classification performance. In this paper, we propose a ReliefF feature selection algorithm based on hierarchical subspaces. It divides the original feature set into subspaces with a hierarchical structure and uses neighborhood rough set theory to calculate the local dependencies of the lower-level subspaces, which eliminates redundant features in batches with high efficiency on high-dimensional small sample data. In addition, to account for the influence of different feature combinations on the results, the concept of "local leadership" is introduced: features with stronger "leading" ability in some subspaces are retained, which gives a more objective evaluation of the features from both local and global perspectives. Experiments on six microarray gene datasets show that the proposed method is more efficient than existing methods while maintaining good classification performance.
Keywords
high-dimensional small sample data / feature selection / ReliefF / hierarchical subspace / neighborhood rough set
Classification
Computer science and automation
Cite this article
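CHENG Fengwei, WANG Wenjian, ZHANG Zhenzhen. Hierarchical subspace ReliefF feature selection algorithm for high-dimensional small sample data [J]. Journal of Nanjing University (Natural Science), 2023, 59(6): 928-936 (9 pages).
Funding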
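National Natural Science Foundation of China (62076154, U1805263); Central Government Guided Local Science and Technology Development Fund (YDZX20201400001224); Natural Science Foundation of Shanxi Province (201901D111030); Shanxi Province Education Science "14th Five-Year Plan" Project (GH21395)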
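
The abstract refers to dependencies computed with neighborhood rough set theory but does not give the formula. As a rough illustration only, the sketch below uses the commonly cited definition: the dependency of the label on a feature subset is the fraction of samples whose delta-neighborhood, measured on that subset alone, contains only samples of the same class. The function name `neighborhood_dependency`, the Euclidean distance, the radius `delta`, and the toy data are all assumptions made for this example; they are not taken from the paper and may differ from the authors' formulation.

```python
from math import dist


def neighborhood_dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood is pure in class label.

    X: list of samples (each a list of floats), y: class labels,
    features: indices of the feature subset, delta: neighborhood radius.
    Hypothetical helper for illustration; not the paper's code.
    """
    # Project every sample onto the chosen feature subset.
    points = [[row[j] for j in features] for row in X]
    consistent = 0
    for i, p in enumerate(points):
        # A sample is in the positive region if every neighbor
        # within distance delta shares its label.
        if all(y[k] == y[i] for k, q in enumerate(points) if dist(p, q) <= delta):
            consistent += 1
    return consistent / len(points)


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]

dep_f0 = neighborhood_dependency(X, y, [0], delta=0.3)       # 1.0
dep_f1 = neighborhood_dependency(X, y, [1], delta=0.3)       # 0.0
dep_both = neighborhood_dependency(X, y, [0, 1], delta=0.3)  # 1.0
```

In this toy case, adding feature 1 to feature 0 does not raise the dependency, which is the kind of signal a dependency-based filter could use to treat a feature as redundant within a subspace. How the paper combines such scores across the hierarchy and with ReliefF weights is not specified in the abstract.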