Computer Engineering and Applications, 2024, Vol. 60, Issue (14): 74-85, 12. DOI: 10.3778/j.issn.1002-8331.2306-0248
Feature Selection Using Adaptive Neighborhood and Clustering for Imbalanced Data
Abstract
Traditional neighborhood rough sets do not consider the class distribution of imbalanced data, most neighborhood systems can find the optimal neighborhood radius only through manual tuning, and clustering requires the number of clusters to be specified in advance. To address these problems, a feature selection method for imbalanced data based on adaptive neighborhood and clustering is proposed. First, the adaptive K-nearest neighbors and shared nearest neighbors of each sample are determined from the average distance between that sample and the other samples under each feature, and a hybrid sampling model based on adaptive neighborhood density is then designed to construct balanced decision systems. Second, a new neighborhood radius is defined based on the feature distribution, and a Gaussian kernel function is used to model the fuzzy similarity between samples within the neighborhood. Fuzzy neighborhood mutual information is proposed to measure the correlation between features, and the features are clustered on this basis. Finally, a particle swarm initialization strategy based on fuzzy neighborhood mutual information is designed, and a dynamic bit mask strategy and a differential perturbation operator suited to integer coding are introduced to improve the integer particle swarm optimization algorithm; representative features are then selected from the feature clusters to form the final feature subset. Experimental results on 19 imbalanced datasets show that the proposed algorithm can effectively improve classification performance on imbalanced data.
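The second step of the method (Gaussian-kernel similarity between samples inside a neighborhood, then fuzzy neighborhood mutual information between features) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so every concrete choice below is an assumption: the radius rule `std / lam` with the hypothetical parameter `lam`, reusing the radius as the kernel width, the entropy form H = -(1/n) Σ log(|[x_i]|/n) commonly used in fuzzy rough set work, and the element-wise minimum as the joint relation. All function names are hypothetical.

```python
import numpy as np


def fuzzy_neighborhood_relation(x, lam=2.0):
    """Fuzzy similarity matrix for one feature column x of shape (n,).

    Assumption (not from the paper): radius = std(x) / lam, and the
    Gaussian kernel width is set equal to that radius.
    """
    x = np.asarray(x, dtype=float)
    radius = x.std() / lam
    dist = np.abs(x[:, None] - x[None, :])        # pairwise distances on this feature
    if radius == 0:                                # constant feature: all samples coincide
        return np.ones_like(dist)
    sim = np.exp(-dist**2 / (2 * radius**2))       # Gaussian kernel similarity
    sim[dist > radius] = 0.0                       # keep only samples inside the neighborhood
    return sim


def fuzzy_entropy(rel):
    """Fuzzy neighborhood entropy H = -(1/n) * sum_i log(|[x_i]| / n)."""
    n = rel.shape[0]
    card = rel.sum(axis=1)                         # fuzzy cardinality; >= 1 because r_ii = 1
    return -np.mean(np.log(card / n))


def fuzzy_neighborhood_mi(x, y, lam=2.0):
    """I(x; y) = H(x) + H(y) - H(x, y), with the joint relation as element-wise min."""
    rx = fuzzy_neighborhood_relation(x, lam)
    ry = fuzzy_neighborhood_relation(y, lam)
    return fuzzy_entropy(rx) + fuzzy_entropy(ry) - fuzzy_entropy(np.minimum(rx, ry))


def mi_matrix(X, lam=2.0):
    """Symmetric matrix of pairwise fuzzy neighborhood MI between the columns of X."""
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    M = np.zeros((m, m))
    for i in range(m):
        for j in range(i, m):
            M[i, j] = M[j, i] = fuzzy_neighborhood_mi(X[:, i], X[:, j], lam)
    return M


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=50)
    b = a + 0.1 * rng.normal(size=50)              # closely related to a
    c = rng.normal(size=50)                        # independent of a
    print(mi_matrix(np.column_stack([a, b, c])))
```

A matrix like the one returned by `mi_matrix` is the kind of feature-to-feature correlation a clustering step could operate on. How the paper actually groups features without a preset cluster count is not described in the abstract and is not reproduced here.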
Key words
adaptive neighborhood/hybrid sampling/fuzzy neighborhood mutual information/feature clustering/feature selection
Classification
Information Technology and Security Science
Cite this article
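SUN Lin, LIANG Na, WANG Xinya. Feature selection using adaptive neighborhood and clustering for imbalanced data[J]. Computer Engineering and Applications, 2024, 60(14): 74-85, 12.
Funding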
National Natural Science Foundation of China (62076089, 61772176).