
Feature Selection Using Adaptive Neighborhood and Clustering for Imbalanced Data

Indexing: OA; Peking University Core Journals (北大核心); CSTPCD

Abstract

Traditional neighborhood rough sets do not account for the class distribution of imbalanced data, most neighborhood systems rely on manual tuning that rarely finds the optimal neighborhood radius, and clustering requires the number of clusters to be specified in advance. To address these problems, a feature selection method for imbalanced data based on adaptive neighborhood and clustering is proposed. First, the adaptive k-nearest neighbors and shared nearest neighbors of each sample are determined from the average distance between that sample and the other samples under each feature; an adaptive neighborhood density is then defined and a hybrid sampling model is designed to construct a balanced decision system. Second, a new neighborhood radius is defined based on the feature distribution, a Gaussian kernel function is used to study the fuzzy similarity relation between samples within a neighborhood, and fuzzy neighborhood mutual information is used to measure the correlation between features, on which basis the features are clustered. Finally, a particle swarm initialization strategy is constructed from fuzzy neighborhood mutual information, and a dynamic bit mask strategy and a differential perturbation operator suited to integer coding are introduced to improve the integer particle swarm optimization algorithm, which selects representative features from the feature clusters to form the final feature subset. Experimental results on 19 imbalanced datasets show that the proposed algorithm effectively improves classification performance on imbalanced data.
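The first two steps of the abstract (mean-distance-based adaptive neighbors, shared neighbors, and Gaussian-kernel fuzzy similarity inside a neighborhood) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the Euclidean distance, the "closer than the mean distance" rule, the `sigma` parameter, and all function names below are assumptions made only for illustration.

```python
import math


def adaptive_neighbors(X, i):
    """Indices of samples closer to sample i than its mean distance to all others.

    Assumption: Euclidean distance over all features. The paper averages
    distances under each feature; its exact rule is not given in the abstract.
    Requires at least two samples.
    """
    dists = {j: math.dist(X[i], X[j]) for j in range(len(X)) if j != i}
    mean_d = sum(dists.values()) / len(dists)
    # The neighbor count k is not fixed in advance; it adapts to each sample.
    return sorted(j for j, d in dists.items() if d < mean_d)


def shared_neighbors(X, i, j):
    """Samples that appear in the adaptive neighborhoods of both i and j."""
    return sorted(set(adaptive_neighbors(X, i)) & set(adaptive_neighbors(X, j)))


def fuzzy_similarity(x, y, radius, sigma=1.0):
    """Gaussian-kernel similarity, set to zero outside the neighborhood radius.

    Assumption: `radius` and `sigma` are supplied by the caller; the paper
    derives its radius from the feature distribution.
    """
    d = math.dist(x, y)
    return math.exp(-d * d / (2 * sigma ** 2)) if d <= radius else 0.0


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
    print(adaptive_neighbors(X, 0))
    print(shared_neighbors(X, 0, 1))
    print(fuzzy_similarity(X[0], X[1], radius=1.0))
```

In this toy data the outlying fourth point inflates every mean distance, so the three clustered points end up as each other's neighbors without any hand-chosen k or radius, which is the motivation the abstract states for an adaptive neighborhood.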

Authors: SUN Lin (孙林); LIANG Na (梁娜); WANG Xinya (王欣雅)

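Affiliations: College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457; College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007; Henan Zhongyu Construction Investment Group Co., Ltd., Zhengzhou 450000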

Subject category: Computer Science and Automation

Keywords: adaptive neighborhood; hybrid sampling; fuzzy neighborhood mutual information; feature clustering; feature selection

Computer Engineering and Applications (《计算机工程与应用》), 2024 (14)

Pages 74-85 (12 pages)

Funding: National Natural Science Foundation of China (62076089, 61772176).

DOI: 10.3778/j.issn.1002-8331.2306-0248
