Feature Selection Using Adaptive Neighborhood and Clustering for Imbalanced Data

Computer Engineering and Applications, 2024, Vol. 60, Issue 14: 74-85, 12. DOI: 10.3778/j.issn.1002-8331.2306-0248

Feature Selection Using Adaptive Neighborhood and Clustering for Imbalanced Data

Sun Lin¹, Liang Na², Wang Xinya³

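Author information

1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457
2. College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007
3. Henan Zhongyu Construction Investment Group Co., Ltd., Zhengzhou 450000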

Abstract

Traditional neighborhood rough sets do not consider the class distribution of imbalanced data, most neighborhood systems make it difficult to find the optimal neighborhood radius through manual tuning, and clustering requires the number of clusters to be specified in advance. To address these problems, a feature selection method for imbalanced data based on adaptive neighborhood and clustering is proposed. Firstly, the adaptive K-nearest neighbors and shared nearest neighbors of each sample are determined according to the average distance between that sample and the other samples under each feature, and a hybrid sampling model based on adaptive neighborhood density is designed to construct balanced decision systems. Secondly, a new neighborhood radius is defined based on the feature distribution, and the Gaussian kernel function is used to characterize the fuzzy similarity relation between samples in the neighborhood. Fuzzy neighborhood mutual information is then proposed to measure the correlation between features, and the features are clustered on this basis. Finally, a particle swarm initialization strategy is designed based on fuzzy neighborhood mutual information. To improve the integer particle swarm optimization algorithm, a dynamic bit mask strategy and a differential perturbation operator suitable for integer coding are introduced, and representative features are selected from the feature clusters to form the final feature subset. Experimental results on 19 imbalanced datasets show that the proposed algorithm effectively improves classification performance on imbalanced data.
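The second step of the abstract (Gaussian-kernel fuzzy similarity, then a mutual-information measure between features) can be illustrated with a minimal sketch. This is not the paper's definition: the abstract gives no formulas, so the sketch assumes a generic fuzzy-entropy formulation in which the fuzzy cardinality of a sample is the row sum of its similarity matrix and the joint relation of two features is the elementwise minimum. It also omits the adaptive neighborhood radius entirely. The function names and the `sigma` parameter are hypothetical.

```python
import numpy as np


def gaussian_similarity(column, sigma=0.5):
    """Fuzzy similarity for one feature: r_ij = exp(-(x_i - x_j)^2 / (2 * sigma^2)).

    `sigma` is an assumed kernel width, not a value taken from the paper.
    """
    x = np.asarray(column, dtype=float)
    diff = x[:, None] - x[None, :]  # pairwise differences via broadcasting
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))


def fuzzy_entropy(relation):
    """Assumed fuzzy entropy: -(1/n) * sum_i log2(|[x_i]| / n), |[x_i]| = row sum."""
    n = relation.shape[0]
    return float(-np.mean(np.log2(relation.sum(axis=1) / n)))


def fuzzy_mutual_information(col_a, col_b, sigma=0.5):
    """Assumed form MI(A; B) = H(A) + H(B) - H(A, B).

    The joint relation is taken as the elementwise minimum of the two
    similarity matrices (a common choice, assumed here).
    """
    r_a = gaussian_similarity(col_a, sigma)
    r_b = gaussian_similarity(col_b, sigma)
    joint = np.minimum(r_a, r_b)
    return fuzzy_entropy(r_a) + fuzzy_entropy(r_b) - fuzzy_entropy(joint)
```

Under these assumptions, a feature paired with a constant feature scores zero, and a feature paired with itself scores its own fuzzy entropy, so larger values between two features would suggest they belong in the same feature cluster.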

Key words

adaptive neighborhood/hybrid sampling/fuzzy neighborhood mutual information/feature clustering/feature selection

Classification

Information Technology and Security Science

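Cite this article

Sun Lin, Liang Na, Wang Xinya. Feature Selection Using Adaptive Neighborhood and Clustering for Imbalanced Data[J]. Computer Engineering and Applications, 2024, 60(14): 74-85, 12.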

Funding

National Natural Science Foundation of China (62076089, 61772176).

Computer Engineering and Applications

Open access; Peking University Core Journal; CSTPCD

ISSN: 1002-8331
