Computer Technology and Development, 2025, Vol. 35, Issue 5: 82-89, 8. DOI: 10.20165/j.cnki.ISSN1673-629X.2025.0007
基于Pareto支配策略的距离度量特征选择算法
Distance Metric Feature Selection Algorithm Based on Pareto Dominance Theory
Abstract
To address the class imbalance of high-dimensional small-sample data, together with the long training times and low performance of prediction models built on such data, a distance metric feature selection algorithm based on a Pareto dominance strategy (DMPD) is proposed. The algorithm aims to optimize the construction of prediction models and effectively reduce computational cost. First, it evaluates the correlation between each feature and the class label using the Fisher Score algorithm and ranks the features by score; then, from the screening results at different feature dimensions, it selects the best feature dimension by classification performance, which completes feature pre-selection. Second, cosine similarity is used to measure the similarity between features, and Pareto dominance theory is applied to remove, one by one, the features dominated by the feature with the highest class correlation, which eliminates redundant features and yields a compact, efficient feature subset. Experimental results on six different datasets show that, at the same feature-set dimension, DMPD significantly improves classification performance and outperforms using the Fisher Score or MIC (Mutual Information Coefficient) algorithm alone. Moreover, compared with the FCBF-MIC (Fast Correlation Feature Selection Mutual Information Coefficient) algorithm, DMPD improves computational efficiency and achieves better classification ability at smaller feature dimensions, which demonstrates its effectiveness on imbalanced high-dimensional small-sample data.
Keywords
high-dimensional small sample / feature selection / distance metric / Pareto dominance theory / max-relevance and min-redundancy
Classification
Information Technology and Security Science
Cite this article
罗雅欣, 潘晓英, 梁家铭, 李航凯, 王燕. Distance Metric Feature Selection Algorithm Based on Pareto Dominance Theory [J]. Computer Technology and Development, 2025, 35(5): 82-89, 8.
Funding
Supported by the Key Research and Development Program of Shaanxi Province (2023-YBSF-476)
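
Illustrative sketch
The two stages summarized in the abstract (Fisher Score ranking for pre-selection, then redundancy removal using cosine similarity between features) can be illustrated with a minimal sketch. This is not the authors' DMPD implementation. The abstract does not specify the exact Pareto dominance criterion, so the sketch substitutes a simplified stand-in rule: a feature is treated as dominated by a higher-ranked feature when their absolute cosine similarity is at least a threshold `tau`. The function names, the fixed pre-selection size `k` (the paper chooses the dimension by classification performance), and `tau` are all assumptions introduced for illustration.

```python
import numpy as np


def fisher_scores(X, y):
    """Fisher Score per feature: between-class scatter / within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant avoids division by zero for constant features.
    return between / (within + 1e-12)


def abs_cosine_similarity(X):
    """Absolute cosine similarity between every pair of feature columns."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    Xn = X / norms
    return np.abs(Xn.T @ Xn)


def select_features(X, y, k=50, tau=0.9):
    """Two-stage selection (assumed simplification, not the paper's rule).

    Stage 1: keep the k features with the highest Fisher Score.
    Stage 2: walk the ranking from the top; each kept feature removes every
             lower-ranked feature whose |cosine similarity| to it is >= tau.
    """
    X = np.asarray(X, dtype=float)
    scores = fisher_scores(X, y)
    ranked = np.argsort(-scores, kind="stable")[:k]
    sim = abs_cosine_similarity(X[:, ranked])
    alive = np.ones(len(ranked), dtype=bool)
    selected = []
    for i in range(len(ranked)):
        if not alive[i]:
            continue
        selected.append(int(ranked[i]))
        # Hypothetical dominance rule: drop near-duplicates ranked below i.
        alive[i + 1:] &= sim[i, i + 1:] < tau
    return selected
```

Calling `select_features(X, y, k=100, tau=0.9)` on a samples-by-features array returns the column indices of the retained features in descending Fisher Score order. In practice `k` and `tau` would need tuning, for example by cross-validated classification performance, in the spirit of the pre-selection step the abstract describes.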