Computer & Digital Engineering, Issue (9): 2257-2261, 2272, 6. DOI: 10.3969/j.issn.1672-9722.2019.09.029
Feature Selection Method by Label Distribution Learning Based on Imbalanced Data
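Li Kewen (李克文) 1, Xie Peng (谢鹏) 1, Lu Shenqiang (路慎强) 2
Author information
- 1. College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao 266580
- 2. Geophysical Research Institute, Sinopec Shengli Oilfield Company, Dongying 257000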
Abstract
Traditional feature selection methods can be degraded by class imbalance in the data set. This paper proposes a feature selection method based on class distribution learning for imbalanced data. First, the loss function is changed from a cumulative relative entropy to a multiplicative relative entropy, so that an evaluation of the data imbalance enters the loss function. Then the new loss function is rearranged and its gradient is derived to obtain the gradient direction, and the loss is driven to convergence by variable-step gradient descent. Finally, features are selected by thresholding the learned class distribution. The experiments use four classifiers (Logistic Regression, Random Forest, Support Vector Machine, and Gradient Boosting Decision Tree) and compare FSLDL (Feature Selection Method by Label Distribution Learning Based on Imbalanced Data) with PCA, SVM-RFE, and F classify. Three resampling methods for imbalanced data (SMOTENN, NearMiss, and ADASYN) are also compared and analyzed, all on the KC1 data set from the NASA fault data sets. The results show that, with feature selection alone, the proposed FSLDL outperforms the other methods on imbalanced data classification. Its performance improves further when it is combined with a resampling method for imbalanced data.
Keywords
feature selection/classification/imbalanced data/label distribution learning
Classification
Information Technology and Security Science
Cite this article
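Li Kewen, Xie Peng, Lu Shenqiang. Feature Selection Method by Label Distribution Learning Based on Imbalanced Data [J]. Computer & Digital Engineering, 2019, (9): 2257-2261, 2272, 6.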