Neighborhood-tolerance mutual information selection ensemble classification algorithm for incomplete data sets
Indexing: OA; Peking University Core Journals (北大核心); CSTPCD
To address classification in incomplete mixed information systems, this paper combines the neighborhood-tolerance relation from granular computing with mutual information theory to define neighborhood-tolerance mutual information and, drawing on ensemble learning, proposes a neighborhood-tolerance mutual information selective ensemble classification algorithm for incomplete data sets. The algorithm first derives information granules from the missing attributes and partitions them into granular layers to build a granular space; on each layer, an ensemble algorithm with BP neural networks as base classifiers is used to construct a new base classifier. Next, for each information granule, the neighborhood-tolerance mutual information with respect to the class attribute is computed from the granule's missing attributes to measure the granule's importance, and the base-classifier weights are redefined from each base classifier's prediction accuracy and this mutual information. Finally, the base classifiers are combined by weighted ensemble according to the sample to be predicted, and the result is compared with traditional ensemble classification algorithms. On some incomplete mixed data sets, the proposed algorithm effectively improves classification accuracy.
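The granule construction and weighting steps described in the abstract can be illustrated with a minimal Python sketch. This record does not give the paper's formulas, so everything below is an assumption made for illustration: missing values are marked with `None`, the tolerance test treats a missing value as compatible with anything, the mutual information follows the usual neighborhood-size form $-\frac{1}{n}\sum_i \log\frac{|\delta_B(x_i)|\,|\delta_D(x_i)|}{n\,|\delta_{B\cup D}(x_i)|}$, and the weight is taken as the normalized product of accuracy and mutual information. The function names, the `delta` radius, and the product rule are hypothetical and are not the authors' definitions; base-classifier training (BP neural networks in the paper) is left out, and accuracies are passed in as given numbers.

```python
import math
from collections import defaultdict

MISSING = None  # assumed marker for a missing attribute value


def granules_by_missing_pattern(X):
    """Group sample indices by the set of attribute positions that are missing."""
    groups = defaultdict(list)
    for i, row in enumerate(X):
        pattern = frozenset(j for j, v in enumerate(row) if v is MISSING)
        groups[pattern].append(i)
    return dict(groups)


def tolerant(x, y, attrs, numeric, delta):
    """Assumed neighborhood-tolerance test on the attributes in attrs:
    a pair is compatible on an attribute if either value is missing,
    if numeric values differ by at most delta, or if categorical values match."""
    for j in attrs:
        a, b = x[j], y[j]
        if a is MISSING or b is MISSING:
            continue
        if j in numeric:
            if abs(a - b) > delta:
                return False
        elif a != b:
            return False
    return True


def nt_mutual_information(X, y, attrs, numeric, delta=0.1):
    """Neighborhood-size mutual information between attrs and the class labels y."""
    n = len(X)
    total = 0.0
    for i in range(n):
        nb = {k for k in range(n) if tolerant(X[i], X[k], attrs, numeric, delta)}
        nd = {k for k in range(n) if y[k] == y[i]}
        # nb & nd always contains i, so the denominator is never zero.
        total += math.log(len(nb) * len(nd) / (n * len(nb & nd)))
    return -total / n


def combine_weights(accuracy, mi):
    """Assumed rule: weight is proportional to accuracy * mutual information."""
    raw = {g: accuracy[g] * mi[g] for g in accuracy}
    s = sum(raw.values())
    if s == 0:
        return {g: 1.0 / len(raw) for g in raw}
    return {g: v / s for g, v in raw.items()}


def weighted_vote(predictions, weights):
    """Return the label whose supporting base classifiers carry the most weight."""
    score = defaultdict(float)
    for g, label in predictions.items():
        score[label] += weights[g]
    return max(score, key=score.get)


# Toy mixed data: attribute 0 is numeric, attribute 1 is categorical.
X = [[0.10, "a"], [0.12, "a"], [0.90, "b"], [None, "b"], [0.88, None]]
y = [0, 0, 1, 1, 1]

granules = granules_by_missing_pattern(X)
# Score each granule on the attributes it actually observes (an assumption).
mi = {g: nt_mutual_information(X, y, [j for j in range(2) if j not in g], {0})
      for g in granules}
accuracy = {g: 0.8 for g in granules}  # hypothetical base-classifier accuracies
weights = combine_weights(accuracy, mi)
```

In this sketch a granule whose observed attributes say nothing about the class receives zero mutual information and therefore zero weight, which is one way to read the abstract's statement that the weight reflects both prediction accuracy and granule importance.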
Li Lihong (李丽红); Dong Hongyao (董红瑶); Liu Wenjie (刘文杰); Li Baolin (李宝霖); Dai Qi (代琪)
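College of Science, North China University of Science and Technology, Tangshan 063210 || Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan 063210 || Tangshan Key Laboratory of Engineering Computing, North China University of Science and Technology, Tangshan 063210
College of Science, North China University of Science and Technology, Tangshan 063210 || Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan 063210 || Tangshan Key Laboratory of Engineering Computing, North China University of Science and Technology, Tangshan 063210 || Shougang Mining Company School for Employees' Children (首钢矿业公司职工子弟学校), Tangshan 064404
College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210
Department of Automation, China University of Petroleum (Beijing), Beijing 102249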
Computer Science and Automation
Keywords: incomplete hybrid information system; neighborhood-tolerance mutual information; ensemble learning; classification
Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2024 (001)
pp. 106-117 (12 pages)
Funding: Hebei Key Laboratory of Data Science and Application Project (10120201); Tangshan Key Laboratory of Data Science Project (10120301)