Acta Electronica Sinica, 2024, Vol. 52, Issue 12: 4125-4141 (17 pages). DOI: 10.12263/DZXB.20231002
Manifold Nearest Neighbor Sample Envelope and Hierarchical Multitype Transform Algorithm for Ensemble Learning
Abstract
Ensemble learning is an important branch and research hotspot in machine learning. The prevailing paradigm of ensemble learning algorithms is to draw multiple sample subsets from the original sample set, train a base classifier on each subset, and integrate the base classifier results. The main problem with this paradigm is that the diversity among subsets is significantly reduced, because all subsets are derived from the same original sample set. This problem is especially serious when the original sample set is small, the sampling ratio is large, and the degree of imbalance is high. In addition, when the divisibility of the original sample set is low, resampling yields only a limited improvement in the divisibility of the sample subsets. To address these problems, this paper proposes a manifold nearest neighbor sample envelope and hierarchical multi-type transformation algorithm for ensemble learning. It aims to improve the diversity and divisibility of the sample subsets by transforming the original sample set into differentiated, hierarchical enveloped sample sets through an envelopment mechanism and multi-type sample transformation. First, a manifold nearest neighbor sample envelope mechanism is designed to transform the original samples into sample envelopes. Second, a multi-type sample transformation is applied to the sample envelopes to reconstruct and generate hierarchical envelope samples. Third, an inter-layer consistency preservation mechanism based on joint structure domain adaptation is designed to preserve the distribution consistency of the samples before and after the transformation, thereby improving the ability of the envelope samples to represent the original samples. Fourth, feature dimensionality reduction and base classifier training are performed separately for each layer of the envelope sample set. Finally, the final classification results are obtained using a two-dimensional decision fusion mechanism. More than ten datasets and several representative algorithms are used for experimental validation. The results show that, compared with using the original sample set, the proposed algorithm improves the diversity of the sample subsets, which in turn improves ensemble learning performance, with an accuracy improvement of up to 18.56%. Compared with related ensemble learning algorithms, the proposed algorithm improves accuracy by up to 7.56%. This paper offers a new idea for improving existing ensemble learning algorithms, and it is valuable for shifting the paradigm of "ensemble learning directly based on original samples" toward a new paradigm of "ensemble learning based on hierarchical envelope samples".
Keywords
ensemble learning / envelope learning / sample transformation / nearest neighbor sample enveloping / domain adaptation / classifier ensemble
Classification
Information Technology and Security Science
Cite this article
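颜芳, 马洁, 李勇明, 王品, 覃剑, 刘承宇. Manifold Nearest Neighbor Sample Envelope and Hierarchical Multitype Transform Algorithm for Ensemble Learning [J]. Acta Electronica Sinica, 2024, 52(12): 4125-4141.
Funding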
National Natural Science Foundation of China (No. U21A20448, No. 61771080)
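
The conventional paradigm that the abstract starts from (draw subsets from the original sample set, train one base classifier per subset, combine the outputs) can be sketched roughly as below. This is an illustration under stated assumptions, not the paper's algorithm: the nearest-centroid base classifier, the majority vote, the Jaccard overlap used as a crude stand-in for (lack of) subset diversity, and every function name are hypothetical choices made for this sketch. It only shows why subsets resampled from one small sample set overlap more as the sampling ratio grows.

```python
import random
from collections import Counter


def bootstrap_subsets(n_samples, n_subsets, ratio, rng):
    """Draw index subsets with replacement from the original sample set."""
    size = max(1, int(round(ratio * n_samples)))
    return [[rng.randrange(n_samples) for _ in range(size)]
            for _ in range(n_subsets)]


def mean_jaccard_overlap(subsets):
    """Average pairwise Jaccard similarity of the distinct indices per subset.

    Higher overlap means less diverse subsets. Needs at least two subsets.
    """
    sets = [set(s) for s in subsets]
    pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


def fit_centroids(X, y, idx):
    """Toy base classifier (an assumption): one mean vector per class."""
    sums, counts = {}, Counter()
    for i in idx:
        acc = sums.setdefault(y[i], [0.0] * len(X[i]))
        for d, v in enumerate(X[i]):
            acc[d] += v
        counts[y[i]] += 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(centroids[c], x))
    return min(centroids, key=sqdist)


def ensemble_predict(models, x):
    """Combine base classifier outputs by simple majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic two-class data: 20 points near (0, 0) and 20 near (3, 3).
    X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
    X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(20)]
    y = [0] * 20 + [1] * 20
    for ratio in (0.2, 1.0):
        subsets = bootstrap_subsets(len(X), 10, ratio, rng)
        models = [fit_centroids(X, y, s) for s in subsets]
        print(ratio, round(mean_jaccard_overlap(subsets), 3),
              ensemble_predict(models, [2.5, 2.5]))
```

Running the script prints, for a small and a large sampling ratio, the mean subset overlap and the ensemble's vote for one query point. The overlap measure is only a proxy; the paper's own diversity and divisibility measures are not given in the abstract.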