Computer & Digital Engineering, 2024, Vol. 52, Issue (5): 1304-1309, 1316, 7. DOI: 10.3969/j.issn.1672-9722.2024.05.008
Divergence-Based Core Dataset Selection Algorithm
An Efficient Core-set Selection Algorithm Based on Difference
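Wang Zongchi 1, Liu Jian 2, Wang Pei 2, Zhao Xingbo 3, Yu Jiageng 4, Tao Qingchuan 3
Author information
- 1. China National Aviation Fuel Group Limited, Beijing 100088
- 2. Aerospace Shenzhou Smart System Technology Co., Ltd., Beijing 100029
- 3. College of Electronics and Information Engineering, Sichuan University, Chengdu 610065
- 4. Institute of Software, Chinese Academy of Sciences, Beijing 100190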
Abstract
With the development of deep learning, datasets are growing at an unprecedented rate, and training on them becomes inefficient. It is therefore often necessary to reduce the original dataset while preserving a similar training effect. To this end, a core-set selection algorithm based on divergence is proposed. The algorithm learns iteratively in a supervised manner, computes a divergence value for each sample through a voting network framework, and then sorts the samples by these values to select the core set. Core-set selection experiments are carried out on the CIFAR, Fashion-MNIST and SVHN datasets. The results show that the proposed algorithm can obtain a core set one fifth the size of the original dataset, while the accuracy of the trained model drops by less than 5%. In addition, the generalization error of the core dataset is only 0.13, which makes it more broadly applicable.
Keywords
convolutional neural network/core set selection/supervised learning/active learning
Classification
Information technology and security science
Cite this article
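Wang Zongchi, Liu Jian, Wang Pei, Zhao Xingbo, Yu Jiageng, Tao Qingchuan. An Efficient Core-set Selection Algorithm Based on Difference [J]. Computer & Digital Engineering, 2024, 52(5): 1304-1309, 1316, 7.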
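The selection loop summarized in the abstract (iterative supervised training, a voting committee, a per-sample divergence score, then sorting) can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the authors' method: the abstract does not define the divergence measure, so vote entropy is swapped in; the nearest-centroid classifier is a toy stand-in for the convolutional networks; and the names `fit_centroids`, `predict`, `vote_entropy`, `select_core_set` and all parameters are hypothetical.

```python
import math
import random
from collections import Counter


def fit_centroids(points, labels):
    """Toy stand-in for a trained network: one mean vector per class."""
    sums, counts = {}, Counter(labels)
    for p, c in zip(points, labels):
        acc = sums.setdefault(c, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(centroids, p):
    """Return the class whose centroid is closest to point p."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
    )


def vote_entropy(votes):
    """Assumed divergence score: entropy of the committee votes for one sample.

    0 means every member agrees; larger values mean more disagreement.
    """
    n = len(votes)
    return sum(-(k / n) * math.log(k / n) for k in Counter(votes).values())


def select_core_set(points, labels, budget, seed_size=20, step=20,
                    n_members=5, seed=0):
    """Grow a core set by repeatedly adding the most disputed samples."""
    rng = random.Random(seed)
    selected = rng.sample(range(len(points)), seed_size)
    while len(selected) < budget:
        chosen = set(selected)
        pool = [i for i in range(len(points)) if i not in chosen]
        # Supervised step: each member trains on a bootstrap resample of the
        # current core set, using the true labels.
        committee = []
        for _ in range(n_members):
            boot = [rng.choice(selected) for _ in selected]
            committee.append(
                fit_centroids([points[i] for i in boot],
                              [labels[i] for i in boot])
            )
        # Voting step: score every remaining sample by committee disagreement.
        scores = {
            i: vote_entropy([predict(m, points[i]) for m in committee])
            for i in pool
        }
        # Sorting step: keep the highest-scoring samples, up to the budget.
        pool.sort(key=lambda i: scores[i], reverse=True)
        selected.extend(pool[: min(step, budget - len(selected))])
    return selected
```

Calling `select_core_set(points, labels, budget=len(points) // 5)` would mirror the one-fifth ratio reported in the abstract. The sketch assumes `seed_size <= budget <= len(points)` and says nothing about the accuracy or generalization figures, which depend on the actual networks and datasets.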