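An Efficient Core-set Selection Algorithm Based on Difference
Original Chinese title: 基于分歧的核心数据集筛选算法 (literally, "A Divergence-Based Core Dataset Selection Algorithm"). Index tags: OA, CSTPCD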
As deep learning has developed, the datasets used for training have grown steadily larger, making the training of deep neural networks inefficient. To address this, a divergence-based core-set selection algorithm is proposed that reduces the original dataset to a core set while preserving training performance. The algorithm learns iteratively in a supervised manner, uses a voting network framework to compute a divergence value for each sample, and ranks the samples by that value to select the core set. Core-set selection experiments were carried out on the widely used CIFAR, Fashion-MNIST, and SVHN datasets. The results show that the proposed algorithm yields a core set one fifth the size of the original dataset, while the accuracy of the trained model drops by no more than 5%. In addition, the generalization error of the selected core set is only 0.13, indicating that it generalizes better.
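The abstract describes scoring each sample by how much a set of voting networks disagree on it, then ranking samples by that score. It does not define the divergence value, so the sketch below substitutes vote entropy, a common committee-disagreement measure, purely as an assumption. The function names, the `keep_fraction` parameter, and the choice to keep the highest-scoring samples are likewise hypothetical, and the iterative retraining loop is omitted.

```python
import numpy as np


def vote_entropy(votes: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-sample disagreement among voting models (assumed measure).

    votes: integer array of shape (n_models, n_samples) holding the class
    label predicted by each model for each sample.
    """
    n_models, n_samples = votes.shape
    counts = np.zeros((n_samples, num_classes))
    for model_votes in votes:
        # Each model contributes one vote per sample.
        counts[np.arange(n_samples), model_votes] += 1
    probs = counts / n_models
    # Treat 0 * log(0) as 0 by only taking the log where probs > 0.
    logs = np.log(probs, out=np.zeros_like(probs), where=probs > 0)
    return -(probs * logs).sum(axis=1)


def select_core_set(votes: np.ndarray, num_classes: int,
                    keep_fraction: float = 0.2) -> np.ndarray:
    """Return sorted indices of the samples with the highest disagreement.

    Keeping the most contested samples is an assumption; the abstract only
    says that samples are ranked by divergence value and then selected.
    """
    scores = vote_entropy(votes, num_classes)
    k = max(1, int(round(keep_fraction * scores.size)))
    order = np.argsort(-scores, kind="stable")  # descending by score
    return np.sort(order[:k])


# Three hypothetical voting models, five samples, three classes.
votes = np.array([
    [0, 0, 1, 2, 0],
    [0, 1, 1, 0, 0],
    [0, 2, 1, 1, 0],
])
core_indices = select_core_set(votes, num_classes=3, keep_fraction=0.4)
```

The default `keep_fraction=0.2` mirrors the one-fifth core-set size reported in the abstract; in a real pipeline the votes would come from trained networks rather than a hand-written array.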
Wang Zongchi (王纵驰); Liu Jian (刘健); Wang Pei (王培); Zhao Xingbo (赵兴博); Yu Jiageng (于佳耕); Tao Qingchuan (陶青川)
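China National Aviation Fuel Group Limited, Beijing 100088; Aerospace Shenzhou Smart System Technology Co., Ltd., Beijing 100029; College of Electronics and Information Engineering, Sichuan University, Chengdu 610065; Institute of Software, Chinese Academy of Sciences, Beijing 100190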
Computers and Automation
convolutional neural network; core-set selection; supervised learning; active learning
Computer & Digital Engineering (《计算机与数字工程》), 2024 (005)
Pages 1304-1309, 1316 (7 pages)