Journal of Changzhou University (Natural Science Edition) [常州大学学报(自然科学版)], 2024, Vol. 36, Issue 6: 71-82, 12. DOI: 10.3969/j.issn.2095-0411.2024.06.009
Incorporating two-stage knowledge distillation text classification method with group assistant models
Abstract
For text classification tasks, pre-trained language models built on the Transformer architecture achieve strong performance, but the better-performing models suffer from large parameter counts, huge training overhead, and high inference latency. This paper proposes a two-stage knowledge distillation text classification method incorporating a group assistant model, in which the group assistant model consists of a graph convolutional neural network (GCN) assistant model and a Transformer assistant model. The knowledge of the teacher model is distilled into the student model through the Transformer assistant model, while the GCN assistant model guides the two-stage distillation process. In addition, a progressive knowledge distillation strategy is proposed for distilling the model's intermediate knowledge: it adjusts which layer of the teacher model is distilled according to the density of the model's knowledge distribution. Experimental results on multiple datasets show that the proposed approach outperforms the baseline approaches in all cases, reducing the model's parameter size by 48.20% and increasing inference speed by 56.94% at the cost of at most a 0.73% loss in F1-score.
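The method described in the abstract builds on response-based knowledge distillation, where a student mimics a teacher's temperature-softened output distribution. The following is a minimal illustrative sketch of that standard soft-label objective, not a reproduction of the paper's group-assistant pipeline; the function names and the temperature value are assumptions for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: a higher temperature yields a
    softer (more uniform) distribution over classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 so gradients keep a comparable magnitude across
    temperatures (standard soft-label distillation)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

When student and teacher logits agree, the loss is zero; the more the student's softened distribution diverges from the teacher's, the larger the loss. In a teacher-assistant setup like the one summarized above, this same objective is applied in stages: teacher to assistant, then assistant to student.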
Keywords: text classification / pre-trained language model / two-stage knowledge distillation / group assistant models / progressive distillation
Classification: Information Technology and Security Science
Citation: 张骏强, 高尚兵, 苏睿, 李文婷. Incorporating two-stage knowledge distillation text classification method with group assistant models [J]. Journal of Changzhou University (Natural Science Edition), 2024, 36(6): 71-82, 12.
Funding
National Key Research and Development Program of China (2018YFB1004904)
National Natural Science Foundation of China General Program (62076107)
Six Talent Peaks Project of Jiangsu Province (XYDXXJS-011)