计算机应用研究, 2024, Vol. 41, Issue (1): 212-216, 5. DOI: 10.19734/j.issn.1001-3695.2023.05.0281
基于自蒸馏与自集成的问答模型
Question answering model based on self-distillation and self-ensemble
王同结 1, 李烨 1
Author information
- 1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Abstract
Knowledge distillation combined with pre-trained language models is one of the primary methods for constructing question-answering models. However, these methods suffer from inefficient knowledge transfer, time-consuming teacher model training, and mismatched capabilities between teacher and student models. To address these issues, this paper proposed a question-answering model based on self-distillation and self-ensemble, named SD-SE-BERT. The self-ensemble mechanism was designed based on a sliding window; the student model used BERT; the teacher model was derived from a weighted-average combination of several student models obtained during training, with weights based on their performance on the validation set. The loss function used the output of the ensemble and the true labels to guide the training of the student model in the current round. Experimental results on the SQuAD1.1 dataset show that the EM and F1 scores of SD-SE-BERT are respectively 7.5 and 4.9 higher than those of the BERT model, and the model's performance surpasses other representative single models and distillation models. Compared with the fine-tuning results of the large-scale language model ChatGLM-6B, the EM score was improved by 4.5 and the F1 score by 2.5. This proves that SD-SE-BERT can leverage the model's own supervision information to enhance its capacity to combine different text data features, eliminating the need for complex teacher-model training and avoiding the problem of mismatch between teacher and student models.
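The abstract only outlines the training procedure at a high level. The following is a minimal PyTorch-style sketch of what a sliding-window self-ensemble teacher and the implied combined loss could look like; the window size, the softmax weighting over validation scores, and the mixing coefficient `alpha` / `temperature` are illustrative assumptions, not values taken from the paper.

```python
# Sketch of sliding-window self-ensemble + self-distillation (assumed details).
import copy
from collections import deque

import torch
import torch.nn.functional as F


class SlidingWindowEnsembleTeacher:
    """Keeps the last K student checkpoints with their validation scores and
    builds a teacher whose parameters are their weighted average."""

    def __init__(self, window_size: int = 3):
        # Each entry is (state_dict snapshot, validation score).
        self.window = deque(maxlen=window_size)

    def update(self, student: torch.nn.Module, val_score: float) -> None:
        # Snapshot the current student, e.g. after each validation run.
        self.window.append((copy.deepcopy(student.state_dict()), val_score))

    def build_teacher(self, template: torch.nn.Module) -> torch.nn.Module:
        assert self.window, "call update() at least once before building a teacher"
        # Weight checkpoints by a softmax over validation scores
        # (assumed scheme; the paper only states "weighted average").
        scores = torch.tensor([s for _, s in self.window])
        weights = torch.softmax(scores, dim=0)

        avg_state = {}
        for key in self.window[0][0]:
            avg_state[key] = sum(
                w * state[key].float()
                for (state, _), w in zip(self.window, weights)
            )
        teacher = copy.deepcopy(template)
        teacher.load_state_dict(avg_state)
        teacher.eval()
        return teacher


def self_distillation_loss(student_logits, teacher_logits, labels,
                           alpha: float = 0.5, temperature: float = 2.0):
    """Hard-label loss plus a soft loss against the ensemble teacher's outputs
    (alpha and temperature are illustrative values)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft
```

In a span-extraction QA setting such as SQuAD1.1, the loss above would typically be applied twice per example, once for the start-position logits and once for the end-position logits, with the teacher rebuilt from the sliding window before each training round.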
Key words
question answering model / knowledge distillation / ensemble learning / BERT
Category
Information technology and security science
Cite this article
王同结, 李烨. 基于自蒸馏与自集成的问答模型[J]. 计算机应用研究, 2024, 41(1): 212-216, 5.