
基于自蒸馏与自集成的问答模型

王同结 李烨

计算机应用研究, 2024, Vol. 41, Issue 1: 212-216, 5. DOI: 10.19734/j.issn.1001-3695.2023.05.0281


Question answering model based on self-distillation and self-ensemble

王同结 ¹, 李烨 ¹

Author information

  • 1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China


Abstract

Knowledge distillation combined with pre-trained language models is one of the primary methods for constructing question-answering models. However, these methods suffer from inefficient knowledge transfer, time-consuming teacher-model training, and mismatched capabilities between teacher and student models. To address these issues, this paper proposed a question-answering model based on self-distillation and self-ensemble, named SD-SE-BERT. The self-ensemble mechanism was designed around a sliding window; the student model used BERT; the teacher model was derived during training as a combination of several recent student models, weighted by their performance on the validation set. The loss function used the ensemble's output together with the true labels to guide the training of the student model in the current round. Experimental results on the SQuAD1.1 dataset show that the EM and F1 scores of SD-SE-BERT are 7.5 and 4.9 higher, respectively, than those of the BERT model, and that its performance surpasses other representative single models and distillation models. Compared with the fine-tuning results of the large-scale language model ChatGLM-6B, the EM score improved by 4.5 and the F1 score by 2.5. This demonstrates that SD-SE-BERT can leverage the model's own supervision information to enhance its capacity to combine different text-data features, eliminating the need for complex teacher-model training and avoiding the mismatch between teacher and student models.
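The mechanism sketched in the abstract — a "teacher" formed by averaging the last few student checkpoints inside a sliding window, weighted by validation performance, then used alongside the true labels in the loss — can be illustrated with a minimal NumPy sketch. This is not the paper's code; the class and function names, the window size, and the α mixing weight are illustrative assumptions.

```python
import numpy as np
from collections import deque


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SlidingWindowSelfEnsemble:
    """Keep the last K student predictions; the 'teacher' is their weighted average."""

    def __init__(self, window=3):
        self.window = deque(maxlen=window)  # old checkpoints fall out automatically

    def update(self, probs, weight):
        # weight: e.g. this checkpoint's score on the validation set
        self.window.append((probs, weight))

    def teacher(self):
        # Normalized weighted average of the stored probability distributions
        total = sum(w for _, w in self.window)
        return sum(w * p for p, w in self.window) / total


def self_distillation_loss(student_logits, teacher_probs, labels, alpha=0.5):
    """Mix cross-entropy to the true labels with KL divergence to the ensembled teacher."""
    p = softmax(student_logits)
    n = len(labels)
    ce = -np.mean(np.log(p[np.arange(n), labels] + 1e-12))
    kl = np.mean(np.sum(
        teacher_probs * (np.log(teacher_probs + 1e-12) - np.log(p + 1e-12)),
        axis=-1))
    return (1 - alpha) * ce + alpha * kl
```

In a training loop, one would call `update` after each validation pass and feed `teacher()` into the loss for the next round, so the supervision signal comes from the model's own recent history rather than a separately trained teacher.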


Key words

question answering model/knowledge distillation/ensemble learning/BERT

Category

Information technology and security science

Cite this article

王同结, 李烨. 基于自蒸馏与自集成的问答模型[J]. 计算机应用研究, 2024, 41(1): 212-216, 5.

计算机应用研究 (ISSN 1001-3695) | OA | 北大核心 | CSTPCD
