
Agglutinative Languages Named Entity Recognition Based on Pruner and Multilingual Fine-Tuning

罗凯昂¹, 哈里旦木·阿布都克里木¹, 刘畅¹, 阿布都克力木·阿布力孜¹, 郭文强¹

计算机工程与应用 (Computer Engineering and Applications), 2023, Vol. 59, Issue 24: 121-130 (10 pages). DOI: 10.3778/j.issn.1002-8331.2208-0109

Author information

  • 1. School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012

Abstract

Minority languages, represented by Uyghur, are agglutinative and low-resource, which poses great challenges for named entity recognition. Meanwhile, multilingual models suffer from large parameter counts, large vocabularies, and slow inference. To explore the best fine-tuning strategy for alleviating the low-resource problem, monolingual and multilingual fine-tuning are performed for five agglutinative languages: Uyghur, Kazakh, Kirghiz, Uzbek, and Tatar. The experimental results show that the pruned model, CINO-Agglu, reduces the model size, number of parameters, vocabulary size, and inference time by 45%, 44%, 92%, and 38%, respectively, compared with the model before pruning, and achieves an average F1 score of 85.9% on the five languages, exceeding all baseline models. Including an appropriate amount of data from languages of the same branch improves the fine-tuning results.
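
The abstract reports a 92% smaller vocabulary but does not say how the reduction is done. One common approach, assumed here purely for illustration, is to keep only the embedding rows for tokens that occur in a target-language corpus and to remap token ids accordingly. The sketch below is not the authors' procedure; `prune_vocabulary`, `remap`, and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def prune_vocabulary(
    embeddings: List[List[float]],
    corpus_token_ids: Iterable[Iterable[int]],
    special_ids: Iterable[int] = (),
) -> Tuple[List[List[float]], Dict[int, int]]:
    """Keep only embedding rows for tokens seen in the corpus, plus special tokens."""
    keep = set(special_ids)
    for sentence in corpus_token_ids:
        keep.update(sentence)
    # Sort so the surviving tokens keep their original relative order.
    kept_ids = sorted(i for i in keep if 0 <= i < len(embeddings))
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    pruned = [embeddings[old] for old in kept_ids]
    return pruned, old_to_new


def remap(sentence: List[int], old_to_new: Dict[int, int], unk_new_id: int) -> List[int]:
    """Translate old token ids to pruned ids; dropped tokens fall back to the unknown id."""
    return [old_to_new.get(token, unk_new_id) for token in sentence]


# Toy data (assumption): a 10-token vocabulary with 2-dimensional embeddings.
embeddings = [[float(i), float(i) + 0.5] for i in range(10)]
corpus = [[4, 7, 4], [9, 7]]  # token ids observed in the target languages
pruned, old_to_new = prune_vocabulary(embeddings, corpus, special_ids=(0, 1))
reduction = 1 - len(pruned) / len(embeddings)
remapped = remap([4, 7, 3], old_to_new, unk_new_id=old_to_new[1])
```

In this toy setting half of the rows are dropped. In a real multilingual model the embedding matrix makes up a large share of the parameters, which is one reason that vocabulary pruning alone can shrink the model noticeably.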

Key words

agglutinative language / low-resource language / named entity recognition / cross-lingual transfer / model pruning

Classification

Information Technology and Security Science

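Cite this article

罗凯昂, 哈里旦木·阿布都克里木, 刘畅, 阿布都克力木·阿布力孜, 郭文强. Agglutinative Languages Named Entity Recognition Based on Pruner and Multilingual Fine-Tuning [J]. 计算机工程与应用 (Computer Engineering and Applications), 2023, 59(24): 121-130.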

Funding

National Natural Science Foundation of China (61866035, 61966033)

National Key Research and Development Program of China (2018YFC0825504)

计算机工程与应用 (Computer Engineering and Applications)

Indexing: OA / Peking University Core Journals (北大核心) / CSCD / CSTPCD

ISSN 1002-8331
