Computer Engineering and Applications, 2023, Vol. 59, Issue (24): 121-130, 10. DOI: 10.3778/j.issn.1002-8331.2208-0109
融合剪枝和多语微调的黏着语命名实体识别
Agglutinative Languages Named Entity Recognition Based on Pruner and Multilingual Fine-Tuning
Abstract
Minority languages, represented by Uyghur, are agglutinative and low-resource, which poses great challenges for named entity recognition. Meanwhile, multilingual models suffer from large parameter counts, large vocabularies, and slow inference. To explore the best fine-tuning strategy for alleviating the low-resource problem, monolingual and multilingual fine-tuning are performed for five agglutinative languages: Uyghur, Kazakh, Kirghiz, Uzbek, and Tatar. The experimental results show that CINO-Agglu reduces the model size, number of parameters, vocabulary size, and inference time by 45%, 44%, 92%, and 38%, respectively, compared with before pruning, and achieves an average F1 score of 85.9% on the five languages, exceeding all baseline models. Including an appropriate amount of data from languages of the same branch helps improve fine-tuning.
Keywords
agglutinative language / low-resource language / named entity recognition / cross-lingual transfer / model pruning
Classification
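Information Technology and Security Science
Cite this article
罗凯昂, 哈里旦木·阿布都克里木, 刘畅, 阿布都克力木·阿布力孜, 郭文强. Agglutinative Languages Named Entity Recognition Based on Pruner and Multilingual Fine-Tuning [J]. Computer Engineering and Applications, 2023, 59(24): 121-130, 10.
Funding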
National Natural Science Foundation of China (61866035, 61966033)
National Key Research and Development Program of China (2018YFC0825504)