Journal of National University of Defense Technology, 2024, Vol. 46, Issue (4): 175-183 (9 pages). DOI: 10.11887/j.cn.202404019
Satellite domain corpus construction and named entity recognition
Abstract
Aiming at the lack of named entity corpora in the satellite domain and the low recognition performance of existing algorithms, a satellite domain entity labeling method that takes fuzzy entity boundaries into account was proposed, and a corpus containing 8 common types of satellite domain entities was constructed; compared with existing corpora in this field, it has finer granularity and wider coverage. On this basis, a satellite domain entity recognition algorithm based on transfer learning and multi-network fusion was proposed. The algorithm uses pretrained bidirectional encoder representations from transformers (BERT) to transfer semantic knowledge smoothly to the corpus and obtain subword-level features, a BiLSTM (bidirectional long short-term memory) network to capture contextual information and determine entity boundaries, and a conditional random field (CRF) as the decoder for label prediction. Experimental results show that, compared with traditional models such as BiLSTM, the proposed algorithm achieves better recognition performance: the F1-score for each of the 8 entity types is above 92%, and the micro-average F1-score reaches 96.10%.
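The abstract states that a conditional random field serves as the decoder on top of the BiLSTM but does not spell out the decoding step. As an illustration only, the sketch below shows standard Viterbi decoding for a linear-chain CRF, assuming per-token emission scores (of the kind a BiLSTM layer would produce), a tag-transition score matrix, and a simple BIO tag set for a single entity type. The function name `viterbi_decode`, the tag set, and all scores are hypothetical and are not taken from the paper, whose labeling scheme is not described here.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
    end: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence of a linear-chain CRF and its score.

    emissions[t][j]   : score of tag j at token t (hypothetically, BiLSTM outputs)
    transitions[i][j] : score of moving from tag i to tag j
    start[j], end[j]  : scores for beginning / ending the sequence in tag j
    """
    n_tags = len(start)
    # score[j]: best total score of any path that ends in tag j at the current token.
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Score of arriving in tag j from each possible previous tag i.
            candidates = [score[i] + transitions[i][j] for i in range(n_tags)]
            best_i = max(range(n_tags), key=candidates.__getitem__)
            pointers.append(best_i)
            new_score.append(candidates[best_i] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Add end scores, choose the best final tag, then follow the back-pointers.
    final = [score[j] + end[j] for j in range(n_tags)]
    best_last = max(range(n_tags), key=final.__getitem__)
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, final[best_last]


# Hypothetical BIO tag set for one entity type: 0 = O, 1 = B, 2 = I.
NEG = -100.0  # large penalty for transitions that BIO forbids
transitions = [
    [0.0, 0.0, NEG],  # O -> I is invalid
    [0.0, 0.0, 0.0],  # B -> any tag allowed
    [0.0, 0.0, 0.0],  # I -> any tag allowed
]
start = [0.0, 0.0, NEG]  # a sequence cannot start with I
end = [0.0, 0.0, 0.0]
emissions = [
    [0.0, 0.8, 1.0],  # per-token argmax would pick I here
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.2],
]
path, best = viterbi_decode(emissions, transitions, start, end)
print(path)  # [1, 2, 0], i.e. B I O
```

A per-token argmax of these emissions would give I I O, which is not a valid BIO sequence; the start and transition penalties let the decoder return B I O instead. This kind of label-to-label consistency, which helps fix entity boundaries, is what a CRF layer adds over classifying each token independently.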
Keywords
named entity recognition/transfer learning/neural networks/data scarcity
Classification
Aerospace
Cite this article
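XU Cong, SHI Huipeng, CHEN Zhimin, ZHANG Xinyu, WANG Jing, YANG Jiasen. Satellite domain corpus construction and named entity recognition[J]. Journal of National University of Defense Technology, 2024, 46(4): 175-183.
Funding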
Merit-based Fund of the Key Laboratory of Electronic Information Technology for Complex Aerospace Systems, Chinese Academy of Sciences (Y42613A32S)