软件导刊 (Software Guide), 2025, Vol. 24, Issue 6: 49-57, 9. DOI: 10.11907/rjdk.241385
面向同分布数据的玉米产业链命名实体识别研究
Research on Named Entity Recognition of the Corn Industry Chain for Identically Distributed Data
Abstract
To explore how data distribution characteristics affect the performance of named entity recognition (NER) models, this study proposes a method that strengthens the identical-distribution (homogeneity) characteristics of the data to improve recognition performance. First, the Kolmogorov-Smirnov (K-S) test and kernel density estimation (KDE) plots were used to verify the homogeneity of the experimental corpus, and a corn industry chain corpus containing 2,100 samples was constructed. Second, a multi-layer NER model based on BERT-BiLSTM-Attention-CRF was designed, and three annotation strategies (BIO, BMES, and BIOES) were used to compare the recognition performance of the proposed method with that of other methods. In addition, the corn industry chain entities were divided into fine-grained categories to verify the impact of data distribution characteristics on the recognition performance of the BERT-BiLSTM-Attention-CRF model. The experimental results showed that the F1 score for corn varieties, the category with the most pronounced distribution characteristics, reached 96.61%. To further validate the generalization of the proposed method, the influence of the strength of the identical-distribution characteristics on recognition performance was analyzed on six datasets, and the relationship between the p-value and the recognition results was explored. Finally, data with weak distribution characteristics (such as corn diseases, pests, and weed damage) were resampled to strengthen their distribution characteristics, and the F1 scores increased by 5.71%, 4.30%, and 11.78%, respectively. The results show that entity categories with higher p-values achieve higher F1 scores, and those with lower p-values achieve lower F1 scores. After resampling to strengthen the identically distributed features of the data, the model's recognition performance improved significantly, indicating that the identically distributed features of the data directly affect the model's entity recognition performance. The findings provide ideas and methods for data analysis in named entity recognition.
Keywords
corn industry chain / named entity recognition / identical distribution / resampling
Classification
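Information Technology and Security Science
Cite this article
郝燕, 杨婉霞, 周蓓蓓, 芦晶, 袁炜炜, 汤敏睿. 面向同分布数据的玉米产业链命名实体识别研究[J]. 软件导刊, 2025, 24(6): 49-57, 9.
Funding
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0115801)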
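
Illustrative note on the annotation strategies: the abstract compares the BIO, BMES, and BIOES tagging schemes, which differ only in how entity boundaries are marked. The sketch below shows one common way to convert between them. It is a minimal illustration and not the authors' code; the function names and the entity labels `VAR` (variety) and `DIS` (disease) are hypothetical, and the input is assumed to be well-formed BIO tags for a single sentence.

```python
def bio_to_bioes(tags):
    """Convert one sentence's BIO tags to BIOES (assumes well-formed BIO input)."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is I- with the same label.
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + label
        if prefix == "B":
            # A B- tag with no continuation is a single-token entity (S-).
            out.append(("B-" if continues else "S-") + label)
        else:
            # An I- tag with no continuation closes the entity (E-).
            out.append(("I-" if continues else "E-") + label)
    return out


def bioes_to_bmes(tags):
    """Convert BIOES tags to BMES: only the inside marker is renamed (I- to M-)."""
    return ["M-" + t[2:] if t.startswith("I-") else t for t in tags]


# Hypothetical example: a three-token variety name, then a one-token disease name.
bio = ["B-VAR", "I-VAR", "I-VAR", "O", "B-DIS", "O"]
bioes = bio_to_bioes(bio)
bmes = bioes_to_bmes(bioes)
print(bioes)
print(bmes)
```

BIOES and BMES carry the same boundary information and differ only in the name of the inside tag, while BIO leaves entity ends and single-token entities implicit.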