
基于RoFormer预训练模型的指针网络农业病害命名实体识别

王彤 王春山 李久熙 朱华吉 缪祎晟 吴华瑞

智慧农业(中英文), 2024, Vol. 6, Issue 2: 85-94, 10. DOI: 10.12133/j.smartag.SA202311021

Agricultural Disease Named Entity Recognition with Pointer Network Based on RoFormer Pre-trained Model

王彤 (1), 王春山 (2), 李久熙 (3), 朱华吉 (4), 缪祎晟 (4), 吴华瑞 (4)

Author information

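  • 1. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China; College of Information Science and Technology, Hebei Agricultural University, Baoding, Hebei 071001, China
  • 2. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China; College of Information Science and Technology, Hebei Agricultural University, Baoding, Hebei 071001, China; Hebei Key Laboratory of Agricultural Big Data, Baoding, Hebei 071001, China
  • 3. College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding, Hebei 071001, China
  • 4. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China; Key Laboratory of Agricultural Informatization Technology, Ministry of Agriculture and Rural Affairs, Beijing 100097, China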

Abstract

[Objective] With the development of agricultural informatization, a large amount of information about agricultural diseases exists in the form of text. However, because of problems such as nested entities and confusion between entity types, traditional named entity recognition (NER) methods often achieve low accuracy on agricultural disease text. To address this issue, this study proposes a new agricultural disease NER method called RoFormer-PointerNet, which combines the RoFormer pre-trained model with the PointerNet baseline model. The aim is to improve the accuracy of entity recognition in agricultural disease text, providing more accurate data support for intelligent analysis, early warning, and prevention of agricultural diseases.

[Methods] The method first used the RoFormer pre-trained model to vectorize the input agricultural disease text, which laid the foundation for the subsequent entity extraction task. RoFormer's rotary position embedding gives it strong capabilities in capturing textual positional information. In agricultural disease text, diverse terminology and polysemy often cause traditional entity recognition methods to confuse entity types. Through its position embedding mechanism, the RoFormer model incorporated more positional information into the vector representation, enriching the feature information of words. This enabled the model to distinguish between entity types more accurately during entity extraction, reducing the possibility of type confusion. After vectorizing the text, this study employed a pointer network for entity extraction. The pointer network is a sequence labeling approach that uses head and tail pointers to annotate entities within sentences. This labeling method is more flexible than traditional sequence labeling because it is not restricted by fixed entity structures, so it can extract all types of entities within a sentence, including complex entities with nested relationships. In agricultural disease text, entity extraction often faces the challenge of nesting, for example when multiple entity types are nested within a single disease symptom description. By introducing the pointer network, this study addressed entity nesting and improved the accuracy and completeness of entity extraction.

[Results and Discussions] To validate the performance of the RoFormer-PointerNet method, this study constructed an agricultural disease dataset comprising 2 867 annotated corpora and a total of 10 282 entities across eight entity types: disease names, crop names, disease characteristics, pathogens, infected areas, disease factors, prevention and control methods, and disease stages. In comparative experiments with other pre-trained models such as Word2Vec, BERT, and RoBERTa, RoFormer-PointerNet achieved the best precision, recall, and F1-Score, at 87.49%, 85.76% and 86.62%, respectively. This result demonstrated the effectiveness of the RoFormer pre-trained model. Additionally, to verify the advantage of RoFormer-PointerNet in mitigating the issue of nested entities, this study compared it with the widely used bidirectional long short-term memory (BiLSTM) network and conditional random field (CRF) used as decoders on top of the RoFormer pre-trained model. RoFormer-PointerNet outperformed the RoFormer-BiLSTM, RoFormer-CRF, and RoFormer-BiLSTM-CRF models by 4.8%, 5.67% and 3.87%, respectively. The experimental results indicated that RoFormer-PointerNet significantly outperformed the other models in entity recognition, confirming the effectiveness of the pointer network in addressing nested entities. To validate the superiority of the RoFormer-PointerNet method in agricultural disease NER, a comparative experiment was conducted with eight mainstream NER models such as BiLSTM-CRF, BERT-BiLSTM-CRF, and W2NER. The RoFormer-PointerNet method achieved precision, recall, and F1-Score of 87.49%, 85.76% and 86.62%, respectively, on the agricultural disease dataset, the best among the compared methods. This result further verified the performance of the RoFormer-PointerNet method in agricultural disease NER tasks.

[Conclusions] The agricultural disease NER method RoFormer-PointerNet, proposed in this study and based on the RoFormer pre-trained model, demonstrates significant advantages in addressing nested entities and type confusion during entity extraction. The method effectively identifies entities in Chinese agricultural disease texts, enhancing the accuracy of entity recognition and providing robust data support for intelligent analysis, early warning, and prevention of agricultural diseases. This research outcome is important for promoting the development of agricultural informatization and intelligence.
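
The head and tail pointer labeling described in the Methods paragraph can be illustrated with a minimal sketch. The abstract does not specify the decoding rule, so everything here is an assumption for illustration: the function name `decode_spans`, the 0.5 threshold, the "pair each head with the nearest following tail of the same type" rule, the type labels `crop` and `disease`, and the example phrase and probabilities are all hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Tuple


def decode_spans(
    head_probs: Dict[str, List[float]],
    tail_probs: Dict[str, List[float]],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[Tuple[str, int, int]]:
    """Pair each head pointer with the nearest tail pointer of the same type.

    Each entity type has its own head and tail scores, so spans of
    different types may overlap or nest inside one another.
    """
    spans = []
    for etype, heads in head_probs.items():
        tails = tail_probs[etype]
        for start, head_score in enumerate(heads):
            if head_score < threshold:
                continue
            # Scan rightwards for the first position marked as a tail.
            for end in range(start, len(tails)):
                if tails[end] >= threshold:
                    spans.append((etype, start, end))
                    break
    return spans


# Hypothetical example: a crop name nested inside a disease name.
text = "小麦条锈病"
head_probs = {
    "crop": [0.9, 0.1, 0.0, 0.1, 0.0],
    "disease": [0.8, 0.0, 0.1, 0.0, 0.1],
}
tail_probs = {
    "crop": [0.0, 0.9, 0.1, 0.0, 0.0],
    "disease": [0.1, 0.0, 0.0, 0.2, 0.95],
}
entities = [(t, text[s : e + 1]) for t, s, e in decode_spans(head_probs, tail_probs)]
print(entities)  # [('crop', '小麦'), ('disease', '小麦条锈病')]
```

Because each type is decoded independently, the crop span (characters 0 to 1) and the disease span (characters 0 to 4) are both recovered, which a single-label-per-token tagging scheme cannot represent directly.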

关键词

农业病害/命名实体识别/实体嵌套/RoFormer预训练模型/指针网络

Key words

agricultural disease/named entity recognition/entity nesting/RoFormer pre-trained model/pointer network

Classification

Agricultural science and technology

Cite this article

王彤, 王春山, 李久熙, 朱华吉, 缪祎晟, 吴华瑞. 基于RoFormer预训练模型的指针网络农业病害命名实体识别[J]. 智慧农业(中英文), 2024, 6(2): 85-94, 10.

Funding

National Modern Agricultural Industry Technology System (CARS-23-D07)

National Natural Science Foundation of China (62106065)

Hebei Provincial Natural Science Foundation Project (F2022204004)

智慧农业(中英文)

OA, CSTPCD

ISSN 2096-8094
