Agricultural Disease Named Entity Recognition with Pointer Network Based on RoFormer Pre-trained Model
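[Objective/Significance] Nested entities and confusion between entity types lower the accuracy of named entity recognition (NER) for agricultural diseases. Taking PointerNet as the baseline model, this study proposes RoFormer-PointerNet, a pointer-network NER method for agricultural diseases based on the RoFormer pre-trained model. [Methods] The RoFormer pre-trained model vectorizes the input text. Its rotary position embedding captures positional information and enriches character and word features, which addresses the type confusion caused by polysemy. A pointer network performs the decoding: its head and tail pointer labeling scheme extracts all entities in a sentence and can handle the nesting that arises in entity extraction. [Results and Discussion] A self-built agricultural disease dataset contains 2 867 annotated text samples with 10 282 entities in total. To verify the advantage of the RoFormer pre-trained model in entity extraction, comparative experiments were run against several vectorization models, including Word2Vec, BERT, and RoBERTa. RoFormer-PointerNet achieved the best precision, recall, and F1 score, at 87.49%, 85.76%, and 86.62%, respectively. To verify its advantage in mitigating entity nesting, it was compared with the most widely used bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) models. RoFormer-PointerNet exceeded the RoFormer-BiLSTM, RoFormer-CRF, and RoFormer-BiLSTM-CRF models by 4.8%, 5.67%, and 3.87%, respectively, which shows that the pointer network handles nested entities well. Finally, the recognition performance of RoFormer-PointerNet on the agricultural disease dataset was verified through recognition experiments on eight entity types, including disease symptoms, disease names, and control methods. The method reached a precision, recall, and F1 score of 87.49%, 85.76%, and 86.62%, respectively, the best among comparable methods. [Conclusions] The proposed method effectively recognizes entities in Chinese agricultural disease text and outperforms the other models. It has an advantage in handling entity nesting and type confusion during entity extraction.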
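The head and tail pointer labeling described in the abstract can be illustrated with a minimal decoding sketch. It is not the authors' implementation. It assumes that the model has already produced, for each entity type, one 0/1 head vector and one 0/1 tail vector over the tokens, and that each head is paired with the nearest tail at or after it; the abstract does not state the pairing rule. The function name `decode_spans`, the `max_len` parameter, the entity type names, and the example sentence are all hypothetical.

```python
def decode_spans(tokens, head_labels, tail_labels, max_len=20):
    """Pair head and tail pointers into entity spans.

    head_labels[t][i] == 1 means an entity of type t starts at token i;
    tail_labels[t][i] == 1 means an entity of type t ends at token i.
    Assumption: each head is matched to the nearest tail at or after it.
    """
    entities = []
    for etype, heads in head_labels.items():
        tails = tail_labels[etype]
        for start, is_head in enumerate(heads):
            if not is_head:
                continue
            # Search forward for the closest tail pointer of the same type.
            for end in range(start, min(len(tokens), start + max_len)):
                if tails[end]:
                    text = " ".join(tokens[start:end + 1])
                    entities.append((etype, start, end, text))
                    break
    # Sort by position so nested spans appear next to each other.
    return sorted(entities, key=lambda e: (e[1], e[2]))


# Hypothetical sentence: the crop "wheat" is nested inside the disease name.
tokens = ["wheat", "stripe", "rust", "damages", "leaves"]
head_labels = {
    "crop":    [1, 0, 0, 0, 0],
    "disease": [1, 0, 0, 0, 0],
    "part":    [0, 0, 0, 0, 1],
}
tail_labels = {
    "crop":    [1, 0, 0, 0, 0],
    "disease": [0, 0, 1, 0, 0],
    "part":    [0, 0, 0, 0, 1],
}

entities = decode_spans(tokens, head_labels, tail_labels)
for entity in entities:
    print(entity)
```

Because every entity type has its own pair of pointer vectors, the crop span (0, 0) and the disease span (0, 2) are both recovered even though one lies inside the other. A tagging scheme that assigns one label per token cannot represent both spans at once.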
[Objective] With the development of agricultural informatization, a large amount of information about agricultural diseases exists as text. However, because of nested entities and confusion between entity types, traditional named entity recognition (NER) methods often achieve low accuracy on agricultural disease text. To address this, this study proposes RoFormer-PointerNet, a new agricultural disease NER method that combines the RoFormer pre-trained model with the PointerNet baseline model. The aim is to improve the accuracy of entity recognition in agricultural disease text and so provide more accurate data support for intelligent analysis, early warning, and prevention of agricultural diseases.
[Methods] The method first used the RoFormer pre-trained model to vectorize the input agricultural disease text, which laid the foundation for the subsequent entity extraction. RoFormer's rotary position embedding gave it a strong ability to capture positional information in text. In agricultural disease text, diverse terminology and polysemy often led traditional entity recognition methods to confuse entity types. Through its position embedding mechanism, RoFormer incorporated more positional information into the vector representation and enriched the feature information of words. This enabled the model to distinguish entity types more accurately during extraction and reduced type confusion. After vectorization, the study employed a pointer network for entity extraction. The pointer network is a sequence labeling approach that uses head and tail pointers to annotate entities within sentences. This labeling scheme is more flexible than traditional sequence labeling because it is not restricted to fixed entity structures, so it can extract all types of entities within a sentence, including nested ones. Nesting is common in agricultural disease text, for example when several entities of different types are nested within a single disease symptom description. By introducing the pointer network, the study addressed this nesting problem and improved the accuracy and completeness of entity extraction.
[Results and Discussions] To validate RoFormer-PointerNet, the study constructed an agricultural disease dataset of 2 867 annotated text samples containing 10 282 entities in total. The dataset covers eight entity types: disease names, crop names, disease characteristics, pathogens, infected areas, disease factors, prevention and control methods, and disease stages. In comparative experiments with other vectorization models such as Word2Vec, BERT, and RoBERTa, RoFormer-PointerNet achieved the best precision, recall, and F1 score, at 87.49%, 85.76%, and 86.62%, respectively, which demonstrated the effectiveness of the RoFormer pre-trained model. To verify the advantage of RoFormer-PointerNet in mitigating nested entities, the study also compared it with the widely used bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) models, each used as the decoder on top of the RoFormer pre-trained model. RoFormer-PointerNet outperformed the RoFormer-BiLSTM, RoFormer-CRF, and RoFormer-BiLSTM-CRF models by 4.8%, 5.67%, and 3.87%, respectively. These results indicated that RoFormer-PointerNet outperformed the other models in entity recognition and confirmed the effectiveness of the pointer network in handling nested entities. To validate the method on agricultural disease NER more broadly, a further comparative experiment was conducted with eight mainstream NER models, including BiLSTM-CRF, BERT-BiLSTM-CRF, and W2NER. On the agricultural disease dataset, RoFormer-PointerNet achieved a precision, recall, and F1 score of 87.49%, 85.76%, and 86.62%, respectively, the best among the compared methods.
[Conclusions] RoFormer-PointerNet, the agricultural disease NER method proposed in this study and based on the RoFormer pre-trained model, shows clear advantages in handling nested entities and type confusion during entity extraction. The method effectively identifies entities in Chinese agricultural disease text, improves the accuracy of entity recognition, and provides robust data support for intelligent analysis, early warning, and prevention of agricultural diseases. The result is relevant to the development of agricultural informatization and intelligence.
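The position embedding that the abstract credits for capturing positional information can be sketched in a few lines. The following is a minimal illustration of rotary position embedding as it is generally defined, not code from the paper. It assumes adjacent dimensions are paired and that pair i is rotated by the angle position × base^(−2i/d) with base 10000. The function names `rope` and `dot` and the example vectors are hypothetical.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by position * base**(-2i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2-D rotation of the pair (x, y) by the angle theta.
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors of dimension 4.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Both position pairs have the same offset (query is 2 positions after key).
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 10), rope(k, 8))
print(abs(score_a - score_b) < 1e-9)
```

The two scores agree because rotating the query by position m and the key by position n leaves their dot product dependent only on the offset m − n. This is how the attention scores come to encode relative position.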
王彤;王春山;李久熙;朱华吉;缪祎晟;吴华瑞
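National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China||College of Information Science and Technology, Hebei Agricultural University, Baoding, Hebei 071001, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China||College of Information Science and Technology, Hebei Agricultural University, Baoding, Hebei 071001, China||Hebei Key Laboratory of Agricultural Big Data, Baoding, Hebei 071001, China; College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding, Hebei 071001, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China||Key Laboratory of Agricultural Informatization Technology, Ministry of Agriculture and Rural Affairs, Beijing 100097, China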
Agricultural Science
agricultural disease; named entity recognition; entity nesting; RoFormer pre-trained model; pointer network
Smart Agriculture 2024 (002)
85-94 / 10
National Modern Agricultural Industry Technology System (CARS-23-D07); National Natural Science Foundation of China (62106065); Hebei Provincial Natural Science Foundation Project (F2022204004)