Electric Power Information and Communication Technology, 2024, Vol. 22, Issue (11): 52-59, 8. DOI: 10.16543/j.2095-641x.electric.power.ict.2024.11.07
基于改进BERT预训练模型的电力标准命名实体识别方法研究
Research on Power Standard Named Entity Recognition Based on Improved BERT Pre-training Model
Abstract
In recent years, high-quality development and digital transformation have become increasingly important to the power industry. This places new requirements on research into the digitalization of power standards and brings new challenges and opportunities for their management, implementation, and supervision. The electric power industry is an important support for social and economic development, and its terminology and proper nouns are highly specialized and complex. When applied to standard documents in this field, traditional named entity recognition methods based on rules and feature engineering suffer from low recognition accuracy, difficulty in segmenting terms, and reliance on expert experience. To overcome these problems, this paper proposes an improved BERT-based named entity recognition model. By introducing a power terminology corpus, word features, and domain lexical information, the model identifies 10 types of power entities on a power standard corpus and reaches an F1 score of 81%. It effectively recognizes entities consisting of long terms in the electric power field, improves the efficiency and accuracy of processing power standard documents, and supports the information processing and application of power standards. This work can advance the automatic processing of power standard documents, raise the digitalization level of the power industry, and provide technical support for specification formulation, knowledge management, and decision support in the industry.
Key words
named entity recognition/standard digitization/natural language processing/power standards
Classification
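Information Technology and Security Science
Cite this article
He Xinyi, Dong Ming, Yan Yong, Yao Ying, Huang Jianping (贺馨仪, 董明, 颜拥, 姚影, 黄建平). Research on Power Standard Named Entity Recognition Based on Improved BERT Pre-training Model [J]. Electric Power Information and Communication Technology, 2024, 22(11): 52-59, 8.
Funding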
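Supported by the State Grid Corporation of China Headquarters Science and Technology Project "Research on the Implementation Path and Key Technologies of Standard Digitalization for State Grid Corporation of China" (5700-202241437A-2-0-ZN).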