A BiLSTM-CRF Entity Recognition Model for Hydraulic Engineering Inspection Text Based on Character and Word Vectors
Named entity recognition (NER) is a core technology for building water-conservancy knowledge graphs. Inspection text is the most common type of data in hydraulic engineering: it is recorded as free text with no fixed format or structure, yet it contains information on potential safety risks and therefore has high value density. To address named entity recognition in hydraulic engineering inspection text, a BiLSTM-CRF model that fuses character and word vectors is proposed. First, the inspection text is vectorized at the character level and at the word level, and the two sets of vectors are merged to obtain character-word vector features. Next, a BiLSTM neural network extracts contextual features from the resulting sequence. Finally, a CRF layer decodes the sequence and extracts the corresponding entities. Experiments on inspection text from the Middle Route of the South-to-North Water Transfer Project show that combining character and word vectors effectively improves recognition performance and identifies entity boundaries more accurately: the model's precision, recall, and F1 score reach 93.79%, 93.06%, and 93.42%, respectively, and its time efficiency is 82.86% higher than that of the BERT-BiLSTM-CRF model. The character-word vector BiLSTM-CRF model can provide technical support for the rapid construction of hydraulic engineering knowledge graphs.
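The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract only says that character and word vectors are "merged"; the version below assumes simple concatenation, assumes the text has already been segmented into words, and assumes each character is paired with the vector of the word that contains it. The function names, the toy vectors, and the zero-vector fallback for unknown tokens are all hypothetical and are not taken from the paper. A small helper also shows that the reported F1 score is the harmonic mean of the reported precision and recall.

```python
from typing import Dict, List

Vector = List[float]


def fuse_char_word_vectors(
    words: List[str],
    char_vecs: Dict[str, Vector],
    word_vecs: Dict[str, Vector],
    char_dim: int,
    word_dim: int,
) -> List[Vector]:
    """Return one fused vector per character.

    Assumption (not confirmed by the source): fusion is concatenation of the
    character vector with the vector of the word containing that character.
    Unknown characters or words fall back to zero vectors.
    """
    fused: List[Vector] = []
    for word in words:
        w_vec = word_vecs.get(word, [0.0] * word_dim)
        for ch in word:
            c_vec = char_vecs.get(ch, [0.0] * char_dim)
            # List concatenation: the result has char_dim + word_dim entries.
            fused.append(c_vec + w_vec)
    return fused


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, hand-written vectors; a real pipeline would train them (e.g. Word2Vec).
    toy_char_vecs = {"渠": [0.1, 0.2], "道": [0.3, 0.4], "裂": [0.5, 0.6]}
    toy_word_vecs = {"渠道": [1.0, 0.0, 0.0], "裂缝": [0.0, 1.0, 0.0]}
    segmented = ["渠道", "裂缝"]  # assumed output of a word segmenter

    for vec in fuse_char_word_vectors(segmented, toy_char_vecs, toy_word_vecs, 2, 3):
        print(vec)

    # Consistency check on the figures reported in the abstract.
    print(round(f1_score(93.79, 93.06), 2))
```

In a full model, the fused per-character vectors would be the input sequence to the BiLSTM, whose outputs the CRF layer then decodes into entity tags; those two stages are not sketched here.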
刘雪梅;程彭圣男;李海瑞;曹闯;高英;崔培
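School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou, Henan 450046; Henan Water Conservancy Survey, Design and Research Co., Ltd., Zhengzhou, Henan 450016; School of Management and Economics, North China University of Water Resources and Electric Power, Zhengzhou, Henan 450046; Yellow River Water Resources and Hydropower Development Group Co., Ltd., Zhengzhou, Henan 450003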
Computer Science and Automation
inspection text; entity recognition; bidirectional long short-term memory (BiLSTM) neural network; Word2Vec; conditional random field
Journal of North China University of Water Resources and Electric Power (Natural Science Edition), 2024 (003)
Pages 9-17 (9 pages)
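National Natural Science Foundation of China (72271091); Science and Technology Open Cooperation Project of the Henan Academy of Sciences (220901008).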