Oil Drilling & Production Technology (石油钻采工艺), 2024, Vol. 46, Issue 4: 509-524, 16. DOI: 10.13639/j.odpt.202409038
A natural language processing data collection method for weakly regulated professional petroleum documents (original title: 针对弱规范石油文档的自然语言数据收集方法)
Abstract
Existing data collection methods misread weakly standardized documents that are dense with specialist vocabulary, which lowers accuracy and lengthens collection time. To address this, a fast and accurate data collection model suited to the characteristics of petroleum engineering documents was established, providing a data foundation for big data computing. First, a hierarchical structure of entries was established to identify the differences among petroleum engineering documents. Second, a dictionary of technical terms was built so that the computer can recognize petroleum engineering terms in a document. Finally, the SPBERT data collection model was constructed on the basis of a natural language model and trained on a large amount of data. Once a workover-related Word document is imported, the model automatically outputs the data in the document together with the corresponding labels. The model was validated on field workover data from Changqing Oilfield by comparing it with two existing regular-expression methods, two general BERT models, and one GPT model, and the time each model took to collect data was recorded. The average accuracy of the five comparison models was 40.06%, whereas the SPBERT model reached 82.3% on workover data collection, roughly twice the average (an improvement of more than 100%). The SPBERT model took 402 ms per correctly collected set of data, 27.44% less than the 554 ms average of the other models. The SPBERT model therefore collects data with high accuracy and a short collection time, which can further strengthen the domain expertise of natural language models and promote the digital and intelligent development of oilfields.

Keywords
Data assets / Data collection / Natural language processing / Workover / Smart oilfield / New quality productivity

Classification
Energy science and technology

Cite this article
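CHANG Qifan, YANG Xumin, ZHU Fanghui, JIN Long, LIU Wei, ZHENG Lihui. A natural language processing data collection method for weakly regulated professional petroleum documents [J]. Oil Drilling & Production Technology, 2024, 46(4): 509-524, 16.

Funding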
National Science and Technology Major Project "Drilling and completion technology and reservoir protection for multi-gas co-production" (Project No. 2016ZX05066002).
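
Note on the reported figures: the accuracy and timing comparisons quoted in the abstract can be cross-checked with simple arithmetic. The sketch below uses only the four numbers stated there (40.06%, 82.3%, 554 ms, 402 ms); the variable and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract. The baseline values are the averages
# reported for the five comparison models, not per-model measurements.
baseline_accuracy = 40.06   # percent
spbert_accuracy = 82.3      # percent
baseline_time_ms = 554.0    # ms per correctly collected data set
spbert_time_ms = 402.0      # ms per correctly collected data set


def relative_change(new: float, old: float) -> float:
    """Return (new - old) / old expressed as a percentage."""
    return (new - old) / old * 100.0


# How many times the baseline accuracy SPBERT reaches.
accuracy_ratio = spbert_accuracy / baseline_accuracy
# Relative improvement in accuracy over the baseline average.
accuracy_gain = relative_change(spbert_accuracy, baseline_accuracy)
# Relative reduction in collection time (sign flipped so a saving is positive).
time_saving = -relative_change(spbert_time_ms, baseline_time_ms)

print(f"accuracy ratio: {accuracy_ratio:.2f}x")
print(f"relative accuracy gain: {accuracy_gain:.1f}%")
print(f"time saving: {time_saving:.2f}%")
```

Under these figures the ratio is about 2.05, which is consistent with an accuracy gain of slightly more than 100%, and the time saving matches the 27.44% stated in the abstract.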