
A natural language processing data collection method for weakly regulated professional petroleum documents

CHANG Qifan¹, YANG Xumin², ZHU Fanghui³, JIN Long⁴, LIU Wei⁵, ZHENG Lihui²

Oil Drilling & Production Technology, 2024, 46(4): 509-524, 16. DOI: 10.13639/j.odpt.202409038




Author information

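1. College of Petroleum Engineering, China University of Petroleum (Beijing), Changping, Beijing; Chengdu No. 7 High School, Chengdu, Sichuan
2. College of Petroleum Engineering, China University of Petroleum (Beijing), Changping, Beijing
3. College of Petroleum Engineering, China University of Petroleum (Beijing), Changping, Beijing; Oil and Gas Technology Research Institute, PetroChina Changqing Oilfield Company, Xi'an, Shaanxi
4. College of Petroleum Engineering, China University of Petroleum (Beijing), Changping, Beijing; Institute of Educational Statistics and Analysis, National Institute of Education Sciences, Haidian, Beijing
5. Oil and Gas Technology Research Institute, PetroChina Changqing Oilfield Company, Xi'an, Shaanxi; National Engineering Laboratory for Exploration and Development of Low-Permeability Oil and Gas Fields, Xi'an, Shaanxi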


Abstract

Existing data collection methods misread documents that are weakly standardized and dense with specialized vocabulary, which results in low accuracy and long collection times. To solve this problem and provide a data basis for big-data computing, a fast and accurate data collection model suited to the characteristics of petroleum engineering documents was established. First, a hierarchical structure of entries was built to identify the differences among petroleum engineering documents. Next, a dictionary of technical terms was compiled so that the computer can recognize petroleum engineering terminology in the documents. Finally, the SPBERT data collection model was constructed on a natural language model and trained on a large amount of data. The model imports workover-related Word documents and automatically outputs the data they contain together with the corresponding labels. Its accuracy was verified against two existing regular-expression methods, two general-purpose BERT models and one GPT model on field workover data from Changqing Oilfield, and the time each model took to collect data was recorded. The five comparison models averaged 40.06% accuracy, whereas SPBERT reached 82.3% on workover data collection, more than double that average. SPBERT took 402 ms per correctly collected data set, 27.44% less than the 554 ms average of the other models. With its high accuracy and short collection time, SPBERT can collect supplementary data, further strengthen the domain specialization of natural language models, and advance the digital and intelligent development of oilfields.
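The two headline comparisons in the abstract are simple ratios, and they can be checked directly. The sketch below uses only the figures quoted in the abstract; the function and constant names are illustrative assumptions, not part of the SPBERT work, and "time per correct data set" is treated as a single reported number rather than recomputed from raw timings.

```python
# Figures quoted in the abstract (copied verbatim, not recomputed from raw data).
SPBERT_ACCURACY = 82.3          # percent
BASELINE_MEAN_ACCURACY = 40.06  # percent, mean of the five comparison models
SPBERT_MS_PER_SET = 402         # ms per correctly collected data set
BASELINE_MEAN_MS_PER_SET = 554  # ms, mean of the other models


def accuracy_ratio(model: float, baseline: float) -> float:
    """Return how many times the baseline accuracy the model reaches."""
    return model / baseline


def relative_time_saving(model_ms: float, baseline_ms: float) -> float:
    """Return the fractional time reduction of the model versus the baseline."""
    return (baseline_ms - model_ms) / baseline_ms


if __name__ == "__main__":
    ratio = accuracy_ratio(SPBERT_ACCURACY, BASELINE_MEAN_ACCURACY)
    saving = relative_time_saving(SPBERT_MS_PER_SET, BASELINE_MEAN_MS_PER_SET)
    print(f"accuracy ratio: {ratio:.2f}x")  # 82.3 / 40.06
    print(f"time saving: {saving:.2%}")     # (554 - 402) / 554
```

Running it prints a ratio of 2.05x, consistent with "more than double the average", and a saving of 27.44%, which matches (554 - 402) / 554 as reported.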


Key words

Data assets; data collection; natural language processing; workover; smart oilfield; new quality productive forces

Classification: Energy science and technology

Cite this article

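CHANG Qifan, YANG Xumin, ZHU Fanghui, JIN Long, LIU Wei, ZHENG Lihui. A natural language processing data collection method for weakly regulated professional petroleum documents[J]. Oil Drilling & Production Technology, 2024, 46(4): 509-524, 16.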

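Funding

National Science and Technology Major Project "Drilling and Completion Technology and Reservoir Protection for Multi-Gas Co-Production" (Project No. 2016ZX05066002).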

Oil Drilling & Production Technology

Open access; Peking University Core Journal; CSTPCD

ISSN: 1000-7393
