Power System Technology, 2024, Vol. 48, Issue 4: 1690-1699, insert pp. 77-79 (13 pages). DOI: 10.13335/j.1000-3673.pst.2023.0713
Data Augmentation and Application of Defect Texts for Power Equipment Based on Knowledge Integration Manifold
Abstract
With the digital transformation and upgrading of power grids, intelligent operation and maintenance (O&M) technology for power equipment has developed rapidly. O&M work has accumulated a large number of defect texts that contain important information about the grid. Because the text data are sparsely labeled and their literal descriptions are fuzzy and diverse, it is difficult to mine O&M information from power texts effectively. This paper proposes a data augmentation method for power equipment defect texts. First, the defect text datasets are used to fine-tune the pre-trained model ERNIE (enhanced representation through knowledge integration) with a multi-stage knowledge masking strategy, integrating electrical expertise into the dynamic encoding of defect texts. Second, based on the manifold assumption, corruption and reconstruction functions are designed following the denoising autoencoder. The corruption function is built on a mask-unit selection strategy driven by information value, and the reconstruction function is built on the fine-tuned ERNIE; augmented samples are generated through this corruption-and-reconstruction process. Next, the augmented data are screened with influence functions and diversity measures, filtering out samples of poor quality or high redundancy. Finally, the augmented data are applied to various text mining tasks through a multi-layer training framework. Results show that the method substantially improves defect text mining performance and can be applied flexibly to a wide variety of power equipment defect text mining tasks.
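The corruption, reconstruction, and screening loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: smoothed IDF (the same smoothing scikit-learn's `TfidfVectorizer` applies by default) stands in for the paper's "information value"; masking the lowest-value tokens is an assumed reading of the mask-unit selection strategy; `toy_fill` is a hypothetical stand-in for the fine-tuned ERNIE reconstruction; and a token-set Jaccard threshold replaces the paper's diversity measures. All function and parameter names (`corrupt`, `augment`, `ratio`, `max_sim`) are hypothetical, and the influence-function quality screen is not modeled.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

MASK = "[MASK]"


def idf_scores(corpus: Sequence[List[str]]) -> Dict[str, float]:
    """Smoothed IDF per token; an assumed stand-in for the paper's information value."""
    n = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def corrupt(tokens: List[str], value: Dict[str, float], ratio: float = 0.3) -> List[str]:
    """Corruption step (assumption): mask the k lowest-value tokens, keeping key defect terms."""
    k = max(1, round(ratio * len(tokens)))
    # Rank positions by (value, position) so ties are broken deterministically;
    # tokens missing from `value` score 0.0 and are masked first.
    order = sorted(range(len(tokens)), key=lambda i: (value.get(tokens[i], 0.0), i))
    masked = set(order[:k])
    return [MASK if i in masked else t for i, t in enumerate(tokens)]


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set Jaccard similarity, used here as a simple diversity measure."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def augment(
    tokens: List[str],
    value: Dict[str, float],
    fill: Callable[[List[str], int], List[str]],
    n_rounds: int = 3,
    ratio: float = 0.3,
    max_sim: float = 0.9,
) -> List[List[str]]:
    """Corrupt, reconstruct with `fill`, and keep only new, sufficiently diverse samples."""
    kept: List[List[str]] = []
    for seed in range(n_rounds):
        candidate = fill(corrupt(tokens, value, ratio), seed)
        if candidate == tokens:  # reconstruction reproduced the original sentence
            continue
        # Reject candidates too similar to the original or to samples already kept.
        if all(jaccard(candidate, s) < max_sim for s in [tokens, *kept]):
            kept.append(candidate)
    return kept


def toy_fill(masked: List[str], seed: int) -> List[str]:
    """Hypothetical stand-in for a fine-tuned masked LM (e.g. ERNIE) sampling with `seed`."""
    options = iter([["at", "the"], ["on", "the"], ["near", "the"]][seed % 3])
    # Fill masks left to right; any mask beyond the available options stays masked.
    return [next(options, MASK) if t == MASK else t for t in masked]


corpus = [
    ["oil", "leak", "at", "the", "main", "transformer", "valve"],
    ["rust", "at", "the", "breaker", "cabinet"],
    ["crack", "at", "the", "insulator", "base"],
]
value = idf_scores(corpus)
print(corrupt(corpus[0], value))
for sample in augment(corpus[0], value, toy_fill):
    print(" ".join(sample))
```

Here "at" and "the" occur in every text, so they receive the lowest value and are the ones masked, while "oil", "leak", and "transformer" survive. The first reconstruction reproduces the original and is discarded; the other two are kept because their Jaccard similarity to the original is 0.75. With a real masked language model, `fill` would sample a different reconstruction for each seed rather than read from a fixed list.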
Key words
power equipment defect text/data augmentation/knowledge integration/data filtering
Classification
Information Technology and Safety Science
Cite this article
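WANG Xuliang, GU Yuanli, ZHANG Hongru, LIU Linghui, LIU Hongshun, LI Qingquan. Data augmentation and application of defect texts for power equipment based on knowledge integration manifold[J]. Power System Technology, 2024, 48(4): 1690-1699, insert pp. 77-79 (13 pages).
Funding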
Project Supported by State Grid Shandong Electric Power Company Science & Technology Project (520612220004).