Data Augmentation and Application of Defect Texts for Power Equipment Based on Knowledge Integration Manifold
OA; PKU Core (北大核心); CSTPCD
As power grids undergo digital transformation and upgrading, intelligent operation and maintenance (O&M) technology for power equipment has developed rapidly, and the O&M process has accumulated a large number of equipment defect texts that contain important information about the grid. Because the labels for these texts are sparse and the descriptive language is ambiguous and inconsistent, the O&M information in power texts is difficult to mine effectively. This paper proposes a data augmentation method for power equipment defect texts. First, the pre-trained model ERNIE (enhanced representation through knowledge integration) is fine-tuned on a defect text dataset, and a multi-stage knowledge masking strategy integrates electrical domain expertise into the dynamic encoding of defect texts. Second, under the manifold assumption, a destruction function and a reconstruction function are designed following the denoising autoencoder architecture: the destruction function follows a mask-unit selection strategy based on information value, and the reconstruction function is built on the fine-tuned ERNIE. The destruction-reconstruction process yields augmented samples that lie within the manifold of the original data. Third, the augmented dataset is filtered using influence functions and diversity measures, which removes low-quality and highly repetitive augmented samples. Finally, a multi-layer training framework applies the augmented data to various defect text mining tasks. The case study builds a defect-level classification task for power equipment defect texts from real equipment inspection and maintenance records. The results show that the proposed algorithm substantially improves defect text mining performance and can be applied widely and flexibly to a variety of power equipment defect text mining tasks.
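The destruction-reconstruction step and the diversity filter described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: information value is approximated by inverse document frequency, the lowest-value tokens are the ones masked, the reconstruction function is a placeholder callable standing in for the fine-tuned ERNIE, and diversity is measured by Jaccard similarity. The influence-function quality filter is not covered. All function names (`information_value`, `destroy`, `reconstruct`, `filter_diverse`) are hypothetical.

```python
import math
from collections import Counter

MASK = "[MASK]"

def information_value(corpus):
    """Assumed proxy for information value: inverse document frequency."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}

def destroy(tokens, scores, n_mask=1):
    """Destruction step: mask the n_mask tokens with the lowest score.

    Masking low-value tokens is an assumption of this sketch. Unseen tokens
    score 0.0 and ties are broken by position.
    """
    order = sorted(range(len(tokens)),
                   key=lambda i: (scores.get(tokens[i], 0.0), i))
    masked = set(order[:n_mask])
    return [MASK if i in masked else tok for i, tok in enumerate(tokens)]

def reconstruct(tokens, fill):
    """Reconstruction step: fill each mask with fill(tokens, position).

    In the paper this role is played by the fine-tuned ERNIE; here `fill`
    is any callable supplied by the caller.
    """
    return [fill(tokens, i) if tok == MASK else tok
            for i, tok in enumerate(tokens)]

def jaccard(a, b):
    """Token-set overlap between two samples, in [0, 1]."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def filter_diverse(original, candidates, max_sim=0.9):
    """Drop candidates that nearly duplicate the original or a kept sample."""
    kept = []
    for cand in candidates:
        if all(jaccard(cand, ref) <= max_sim for ref in [original] + kept):
            kept.append(cand)
    return kept

# Toy corpus of tokenized defect descriptions (made-up illustration data).
corpus = [
    ["transformer", "oil", "leak", "severe"],
    ["transformer", "bushing", "crack", "minor"],
    ["transformer", "oil", "level", "low"],
]
scores = information_value(corpus)
destroyed = destroy(corpus[0], scores)

# Two placeholder "model predictions" for the masked position.
fillers = ("transformer", "main-transformer")
candidates = [reconstruct(destroyed, lambda toks, i, t=t: t) for t in fillers]
augmented = filter_diverse(corpus[0], candidates)
print(destroyed)
print(augmented)
```

In this toy run the candidate that merely reproduces the original sentence is removed by the diversity filter, and only the variant with a different filled token is kept. Swapping the placeholder `fill` for a masked language model would bring the sketch closer to the pipeline the abstract describes.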
王绪亮;顾媛丽;张鸿儒;刘灵慧;刘洪顺;李清泉
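Shandong Provincial Key Laboratory of UHV Transmission Technology and Equipment (Shandong University), Jinan 250061, Shandong Province, China
Laiwu Power Supply Company, State Grid Shandong Electric Power Company, Jinan 271100, Shandong Province, China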
Power and Electrical Engineering
power equipment defect text; data augmentation; knowledge integration; data filtering
Power System Technology (《电网技术》), 2024, No. 4
pp. 1690-1699, inserts 77-79 (13 pages)
Project supported by the State Grid Shandong Electric Power Company Science & Technology Project (520612220004).