
A Study on Entity Extraction for Chinese Financial Statements Incorporating Convolutional Gating and Entity Boundary Prediction (Open Access)

Abstract

Financial statements play an important role in corporate development planning in the financial sector, yet extracting useful information from them still relies heavily on manual work. To address this, a named entity recognition method for financial statements is proposed that fuses key information with entity boundary information, with the aim of making the extraction of useful information more efficient. First, a convolutional gating unit composed of two convolutional layers, a self-attention mechanism, and a gating mechanism extracts local features from the encoder output and filters key information to guide entity recognition. Second, an entity boundary prediction module integrates entity boundary information into long-sequence semantic features that carry sentence-level dependencies. Finally, the key information and the boundary-enriched long-sequence semantic features are fed into a conditional random field layer, which captures the dependencies between adjacent labels that satisfy the entity labeling rules and yields the globally optimal label sequence. Experiments show that the proposed model achieves F1 scores of 95.75% and 94.92% on the Resume and MSRA datasets, respectively, outperforming all compared models and demonstrating the method's effectiveness for Chinese named entity recognition. On the financial statement dataset, precision, recall, and F1 are 87.93%, 92.45%, and 90.13%, respectively, which is better than the baseline model and indicates that the method can effectively recognize named entities in the financial domain.
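The final step of the abstract (a conditional random field layer that respects the labeling rules and returns the globally optimal label sequence) can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation. The tag set, the `allowed` rule, the function names, and the emission scores are hypothetical, and transition scores are simply 0 for legal tag pairs and negative infinity for illegal ones, whereas a trained CRF would learn them.

```python
from typing import Dict, List

NEG_INF = float("-inf")
LABELS = ["O", "B-ORG", "I-ORG"]  # hypothetical BIO tag set, not from the paper


def allowed(prev: str, curr: str) -> bool:
    """BIO rule: I-X may only follow B-X or I-X of the same entity type."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def transition(prev: str, curr: str) -> float:
    """Assumed transition score: 0 if legal, -inf if illegal (a real CRF learns these)."""
    return 0.0 if allowed(prev, curr) else NEG_INF


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence that obeys the BIO rule."""
    if not emissions:
        return []
    # A sequence may not start with an I- tag.
    score = {y: NEG_INF if y.startswith("I-") else emissions[0][y] for y in LABELS}
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointer = {}, {}
        for curr in LABELS:
            # Best previous label for this current label.
            best_prev = max(LABELS, key=lambda p: score[p] + transition(p, curr))
            new_score[curr] = score[best_prev] + transition(best_prev, curr) + emit[curr]
            pointer[curr] = best_prev
        backpointers.append(pointer)
        score = new_score
    # Trace the best path back from the best final label.
    path = [max(LABELS, key=lambda y: score[y])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


# Made-up per-token scores; a greedy argmax would pick the illegal I-ORG, I-ORG, O.
example = [
    {"O": 0.1, "B-ORG": 0.2, "I-ORG": 2.0},
    {"O": 0.0, "B-ORG": 0.1, "I-ORG": 1.5},
    {"O": 1.0, "B-ORG": 0.0, "I-ORG": 0.2},
]
print(viterbi(example))  # ['B-ORG', 'I-ORG', 'O']
```

Per-token argmax on these scores would start the sequence with an I- tag, which the labeling rules forbid. Decoding over whole sequences instead returns the best legal path, which is the behavior the abstract attributes to the conditional random field layer.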

Wang Ting; Yang Chuan; Liang Jiaying; Xiang Dong; Zou Maoyang

School of Computer Science, Chengdu University of Information Technology, Chengdu, Sichuan 610225

Subject category: Computer Science and Automation

Keywords: finance; named entity recognition; convolutional gating unit; entity boundary prediction; conditional random field

Software Guide (《软件导刊》), 2024, Issue 7

Pages 25-33 (9 pages)

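Funding: Key R&D Program of the Sichuan Provincial Department of Science and Technology (2021YFG0031, 2022YFG0375); Sichuan Science and Technology Service Industry Demonstration Project (2021GFW130)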

DOI: 10.11907/rjdk.231621
