
Sentence-level Relation Extraction Method with Document Context Heterogeneous Representation

Open Access | Peking University Core Journal | CSTPCD

Abstract

Relation extraction aims to identify the relation between two entities in a text. Recent studies have achieved promising results by processing data in groups. However, because the data within a group interact only to a limited extent, most of these methods overlook the correlations among them. Moreover, some existing methods define many label types, which leads to redundant labeling information. To address these issues, this study proposes a sentence-level relation extraction method with document context heterogeneous representation. First, a document context module based on a heterogeneous graph network models the words and relations within a data group as graph nodes and lets intragroup information interact through message passing, so that the correlations among the grouped data are fully represented. Second, a relation information module based on a heterogeneous graph network captures relation information; it shares the heterogeneous graph network parameters of the document context module, which saves computational resources. Third, a fusion labeling strategy introduces a logical virtual label that reduces the number of label categories and alleviates redundant labeling information. Experimental results show that the constructed model achieves F1 scores of 93.2% on the NYT dataset and 94.7% on the WebNLG dataset, and that it outperforms the comparison models on six of the eight subtasks in complex scenarios, validating the effectiveness of the proposed method.
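The abstract describes word nodes and relation nodes exchanging information through message passing, with the relation information module reusing the graph-network parameters of the document context module. The pure-Python sketch below illustrates only that general idea, under loudly stated assumptions: the fully connected word-relation bipartite graph, mean aggregation, residual ReLU update, random weights, the choice of a single sentence for the relation pass, and all names (`HeteroLayer`, `group_words`, `relations`, and so on) are hypothetical and are not taken from the paper. It is not the authors' model.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class HeteroLayer:
    """One round of message passing on a word/relation bipartite graph.

    Hypothetical simplification: every word node links to every relation
    node, messages are mean-aggregated, and each edge direction
    (word->relation, relation->word) has its own weight matrix.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        self.w_word_to_rel = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
        self.w_rel_to_word = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, words: List[Vector], rels: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
        # Relation nodes receive one aggregated message from all word nodes.
        msg_to_rel = matvec(self.w_word_to_rel, mean(words))
        new_rels = [relu([r + m for r, m in zip(rel, msg_to_rel)]) for rel in rels]
        # Word nodes receive one aggregated message from the updated relation nodes.
        msg_to_word = matvec(self.w_rel_to_word, mean(new_rels))
        new_words = [relu([w + m for w, m in zip(word, msg_to_word)]) for word in words]
        return new_words, new_rels


# A single layer instance: both passes below use the same weights (parameter sharing).
shared = HeteroLayer(dim=4, seed=42)

# Hypothetical "document context" pass: word nodes from every sentence in one data group.
group_words = [[0.1, 0.2, 0.0, 0.5], [0.3, 0.0, 0.4, 0.1], [0.0, 0.6, 0.2, 0.2]]
relations = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
ctx_words, ctx_rels = shared(group_words, relations)

# Hypothetical "relation information" pass: the same layer applied to one sentence's words.
sent_words = group_words[:1]
_, rel_info = shared(sent_words, relations)
```

Because both passes call the same `HeteroLayer` object, the second pass introduces no additional weight matrices; in this toy setting that is what "sharing parameters" buys.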

CAO Yukun (曹渝昆); CHENG Yu (程宇); HE Zhenyi (何祯奕); XU Kangle (徐康乐); YAN Jialuo (颜家洛); LI Yunfeng (李云峰)

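College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China
Information Center, COMAC Shanghai Aviation Industrial (Group) Co., Ltd., Shanghai 201203, China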

Subject: Computer and Automation

Keywords: fusion labeling; heterogeneous graph network; one-module one-step model; sentence-level relation extraction; natural language processing

Computer Engineering (《计算机工程》), 2024, No. 5, pp. 111-119 (9 pages)

Funding: Natural Science Foundation of Shanghai (20ZR1421600).

DOI: 10.19678/j.issn.1000-3428.0067686
