
Submodular Optimization Approach for Entity Summarization in Knowledge Graph Driven by Large Language Models (OA; 北大核心; CSTPCD)

Abstract

The continuous expansion of knowledge graphs has made entity summarization a research hotspot. The goal of entity summarization is to obtain a concise description of an entity from the large-scale triple-structured facts that describe it. This research proposes a submodular optimization method for entity summarization based on a large language model. Firstly, based on the descriptive information of the entities, relations, and properties in the triples, a large language model is used to embed them into vectors, effectively capturing the semantic information of the triples and generating embedding vectors rich in semantic information. Secondly, based on the embedding vectors generated by the large language model, a measure is defined to characterize the relevance between any two triples that describe the same entity; the higher the relevance between two triples, the more similar the information they contain. Finally, based on this triple-relevance measure, a normalized and monotonically non-decreasing submodular function is defined, modeling entity summarization as a submodular function maximization problem, so that greedy algorithms with performance guarantees can be directly applied to extract entity summaries. Experiments are conducted on three public benchmark datasets, and the quality of the extracted entity summaries is evaluated with two metrics, F1 score and normalized discounted cumulative gain (NDCG). Experimental results show that the proposed approach significantly outperforms state-of-the-art methods.
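The pipeline the abstract describes — embed each triple, score pairwise relevance between triples of the same entity, then greedily maximize a normalized, monotone non-decreasing submodular objective — can be illustrated with a minimal sketch. This is not the authors' implementation: cosine similarity as the relevance measure, the facility-location-style objective, and the toy embeddings are all assumptions made here for illustration.

```python
import math

def relevance(u, v):
    """Cosine similarity between two triple embeddings.
    Stands in for the paper's LLM-embedding-based relevance measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def facility_location(selected, triples, emb):
    """A normalized, monotone non-decreasing submodular objective:
    each triple is 'covered' by its most relevant selected triple,
    so a good summary has a similar representative for every fact."""
    if not selected:
        return 0.0  # normalized: f(empty set) = 0
    return sum(max(relevance(emb[t], emb[s]) for s in selected) for t in triples)

def greedy_summary(triples, emb, k):
    """Standard greedy maximization under a cardinality constraint:
    repeatedly add the triple with the largest marginal gain."""
    summary, remaining = [], list(triples)
    while len(summary) < k and remaining:
        best = max(remaining,
                   key=lambda t: facility_location(summary + [t], triples, emb))
        summary.append(best)
        remaining.remove(best)
    return summary
```

For a monotone non-decreasing submodular objective under a size-k constraint, this greedy rule achieves at least a (1 − 1/e) ≈ 0.63 fraction of the optimal objective value (Nemhauser, Wolsey, and Fisher), which is the kind of performance guarantee the abstract invokes.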

张琪; 钟昊

1. School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, China; 2. School of Computer Science, South China Normal University, Guangzhou 510631, China

Computer and Automation

Keywords: entity summarization; large language model; submodular function; greedy algorithm

《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology), 2024 (007)

Pages 1806-1813 (8 pages)

This work was supported by the National Key Research and Development Program of China (2023YFC3341200), the National Natural Science Foundation of China (62377015), and the Research Cultivation Fund for the Youth Teachers of South China Normal University (23KJ29).

DOI: 10.3778/j.issn.1673-9418.2305086
