CAAI Transactions on Intelligent Systems (智能系统学报), 2016, Vol. 11, Issue 3: 366-375 (10 pages). DOI: 10.11992/tis.201603048
A novel method using word vector and graphical models for entity disambiguation in specific topic domains
(Original title: 一种结合词向量和图模型的特定领域实体消歧方法)
Abstract
In this paper, a novel method based on word vectors and graph models is proposed for entity disambiguation in specific topic domains, taking the tourism domain as an example. The method first selects the tourism-category web pages in an offline Wikipedia database to build a knowledge base. The Word2Vec tool is then used to train a word vector model on the texts in the knowledge base together with texts taken from several tourism websites. In addition, a random walk algorithm over a manually annotated graph is used to compute similarity, so that the similarity between words within the tourism domain can be calculated accurately. Next, the method extracts several keywords from the background text of the entity to be disambiguated and compares them with the keyword text in the knowledge base that describes the candidate entities. Finally, the method uses the trained Word2Vec model and the graph model to calculate the similarity between the keywords of the name mention and the keywords of each candidate entity, and chooses the candidate entity with the maximum average similarity as the target entity. Experimental results show that the method effectively captures the similarity between a name mention and its target entity and can therefore accurately disambiguate entities in a specific topic domain.
Keywords
entity disambiguation / entity linking / Word2Vec / Wikipedia / graphical model / random walk
Classification
Information Technology and Security Science
Cite this article
汪沛, 线岩团, 郭剑毅, 永华, 陈玮, 王红斌. A novel method using word vector and graphical models for entity disambiguation in specific topic domains [J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 366-375, 10.
Funding
National Natural Science Foundation of China (61262041, 61472168, 61462054, 61562052); Key Project of the Natural Science Foundation of Yunnan Province (2013FA030).
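Illustrative sketch
The last two steps described in the abstract (scoring word similarity, then choosing the candidate entity with the maximum average keyword similarity) might look roughly like the Python sketch below. It is not the authors' implementation. The toy vectors stand in for a trained Word2Vec model, the toy graph stands in for the manually annotated graph, and the function names, the all-pairs averaging, and the use of a random walk with restart are all assumptions made for illustration. The abstract does not say how the vector and graph scores are combined, so the two are kept separate here.

```python
import math

# Hypothetical 2-d vectors standing in for a trained Word2Vec model.
VECTORS = {
    "lake": (1.0, 0.1), "boat": (0.9, 0.2), "scenery": (0.8, 0.3),
    "stock": (0.1, 1.0), "company": (0.2, 0.9),
}

def vector_similarity(a, b):
    """Cosine similarity between two words; 0.0 if either word is unknown."""
    if a not in VECTORS or b not in VECTORS:
        return 0.0
    u, v = VECTORS[a], VECTORS[b]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def random_walk_scores(graph, start, restart=0.15, iters=50):
    """Random walk with restart (an assumed variant) from `start`.

    `graph` maps node -> {neighbor: weight}; every neighbor must be a key.
    Returns a proximity score per node; the scores sum to 1.
    """
    scores = {node: 0.0 for node in graph}
    scores[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            total = sum(graph[node].values())
            if total == 0:
                # Dangling node: return its walking mass to the start node.
                nxt[start] += (1 - restart) * mass
                continue
            for nb, w in graph[node].items():
                nxt[nb] += (1 - restart) * mass * w / total
        nxt[start] += restart  # restart mass always returns to the start
        scores = nxt
    return scores

def average_similarity(mention_keywords, candidate_keywords, similarity):
    """Mean similarity over all (mention keyword, candidate keyword) pairs."""
    pairs = [(m, c) for m in mention_keywords for c in candidate_keywords]
    if not pairs:
        return 0.0
    return sum(similarity(m, c) for m, c in pairs) / len(pairs)

def disambiguate(mention_keywords, candidates, similarity):
    """Return the candidate whose keywords are most similar on average."""
    return max(
        candidates,
        key=lambda name: average_similarity(
            mention_keywords, candidates[name], similarity),
    )

# Hypothetical candidates for an ambiguous mention such as "West Lake".
candidates = {
    "West Lake (scenic area)": ["lake", "scenery"],
    "West Lake (company)": ["stock", "company"],
}
best = disambiguate(["boat", "lake"], candidates, vector_similarity)
print(best)

# Hypothetical hand-built domain graph with unit edge weights.
graph = {
    "lake": {"boat": 1.0, "scenery": 1.0},
    "boat": {"lake": 1.0},
    "scenery": {"lake": 1.0, "temple": 1.0},
    "temple": {"scenery": 1.0},
}
walk = random_walk_scores(graph, "boat")
```

In this sketch `similarity` is passed in as a parameter, so a graph-based score (for example, one derived from `random_walk_scores`) could replace or supplement the vector score without changing the selection step.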