一种结合词向量和图模型的特定领域实体消歧方法

汪沛, 线岩团, 郭剑毅, 永华, 陈玮, 王红斌

智能系统学报 (CAAI Transactions on Intelligent Systems), 2016, Vol. 11, Issue 3: 366-375, 10. DOI: 10.11992/tis.201603048

A novel method using word vector and graphical models for entity disambiguation in specific topic domains

汪沛 1, 线岩团 1, 郭剑毅 2, 永华 1, 陈玮 2, 王红斌 1

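Author information

  • 1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan
  • 2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, Yunnan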

Abstract

This paper proposes a method based on word vectors and graph models for entity disambiguation in specific topic domains, taking the tourism domain as an example. The method first selects the tourism-category pages in an offline Wikipedia database to build a knowledge base. The Word2Vec tool is then used to train a word vector model on the texts in the knowledge base together with texts taken from several tourism websites. To calculate the similarity between words within the tourism domain more accurately, this model is combined with a manually annotated graph, on which a random walk algorithm computes similarity. Next, the method extracts several keywords from the background text of the entity to be disambiguated and compares them with the keyword text in the knowledge base that describes the candidate entities. Finally, the method uses the trained Word2Vec model and the graph model to calculate the similarity between the keywords of the name mention and the keywords of each candidate entity, and selects as the target entity the candidate with the maximum average similarity. Experimental results show that the method effectively captures the similarity between a name mention and its target entity and can therefore accurately disambiguate entities in a specific topic domain.
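
The final selection step described in the abstract (scoring each candidate entity by the average similarity between its keywords and the keywords of the name mention, then taking the maximum) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the hand-written vectors in `TOY_VECTORS` stand in for a trained Word2Vec model, the graph-based random-walk similarity is left out, and every name here (`TOY_VECTORS`, `cosine`, `average_similarity`, `disambiguate`, the candidate labels and keywords) is hypothetical.

```python
import math

# Hypothetical toy vectors standing in for a trained Word2Vec model.
# Real vectors would come from training on the knowledge-base and website texts.
TOY_VECTORS = {
    "lake": [0.9, 0.1, 0.0],
    "boat": [0.8, 0.2, 0.1],
    "scenery": [0.7, 0.3, 0.0],
    "stock": [0.0, 0.1, 0.9],
    "finance": [0.1, 0.0, 0.8],
    "company": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_similarity(mention_keywords, candidate_keywords, vectors):
    """Mean pairwise similarity over all keyword pairs that have vectors."""
    pairs = [
        (m, c)
        for m in mention_keywords
        for c in candidate_keywords
        if m in vectors and c in vectors
    ]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[m], vectors[c]) for m, c in pairs) / len(pairs)


def disambiguate(mention_keywords, candidates, vectors):
    """Return the candidate with the highest average similarity, plus all scores."""
    scores = {
        name: average_similarity(mention_keywords, keywords, vectors)
        for name, keywords in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical example: one ambiguous mention with two candidate entities.
candidates = {
    "candidate_lake": ["lake", "scenery"],
    "candidate_company": ["stock", "finance", "company"],
}
best, scores = disambiguate(["boat", "scenery"], candidates, TOY_VECTORS)
print(best, scores)
```

In a fuller version, `cosine` would presumably be replaced by whatever combined similarity the Word2Vec model and the graph model produce; the abstract does not say how the two are combined, so the sketch leaves that choice open.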

Key words

entity disambiguation/entity linking/Word2Vec/graphical model/random walk/Wikipedia

Classification

Information technology and security science

Cite this article

汪沛, 线岩团, 郭剑毅, 永华, 陈玮, 王红斌. 一种结合词向量和图模型的特定领域实体消歧方法[J]. 智能系统学报, 2016, 11(3): 366-375, 10.

Funding

National Natural Science Foundation of China (61262041, 61472168, 61462054, 61562052); Key Project of the Natural Science Foundation of Yunnan Province (2013FA030).

智能系统学报 (CAAI Transactions on Intelligent Systems)

OA; 北大核心 (PKU Core); CSCD; CSTPCD

ISSN 1673-4785
