Acta Automatica Sinica, 2016, Vol. 42, Issue 6: 915-922 (8 pages). DOI: 10.16383/j.aas.2016.c150715
An Entity Linking Method for Microblog Based on Semantic Categorization by Word Embeddings (基于词向量语义分类的微博实体链接方法)
Abstract
As a widely applied task in natural language processing (NLP), named entity linking (NEL) links a given mention to an unambiguous entity in a knowledge base. NEL plays an important role in information extraction and question answering. Because microblog posts are short, traditional algorithms designed for linking long texts do not fit the microblog linking task well. Previous studies mostly built models on mentions and their context to disambiguate entities, which makes it difficult to distinguish candidates with similar lexical and syntactic features. In this paper, we propose a novel NEL method based on semantic categorization through abstraction over word embeddings, which makes full use of the semantics carried by mentions and candidates. First, we obtain word embeddings through a neural network and cluster the entities to serve as features. Then, the candidates are disambiguated by predicting the categories of entities with multiple classifiers. Finally, we test the method on the NLPCC2014 dataset and conclude that the proposed method achieves better results than the best known work, especially in accuracy.
Key words
Word embedding/entity linking/social media processing/neural network/multiple classifiers
Cite this article
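Feng Chong, Shi Ge, Guo Yuhang, Gong Jing, Huang Heyan. An Entity Linking Method for Microblog Based on Semantic Categorization by Word Embeddings [J]. Acta Automatica Sinica, 2016, 42(6): 915-922.
Funding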
Supported by the National Basic Research Program of China (973 Program) (2013CB329303), the National High Technology Research and Development Program of China (863 Program) (2015AA015404), the National Natural Science Foundation of China (61502035), and the Specialized Research Fund for the Doctoral Program of Higher Education (20121101120026).