Computer Applications and Software, 2016, Vol. 33, Issue (8): 12-17, 61 (7 pages). DOI: 10.3969/j.issn.1000-386x.2016.08.003
Multi-Strategy Entity Word Disambiguation and Entity Linking in Chinese Microblogs
Abstract
Social networks are developing rapidly, and disambiguating ambiguous microblog entities and linking them to a knowledge base have become a current research focus. This paper proposes a multi-strategy scheme for entity disambiguation and entity linking. First, it uses ICTCLAS to segment microblog texts into words, and uses Baidu Baike and an entity expert database to normalize the entities. It then builds a disambiguation text database from Baidu Baike information, microblog data, and network terms collected by a web crawler, and combines the TF-IDF algorithm with the Fast-Newman clustering algorithm to disambiguate and link the entities. We evaluated the method on the data of the Chinese microblog entity linking task of the 2nd Conference on Natural Language Processing & Chinese Computing (NLP&CC 2013). In the evaluation, the accuracy reached 84.99%, and it rose further to 91.40% after continued improvement of the model.
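The abstract combines TF-IDF with a text database to decide which knowledge-base entry a mention refers to. One common way to do that, sketched here under stated assumptions, is to represent the mention's microblog context and each candidate entry's description as TF-IDF vectors and pick the candidate with the highest cosine similarity. The weighting formula, the helper names (`tfidf_vectors`, `cosine`, `link_entity`) and the toy data are hypothetical; the paper does not give its exact scoring, and the input is assumed to be already word-segmented (the role ICTCLAS plays in the paper).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight), one per tokenized document.

    Assumed weighting (not specified in the paper): tf = count / doc length,
    idf = log((1 + N) / (1 + df)) + 1, a common smoothed variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (c / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def link_entity(context, candidates):
    """Return the candidate entry whose description best matches the context.

    context: token list of the microblog around the mention (already segmented).
    candidates: hypothetical dict {entry_id: token list of the entry's text}.
    """
    ids = list(candidates)
    vecs = tfidf_vectors([context] + [candidates[i] for i in ids])
    scores = {i: cosine(vecs[0], vecs[k + 1]) for k, i in enumerate(ids)}
    return max(scores, key=scores.get), scores


# Toy example: English tokens stand in for segmented Chinese text.
context = "new iphone released by apple today price".split()
candidates = {
    "Apple_Inc": "apple technology company iphone ipad mac".split(),
    "Apple_fruit": "apple fruit tree sweet red eat".split(),
}
best, scores = link_entity(context, candidates)
print(best)  # Apple_Inc
```

For brevity, document frequencies here are computed only over the context and its candidates; in a fuller system they would more plausibly come from the whole disambiguation text database, so that rare, informative terms such as "iphone" receive high weights.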
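The abstract names Fast-Newman clustering without saying which graph is clustered. The sketch below assumes, hypothetically, that nodes are entity mentions (or microblogs) and that edge weights are pairwise similarities such as TF-IDF cosine scores. It follows Newman's fast greedy modularity algorithm: start with every node in its own community, repeatedly merge the pair of connected communities with the largest modularity gain ΔQ = 2(e_ij − a_i·a_j), where e_ij is half the fraction of edge weight between communities i and j and a_i = Σ_j e_ij, and keep the partition with the highest modularity Q seen along the way. It is an illustration of the technique, not the authors' implementation.

```python
from collections import defaultdict


def fast_newman(edges):
    """Greedy modularity agglomeration on a weighted undirected graph.

    edges: dict {(u, v): weight} with u != v and comparable node labels.
    Returns (best partition as a list of sets, its modularity Q).
    """
    total = sum(edges.values())
    e = defaultdict(lambda: defaultdict(float))  # e[i][j] for i != j
    a = defaultdict(float)                       # a[i] = sum_j e[i][j]
    for (u, v), w in edges.items():
        share = w / (2 * total)
        e[u][v] += share
        e[v][u] += share
        a[u] += share
        a[v] += share
    members = {node: {node} for node in a}
    q = -sum(x * x for x in a.values())  # singletons: e_ii = 0
    best_q, best = q, [set(s) for s in members.values()]

    while True:
        # Only communities joined by an edge can increase Q when merged.
        pair, dq = None, float("-inf")
        for i in e:
            for j, eij in e[i].items():
                if i < j and 2 * (eij - a[i] * a[j]) > dq:
                    pair, dq = (i, j), 2 * (eij - a[i] * a[j])
        if pair is None:
            break
        i, j = pair
        # Merge community j into i: reroute j's links to i.
        for k, ejk in e[j].items():
            if k != i:
                e[i][k] += ejk
                e[k][i] += ejk
                del e[k][j]
        del e[i][j]
        del e[j]
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq
        if q > best_q:
            best_q, best = q, [set(s) for s in members.values()]
    return best, best_q


# Two triangles joined by one edge (hypothetical similarity graph).
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("x", "y"): 1, ("y", "z"): 1, ("x", "z"): 1, ("c", "x"): 1}
parts, q = fast_newman(edges)
print(sorted(sorted(s) for s in parts))  # [['a', 'b', 'c'], ['x', 'y', 'z']]
```

Merging continues until each connected component is a single community, and the best partition is the cut of this merge sequence with maximum Q, which is how Newman's method chooses the number of clusters without fixing it in advance.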
Key words
Chinese microblog/Entity disambiguation/TF-IDF/Fast-Newman clustering
Classification
Information Technology and Security Science
Cite this article
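Xiang Yu, Guo Yunlong, Xu Xiao, Zeng Weigang, Li Li. Multi-strategy entity word disambiguation and entity linking in Chinese microblogs [J]. Computer Applications and Software, 2016, 33(8): 12-17, 61.
Funding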
Supported by the National Natural Science Foundation of China (61170192).