
Multi-Strategy Entity Word Disambiguation and Entity Linking in Chinese Microblogs

向宇 郭云龙 徐潇 曾维刚 李莉

Computer Applications and Software, 2016, Vol. 33, Issue 8: 12-17, 61, 7. DOI: 10.3969/j.issn.1000-386x.2016.08.003

ENTITY WORDS DISAMBIGUATION AND ENTITY LINKING WITH MULTI-STRATEGY IN CHINESE MICROBLOGS

向宇 1, 郭云龙 1, 徐潇 1, 曾维刚 1, 李莉 1

Author information

  • 1. College of Computer and Information Science, Southwest University, Chongqing 400715

Abstract

Social networks are developing rapidly. How to disambiguate ambiguous entity mentions in microblogs and link them to a knowledge base has become a current research focus. This paper proposes a multi-strategy scheme for entity disambiguation and entity linking. First, it uses ICTCLAS to segment the microblog text into words, and uses Baidu Baike and an entity expert database to normalise the entities. It then uses Baidu Baike information, microblog data and network terms collected by a web crawler to construct a disambiguation text database, and combines the TF-IDF algorithm with the Fast-Newman clustering algorithm to disambiguate and link the entities. We tested the approach on data from the Chinese microblog entity linking task of the 2nd Natural Language Processing & Chinese Computing conference (NLP&CC 2013). In the evaluation the accuracy reached 84.99%, and rose to 91.40% after further improvement of the model.
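To make the TF-IDF step more concrete, the following is a minimal sketch of how a mention's microblog context might be scored against candidate knowledge-base entries. It is not the authors' implementation: the function names (`tf_idf_vectors`, `cosine`, `link_entity`), the smoothed IDF formula, the toy tokens and the entity identifiers are all assumptions made for illustration. The input is assumed to be already segmented into words (the paper uses ICTCLAS for that), and the Fast-Newman clustering step is not shown.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenised document.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, which is an
    assumption of this sketch, not a detail taken from the paper.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid division by zero on empty documents
        vectors.append({
            term: (count / length) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def link_entity(context_tokens, candidates):
    """Pick the candidate whose description is most similar to the context.

    `candidates` maps a hypothetical entity id to the tokens of its
    description text (for example, a knowledge-base entry).
    Returns (best_id, {id: score}).
    """
    ids = list(candidates)
    docs = [context_tokens] + [candidates[i] for i in ids]
    vectors = tf_idf_vectors(docs)
    scores = {i: cosine(vectors[0], vec) for i, vec in zip(ids, vectors[1:])}
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage: an ambiguous mention "apple" in a post about a phone release.
context = ["apple", "phone", "camera", "release"]
candidates = {
    "apple_company": ["apple", "company", "phone", "iphone", "camera"],
    "apple_fruit": ["apple", "fruit", "tree", "sweet"],
}
best, scores = link_entity(context, candidates)
print(best, scores)
```

In this toy case the company entry shares more context terms with the post than the fruit entry does, so it receives the higher score. A real system would build the candidate descriptions from the disambiguation text database described in the abstract.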

Key words

Chinese microblog / Entity disambiguation / TF-IDF / Fast-Newman clustering

Classification

Information Technology and Security Science

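Cite this article

向宇, 郭云龙, 徐潇, 曾维刚, 李莉. Multi-strategy entity word disambiguation and entity linking in Chinese microblogs [J]. Computer Applications and Software, 2016, 33(8): 12-17, 61, 7.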

Funding

National Natural Science Foundation of China (61170192).

Computer Applications and Software

OA, CSTPCD

ISSN 1000-386X
