Research on Cross-document Chinese Personal Name Disambiguation Based on Hierarchical Clustering

张菲菲 李宗海 周晓辉 李晓戈

Computer Engineering and Applications (计算机工程与应用), Issue 6: 106-111 (6 pages). DOI: 10.3778/j.issn.1002-8331.1309-0423

Cross-document Chinese personal name entity disambiguation based on hierarchical clustering

张菲菲 (1), 李宗海 (2), 周晓辉 (1), 李晓戈 (1)

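Author information

  • 1. Xi'an University of Posts and Telecommunications, Xi'an 710121
  • 2. Jinan Zhonglin Information Technology Co., Ltd. (济南中林信息科技有限公司), Jinan 250100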

Abstract

Cross-document entity disambiguation is the problem of identifying whether mentions in different documents refer to the same entity or to distinct entities. This paper describes a Chinese information extraction (IE) system that combines document-level and corpus-level IE in a pipelined, multi-level modular approach to named entity and entity profile extraction. It introduces novel features based on document-level entity profiles and studies how feature selection, parameter selection, and parameter validation influence the results. Disambiguation is performed by agglomerative hierarchical clustering on Hadoop. Experiments on the full network news corpus from Harbin Institute of Technology yield an F-measure of 91.33% on the training set and 88.73% on the test set.
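The clustering step named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the profile features, the Jaccard similarity, the average-linkage rule, and the stopping threshold are hypothetical and are not taken from the paper, and the sketch runs on a single machine rather than on Hadoop.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def average_link(c1, c2, profiles):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(jaccard(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_mentions(profiles, threshold):
    """Agglomerative clustering: start with one cluster per mention and
    repeatedly merge the most similar pair until no pair reaches threshold."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        # Find the pair of clusters with the highest average-link similarity.
        best_sim, best_pair = max(
            (average_link(clusters[a], clusters[b], profiles), (a, b))
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:
            break
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical entity profiles for four mentions of one name in four documents.
profiles = [
    {"professor", "Xi'an", "computer science"},
    {"professor", "Xi'an", "university"},
    {"footballer", "Jinan", "striker"},
    {"footballer", "striker", "league"},
]
print(cluster_mentions(profiles, threshold=0.3))  # [[0, 1], [2, 3]]
```

Each resulting cluster stands for one real-world person. The threshold controls how readily mentions are merged, which is the kind of parameter the abstract says was selected and validated experimentally.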

Key words

personal name disambiguation/entity disambiguation/information extraction/similarity/hierarchical clustering

Classification

Information technology and security science

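Cite this article

张菲菲, 李宗海, 周晓辉, 李晓戈. Cross-document Chinese personal name entity disambiguation based on hierarchical clustering [J]. Computer Engineering and Applications (计算机工程与应用), 2014, (6): 106-111.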

Computer Engineering and Applications (计算机工程与应用)

OA / Peking University Core (北大核心) / CSCD / CSTPCD

ISSN 1002-8331
