Research on cross-document Chinese personal name disambiguation based on hierarchical clustering (基于层次聚类的跨文本中文人名消歧研究)

Computer Engineering and Applications (计算机工程与应用), Issue (6): 106-111, 6. DOI: 10.3778/j.issn.1002-8331.1309-0423

Cross-document Chinese personal name entity disambiguation based on hierarchical clustering

ZHANG Feifei (张菲菲)¹, LI Zonghai (李宗海)², ZHOU Xiaohui (周晓辉)¹, LI Xiaoge (李晓戈)¹

Author information

1. Xi'an University of Posts and Telecommunications, Xi'an 710121
2. Jinan Zhonglin Information Technology Co., Ltd., Jinan 250100
Abstract

Cross-document entity disambiguation is the problem of determining whether mentions in different documents refer to the same entity or to distinct entities. This paper describes a Chinese information extraction (IE) system that combines document-level and corpus-level IE in a pipelined, multi-level modular approach to named entity and entity profile extraction. It introduces novel features based on document-level entity profiles, studies the influence of feature selection, parameter selection, and parameter validation, and analyzes the results. Disambiguation is performed by agglomerative hierarchical clustering running on Hadoop. Experiments on the complete network news corpus from Harbin Institute of Technology give an F-measure of 91.33% on the training set and 88.73% on the test set.
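As an illustration of the clustering step named in the abstract, the following is a minimal single-machine sketch of agglomerative hierarchical clustering over entity profiles. It is not the authors' implementation: the Jaccard similarity, the average-link merge criterion, the `threshold` value, and the toy profile features are all assumptions made for this example, and no Hadoop distribution is attempted.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def average_link(c1, c2, profiles):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(jaccard(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_mentions(profiles, threshold=0.3):
    """Bottom-up clustering: repeatedly merge the most similar pair of
    clusters until no pair reaches the (hypothetical) threshold."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            sim = average_link(clusters[x], clusters[y], profiles)
            if sim > best_sim:
                best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair  # x < y, so deleting y keeps index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical profiles for four documents that mention the same name.
profiles = [
    {"professor", "Xi'an", "computer science"},
    {"professor", "Xi'an", "university"},
    {"footballer", "Shandong", "striker"},
    {"footballer", "striker", "league"},
]
result = cluster_mentions(profiles)
```

Each resulting cluster is a list of document indices taken to refer to one person. The stopping threshold plays the role of the parameter that the abstract says is selected and validated; its value here is arbitrary.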

Key words

entity disambiguation (人名消歧, personal name disambiguation)/information extraction/similarity/hierarchical clustering

Classification

Information technology and security science

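Cite this article

ZHANG Feifei, LI Zonghai, ZHOU Xiaohui, LI Xiaoge. Cross-document Chinese personal name entity disambiguation based on hierarchical clustering (基于层次聚类的跨文本中文人名消歧研究)[J]. Computer Engineering and Applications (计算机工程与应用), 2014, (6): 106-111, 6.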

Computer Engineering and Applications (计算机工程与应用)

Indexing: OA, PKU Core (北大核心), CSCD, CSTPCD

ISSN: 1002-8331
