Computer Engineering and Applications, Issue 6: 106-111, 6. DOI: 10.3778/j.issn.1002-8331.1309-0423
Cross-document Chinese personal name entity disambiguation based on hierarchical clustering
Zhang Feifei¹, Li Zonghai², Zhou Xiaohui¹, Li Xiaoge¹
Author information
- 1. Xi'an University of Posts and Telecommunications, Xi'an 710121
- 2. Jinan Zhonglin Information Technology Co., Ltd., Jinan 250100
Abstract
Cross-document entity disambiguation is the problem of determining whether mentions in different documents refer to the same entity or to distinct entities. This paper describes a Chinese information extraction system that performs both document-level and corpus-level IE, using a pipelined, multi-level modular approach to named entity and entity profile extraction. It introduces novel features based on document-level entity profiles and analyzes how feature selection, parameter selection, and parameter validation affect the results. Disambiguation is performed by agglomerative hierarchical clustering on Hadoop. Experiments on the whole network news corpus dataset from Harbin Institute of Technology give an F-measure of 91.33% on the training set and 88.73% on the test set.
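The abstract names agglomerative hierarchical clustering as the disambiguation step but does not give its similarity measure, linkage, or stopping rule. The following is a minimal single-machine Python sketch under stated assumptions, not the authors' Hadoop implementation: each document mentioning a given name is a sparse feature vector (the feature names below are hypothetical stand-ins for entity-profile attributes), similarity is cosine, clusters merge by average linkage, and merging stops once the best pair falls below a hypothetical `threshold`. Each final cluster is read as one distinct person.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def agglomerative_cluster(docs: list, threshold: float) -> list:
    """Average-linkage agglomerative clustering (assumed variant).

    Starts with one cluster per document and repeatedly merges the pair
    of clusters with the highest average cross-pair similarity; stops
    when that similarity drops below `threshold` (hypothetical parameter).
    Returns clusters as sorted lists of document indices.
    """
    n = len(docs)
    sim = [[cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average linkage: mean similarity over all cross-cluster pairs.
                s = sum(sim[i][j] for i in clusters[x] for j in clusters[y])
                s /= len(clusters[x]) * len(clusters[y])
                if s > best:
                    best, best_pair = s, (x, y)
        if best < threshold:
            break  # No remaining pair is similar enough to be the same person.
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected by the deletion.
    return [sorted(c) for c in clusters]


# Hypothetical features for four documents that mention the same name.
docs = [
    Counter({"org:PKU": 2, "title:professor": 1, "field:physics": 1}),
    Counter({"org:PKU": 1, "title:professor": 1, "field:physics": 2}),
    Counter({"sport:football": 2, "role:forward": 1, "org:club": 1}),
    Counter({"sport:football": 1, "org:club": 2}),
]
print(agglomerative_cluster(docs, threshold=0.3))  # [[0, 1], [2, 3]]
```

Here documents 0 and 1 (similarity 5/6) merge first, then 2 and 3 (about 0.73); the two resulting clusters share no features, so merging stops and the name resolves to two people. The naive search recomputes linkage from scratch every round and scales poorly; it only illustrates the merge-and-stop logic and says nothing about how the work was distributed on Hadoop.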
Keywords
entity disambiguation/information extraction/similarity/hierarchical clustering
Classification
Information Technology and Security Science
Cite this article
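Zhang Feifei, Li Zonghai, Zhou Xiaohui, Li Xiaoge. Cross-document Chinese personal name entity disambiguation based on hierarchical clustering [J]. Computer Engineering and Applications, 2014, (6): 106-111, 6.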