Computer Engineering and Applications, 2012, Vol. 48, Issue 27: 136-141 (6 pages). DOI: 10.3778/j.issn.1002-8331.2012.27.029
Matching Chinese entity names with multiple features
Gong Jun (巩军)¹
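Author information
- 1. Postdoctoral Research Station, School of Information Technology, Peking University, Beijing 100871; Postdoctoral Workstation, Shenhua Group, Beijing 100011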
Abstract
Entity name matching plays an important role in integrating information systems, but name variations and clerical errors in Chinese entity names make exact string matching unreliable. It is therefore important to develop methods that can recognize the different variants of the same named entity. The similarity of Chinese entity names is measured separately at the character, word and semantic levels, and a hybrid measure is introduced that combines these similarities linearly. Two machine learning methods are then developed to integrate edit features for more precise matching: in the first, an optimized ranking list and the best cut point are obtained through training; in the second, a Support Vector Machine (SVM) judges whether a pair of names matches. Experiments on a real-world dataset of Chinese entity names show that the methods are effective.
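The abstract does not give the exact similarity functions or weights, so the following is only a minimal sketch of one plausible reading, not the author's method. It assumes character-level similarity is one minus the normalized Levenshtein distance, word-level similarity is the Jaccard overlap of already-segmented tokens (word segmentation is assumed to happen upstream), and semantic similarity is the Jaccard overlap after mapping tokens through a hypothetical synonym/abbreviation table. The weights `(0.4, 0.3, 0.3)` and all function names are illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def char_sim(a: str, b: str) -> float:
    """Character level: 1 - edit distance / length of the longer name."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard(x: Sequence[str], y: Sequence[str]) -> float:
    """Overlap of two token sets, in [0, 1]."""
    sx, sy = set(x), set(y)
    if not sx and not sy:
        return 1.0
    return len(sx & sy) / len(sx | sy)


def word_sim(ta: Sequence[str], tb: Sequence[str]) -> float:
    """Word level: Jaccard over tokens (segmentation assumed done upstream)."""
    return jaccard(ta, tb)


def sem_sim(ta: Sequence[str], tb: Sequence[str], synonyms: Dict[str, str]) -> float:
    """Semantic level: map each token to a canonical form (hypothetical table), then compare."""
    return jaccard([synonyms.get(t, t) for t in ta],
                   [synonyms.get(t, t) for t in tb])


def hybrid_sim(a: str, b: str, ta: Sequence[str], tb: Sequence[str],
               synonyms: Dict[str, str],
               weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Linear combination of the three levels; the weights are illustrative only."""
    wc, ww, ws = weights
    return wc * char_sim(a, b) + ww * word_sim(ta, tb) + ws * sem_sim(ta, tb, synonyms)


# Usage: a full name versus its common abbreviation.
syn = {"北大": "北京大学"}  # hypothetical abbreviation table
score = hybrid_sim("北京大学", "北大", ["北京大学"], ["北大"], syn)
print(round(score, 3))  # 0.5 = 0.4*0.5 + 0.3*0.0 + 0.3*1.0
```

In this sketch the word-level score alone misses the abbreviation entirely, while the semantic level recovers it; this is why combining levels can help, although in practice the weights would need tuning on labeled pairs rather than being fixed by hand.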
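For the first learning method, the abstract mentions an optimized ranking list and a best cut point obtained from training. A minimal sketch, assuming the ranking list is simply the labeled training pairs sorted by their similarity score and that the "best" cut point is the threshold maximizing F1 (the optimization criterion is an assumption; the paper may use another). The function name `best_cut_point` is hypothetical.

```python
from typing import List, Tuple


def best_cut_point(scored: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Rank (score, is_match) training pairs by descending score and return
    (threshold, f1) for the cut point with the highest F1.
    Pairs with score >= threshold are predicted to match."""
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    total_pos = sum(1 for _, is_match in ranked if is_match)
    best_t, best_f1 = float("inf"), 0.0  # inf means "predict no matches"
    tp = fp = 0
    for i, (score, is_match) in enumerate(ranked):
        if is_match:
            tp += 1
        else:
            fp += 1
        # Cut only between distinct scores so tied pairs fall on the same side.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        fn = total_pos - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = score, f1
    return best_t, best_f1


# Usage on toy (score, is_match) pairs.
train = [(0.9, True), (0.8, True), (0.6, False), (0.55, True), (0.3, False)]
threshold, f1 = best_cut_point(train)
print(threshold, round(f1, 3))  # 0.55 0.857
```

A single pass over the sorted list is enough because moving the cut down by one position changes the true-positive and false-positive counts by at most one.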
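For the second method, an SVM judges each name pair. The abstract does not state the kernel, features or solver, so the sketch below swaps in a linear SVM trained with the Pegasos stochastic sub-gradient algorithm (without its optional projection step), using a pair's three similarity scores as the feature vector. Centering each score at 0.5 and omitting the bias term are simplifying assumptions, and `train_linear_svm` and `predict` are hypothetical names.

```python
import random
from typing import List, Sequence


def _center(x: Sequence[float], c: float = 0.5) -> List[float]:
    """Shift each similarity score so that 0.5 maps to 0 (assumed centering)."""
    return [xi - c for xi in x]


def train_linear_svm(X: List[Sequence[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
    Labels must be +1 (match) or -1 (non-match)."""
    rng = random.Random(seed)
    data = [(_center(x), yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 if the pair is judged a match, otherwise -1."""
    score = sum(wj * xj for wj, xj in zip(w, _center(x)))
    return 1 if score >= 0.0 else -1


# Toy training pairs: (char_sim, word_sim, sem_sim) -> +1 match / -1 non-match.
X_train = [[0.9, 0.8, 1.0], [0.8, 0.7, 0.9], [0.85, 0.6, 0.95],
           [0.2, 0.1, 0.0], [0.3, 0.0, 0.2], [0.1, 0.2, 0.1]]
y_train = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X_train, y_train)
print([predict(w, x) for x in X_train])  # [1, 1, 1, -1, -1, -1]
```

Compared with a single cut point on one combined score, the SVM learns how much each similarity level should count; a library such as LIBSVM or scikit-learn would normally replace this hand-written solver.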
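Keywords
string similarity / name disambiguation / name matching / machine learning
Classification
Information Technology and Security Science
Cite this article
Gong Jun. Matching Chinese entity names with multiple features[J]. Computer Engineering and Applications, 2012, 48(27): 136-141.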