
Matching Chinese entity names with multiple features

Gong Jun

Computer Engineering and Applications, 2012, Vol. 48, Issue 27: 136-141 (6 pages). DOI:10.3778/j.issn.1002-8331.2012.27.029

Author information

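  • 1. Postdoctoral Research Station, School of Information Technology, Peking University, Beijing 100871; Postdoctoral Workstation, Shenhua Group, Beijing 100011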

Abstract

Entity name matching plays an important role in information system integration. However, name variations and clerical errors in Chinese entity names make exact string matching unreliable, so methods are needed that can recognize different variants of the same named entity. Similarity between Chinese entity names is measured separately at the character, word, and semantic levels, and a hybrid measure is obtained by combining these similarities linearly. Two machine learning methods are developed to integrate the features for more precise matching. In the first, an optimized ranking list and the best cut point are learned from training data. In the second, a Support Vector Machine (SVM) decides whether each name pair matches. Experiments on a real-world dataset of Chinese entity names show that the methods are effective.
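The abstract outlines a pipeline with three parts: per-level similarities, a linear combination of those similarities, and a cut point learned from labelled training pairs. The Python sketch below illustrates that idea under stated assumptions and is not the paper's implementation. The following choices are all assumptions, and every function name is hypothetical:

- Character level: normalized Levenshtein distance.
- Word level: Jaccard overlap of words. The words are passed in already segmented, standing in for an external Chinese segmenter.
- Combination: equal weights.
- Cut point: chosen to maximize F1.

The semantic-level similarity, the ranking list, and the SVM classifier are not modelled.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = cur
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Character level (assumed measure): 1 - normalized edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def word_similarity(wa: Sequence[str], wb: Sequence[str]) -> float:
    """Word level (assumed measure): Jaccard overlap of pre-segmented words."""
    sa, sb = set(wa), set(wb)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def hybrid_similarity(a: str, b: str, wa: Sequence[str], wb: Sequence[str],
                      w_char: float = 0.5, w_word: float = 0.5) -> float:
    """Linear combination of the two levels; the weights are hypothetical."""
    return w_char * char_similarity(a, b) + w_word * word_similarity(wa, wb)


def best_cut_point(scores: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Pick the threshold, among the observed scores, that maximizes F1.

    A pair is predicted to match when its score >= threshold.
    Returns (threshold, F1 on the training pairs).
    """
    best_t, best_f1 = 1.0, 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        if tp == 0:
            continue  # no true matches above t: F1 would be 0
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    # Simplified vs. traditional form of the same name: one character differs.
    print(char_similarity("北京大学", "北京大學"))  # 0.75
    print(word_similarity(["神华", "集团"], ["神华", "集团", "公司"]))
    print(best_cut_point([0.9, 0.8, 0.4, 0.3], [True, True, False, False]))  # (0.8, 1.0)
```

Some variants share few characters or words. Scoring those would take a semantic-level term like the one the abstract mentions, which this sketch omits. An SVM-based variant would presumably take the per-level similarities as a feature vector and learn the match/non-match boundary, instead of thresholding a fixed linear combination.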

Key words

string similarity / name disambiguation / name matching / machine learning

Classification

Information Technology and Security Science

Cite this article

Gong Jun. Matching Chinese entity names with multiple features[J]. Computer Engineering and Applications, 2012, 48(27): 136-141.

Computer Engineering and Applications

OA / CSCD / CSTPCD

ISSN 1002-8331
