Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection

Zhou Xing (周星), Diao Xingchun (刁兴春), Cao Jianjun (曹建军), Li Xin (李鑫), Wang Fangxiao (王芳潇)

数据采集与处理 (Journal of Data Acquisition and Processing), 2017, Vol. 32, Issue 5: 931-938 (8 pages). DOI: 10.16337/j.1004-9037.2017.05.010

Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection

Zhou Xing (1), Diao Xingchun (2), Cao Jianjun (2), Li Xin (2), Wang Fangxiao (2)

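Author information

  • 1. College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007, China
  • 2. Nanjing Telecommunication Technology Research Institute, Nanjing 210007, China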

Abstract

Classifiers are often used in entity resolution to classify record pairs as matches, non-matches, or possible matches based on their field similarity vectors, so the performance of the classifier directly affects the performance of entity resolution. To improve classification accuracy, a multiple classifier system is constructed. We exploit the characteristics of entity resolution to distinguish ambiguous instances before classification, vary the resampling ratio to generate a group of resampled instance sets, and train classifiers on them to build a parallel multiple classifier system. Moreover, we emphasize diversity and sparsity among classifiers when selecting the best classifier subset, and solve the ensemble selection problem using nonlinear programming and an extreme-value method, respectively. Experiments show that, owing to resampling and ensemble selection, the proposed multiple classifier system is more accurate than state-of-the-art ones.
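The resampling idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the resampling scheme, the base classifier, or the combination rule, so this example assumes undersampling of non-matches at several ratios, a nearest-centroid classifier as a stand-in base learner, and a simple majority vote. The handling of ambiguous instances and the ensemble selection step are omitted. All function names (`make_toy_pairs`, `resample`, `build_ensemble`, `vote`) are hypothetical.

```python
import random
from statistics import mean


def make_toy_pairs(n_match, n_non, seed=0):
    """Synthetic similarity vectors: matches score high, non-matches low (toy data)."""
    rng = random.Random(seed)
    X = [[rng.uniform(0.7, 1.0) for _ in range(3)] for _ in range(n_match)]
    X += [[rng.uniform(0.0, 0.4) for _ in range(3)] for _ in range(n_non)]
    return X, [1] * n_match + [0] * n_non


def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*rows)]


def train_centroid_classifier(X, y):
    """Stand-in base learner (assumption): nearest-centroid on similarity vectors."""
    c_match = centroid([x for x, label in zip(X, y) if label == 1])
    c_non = centroid([x for x, label in zip(X, y) if label == 0])

    def predict(x):
        d_match = sum((a - b) ** 2 for a, b in zip(x, c_match))
        d_non = sum((a - b) ** 2 for a, b in zip(x, c_non))
        return 1 if d_match <= d_non else 0

    return predict


def resample(X, y, ratio, rng):
    """Keep every match; undersample non-matches to about ratio * (number of matches)."""
    matches = [x for x, label in zip(X, y) if label == 1]
    non_matches = [x for x, label in zip(X, y) if label == 0]
    k = max(1, min(len(non_matches), round(ratio * len(matches))))
    sampled = rng.sample(non_matches, k)
    return matches + sampled, [1] * len(matches) + [0] * k


def build_ensemble(X, y, ratios, seed=0):
    """Train one classifier per resampling ratio (a parallel ensemble)."""
    rng = random.Random(seed)
    return [train_centroid_classifier(*resample(X, y, r, rng)) for r in ratios]


def vote(ensemble, x):
    """Majority vote over the ensemble; ties go to 'match' (assumption)."""
    votes = sum(clf(x) for clf in ensemble)
    return 1 if 2 * votes >= len(ensemble) else 0


if __name__ == "__main__":
    X, y = make_toy_pairs(n_match=10, n_non=200)
    ensemble = build_ensemble(X, y, ratios=[1, 2, 4])
    print(vote(ensemble, [0.9, 0.95, 0.85]), vote(ensemble, [0.1, 0.2, 0.05]))
```

Varying the ratio gives each classifier a different view of the class imbalance that is typical of record-pair data, which is one plausible source of the diversity the abstract mentions. A selection step would then keep only a subset of the trained classifiers.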

Key words

entity resolution/multiple classifier system/resampling/ensemble selection/diversity

Classification

Information technology and security science

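Cite this article

Zhou Xing, Diao Xingchun, Cao Jianjun, Li Xin, Wang Fangxiao. Multiple classifier system for entity resolution using resampling and ensemble selection [J]. 数据采集与处理 (Journal of Data Acquisition and Processing), 2017, 32(5): 931-938.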

Funding

National Natural Science Foundation of China (61371196)

China Postdoctoral Science Foundation, special funded project (201003797)

Pre-research Foundation of PLA University of Science and Technology (20110604, 41150301)

数据采集与处理 (Journal of Data Acquisition and Processing)

OA / 北大核心 (Peking University Core Journals) / CSCD / CSTPCD

ISSN 1004-9037
