计算机应用研究 (Application Research of Computers), 2018, Vol. 35, Issue 3: 689-693, 5. DOI: 10.3969/j.issn.1001-3695.2018.03.011
基于集成分类的高维数据实体分辨
High-dimensional data entity resolution based on ensemble classifying
Abstract
To make effective use of rich information and improve entity resolution performance on high-dimensional data, this paper proposes a random combinational ensemble classifier model. It defines performance indicators for the base classifiers, takes the classification success rate and the number of features as the two objectives for optimizing a base classifier, and uses an aggregation function to turn them into a single-objective optimization problem. Ant colony optimization is applied to design the base classifiers, with the maximal information coefficient, which measures the correlation between features, serving as heuristic information. The ensemble is composed of the base classifiers that have the best diversity as evaluated by Tanimoto distance, and its output is decided by voting. The method is evaluated on benchmark datasets, and the results show its effectiveness.
Keywords
entity resolution / high-dimensional data / ensemble classifiers / ant colony optimization / maximal information coefficient
Classification
Information technology and security science
Cite this article
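刘艺, 刁兴春, 曹建军, 尚玉玲. High-dimensional data entity resolution based on ensemble classifying [J]. 计算机应用研究 (Application Research of Computers), 2018, 35(3): 689-693, 5.
Funding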
National Natural Science Foundation of China (61371196)
China Postdoctoral Science Foundation, Special Funded Project (201003797)
PLA University of Science and Technology Pre-research Foundation (20110604, 41150301)
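
The abstract names three building blocks without giving their formulas: an aggregation function that merges classification success rate and feature count into one objective, Tanimoto distance as a diversity measure, and voting to produce the ensemble output. The Python sketch below illustrates one plausible reading of each. Everything in it is an assumption for illustration rather than the paper's definition: the weighted-sum form and its `weight` parameter, the choice to measure Tanimoto distance between sets of selected feature indices, and the function names themselves.

```python
from collections import Counter


def weighted_sum_objective(success_rate, n_selected, n_total, weight=0.9):
    """Fold two objectives into one score (assumed weighted-sum form).

    Higher success rate and fewer selected features both raise the score.
    The paper's actual aggregation function is not given in the abstract.
    """
    feature_saving = 1.0 - n_selected / n_total
    return weight * success_rate + (1.0 - weight) * feature_saving


def tanimoto_distance(a, b):
    """Tanimoto distance between two sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    Ties go to the label seen first, an arbitrary choice for this sketch.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical feature subsets chosen by two base classifiers.
subset_a = {0, 1, 2}
subset_b = {1, 2, 3}
diversity = tanimoto_distance(subset_a, subset_b)

# Hypothetical votes from three base classifiers on one record pair.
decision = majority_vote(["match", "non-match", "match"])
```

Under this reading, base classifiers whose feature subsets are far apart in Tanimoto distance are preferred when assembling the ensemble, and each record pair is labeled by the majority of their votes.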