Journal of Chongqing University of Technology (Natural Science), 2016, Vol. 30, Issue (4): 91-96, 6. DOI: 10.3969/j.issn.1674-8425(z).2016.04.016
Research on Eliminating Duplicate Records Based on SNM Improved Algorithm
Abstract
High-quality data is the most important factor in building a data warehouse, and low-quality data can lead to poor decision making. Approximately duplicate records originating from different data sources are one of the main data quality issues in building a data warehouse. Eliminating approximately duplicate records as far as possible before the source data enters the data warehouse can greatly improve data quality. The existing algorithms for eliminating approximately duplicate records were first compared, and the SNM algorithm was then improved. The traditional SNM method and the improved SNM algorithm were compared experimentally, and the results show that the improved SNM algorithm has clear advantages in eliminating duplicate records.
Key words
SNM algorithm/SNM improved algorithm/approximately duplicate records elimination
Classification
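Information Technology and Security Science
Cite this article
Yu Xiaosheng, Hu Sunzhi. Research on Eliminating Duplicate Records Based on SNM Improved Algorithm [J]. Journal of Chongqing University of Technology (Natural Science), 2016, 30(4): 91-96, 6.
Funding
Project supported by the National Natural Science Foundation of China ()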