
Improved SNM algorithm based on length filtering and dynamic fault-tolerance (基于长度过滤和动态容错的SNM改进算法)

Liu Yasi (刘雅思), Cheng Li (程力), Li Xiao (李晓)

Application Research of Computers (计算机应用研究), 2017, Vol. 34, Issue 1: 147-150, 155, 5. DOI: 10.3969/j.issn.1001-3695.2017.01.031

Improved SNM algorithm based on length filtering and dynamic fault-tolerance

Liu Yasi (1), Cheng Li (2), Li Xiao (3)

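Author information

  • 1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011
  • 2. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011
  • 3. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049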

Abstract

In data warehouse systems, cleaning similar and duplicated records can effectively improve data quality. The traditional SNM (sorted-neighborhood method) suffers from poor time efficiency and accuracy. To improve its performance, this paper proposes an enhanced SNM algorithm based on length filtering and dynamic fault-tolerance (LF-SNM). First, it improves detection efficiency by excluding record pairs that cannot be duplicates, judged by the length proportion and attribute absence of the two records. Then, it calibrates the field similarity results using a dynamic fault-tolerance method, which preserves accuracy even when some attributes are absent. Experimental results indicate that LF-SNM performs clearly better than the traditional SNM method on real datasets under the same experimental conditions.
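
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this page gives no formulas, so the schema (`FIELDS`), the field weights, the thresholds (`min_ratio`, `max_missing_gap`, `threshold`), the use of `difflib.SequenceMatcher` as the field similarity, and the idea of renormalising weights over the fields present in both records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical schema and field weights (not taken from the paper).
FIELDS = ("name", "city", "phone")
WEIGHTS = {"name": 0.5, "city": 0.2, "phone": 0.3}


def total_length(rec):
    """Total character count over all fields; missing fields count as 0."""
    return sum(len(rec.get(f) or "") for f in FIELDS)


def passes_length_filter(a, b, min_ratio=0.6, max_missing_gap=1):
    """Cheap pre-check: reject pairs whose lengths or numbers of
    missing fields differ too much to be plausible duplicates."""
    la, lb = total_length(a), total_length(b)
    if max(la, lb) == 0:
        return False
    missing_a = sum(1 for f in FIELDS if not a.get(f))
    missing_b = sum(1 for f in FIELDS if not b.get(f))
    if abs(missing_a - missing_b) > max_missing_gap:
        return False
    return min(la, lb) / max(la, lb) >= min_ratio


def record_similarity(a, b):
    """Weighted field similarity, renormalised over the fields that are
    present in both records (an assumed reading of 'dynamic fault-tolerance')."""
    score = weight = 0.0
    for f in FIELDS:
        x, y = a.get(f), b.get(f)
        if not x or not y:
            continue  # skip absent attributes instead of scoring them as 0
        score += WEIGHTS[f] * SequenceMatcher(None, x, y).ratio()
        weight += WEIGHTS[f]
    return score / weight if weight else 0.0


def lf_snm(records, key, window=3, threshold=0.85):
    """Sorted-neighborhood pass: sort by key, then compare each record
    with the next window-1 records, applying the filter first."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            a, b = records[i], records[j]
            if passes_length_filter(a, b) and record_similarity(a, b) >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return pairs


records = [
    {"name": "John Smith", "city": "Boston", "phone": "555-1234"},
    {"name": "Jon Smith", "city": "Boston", "phone": ""},
    {"name": "Alice Wong", "city": "Seattle", "phone": "555-9876"},
]
duplicates = lf_snm(records, key=lambda r: r["name"])
```

In this sketch the second record lacks a phone number, yet it is still matched with the first because the missing field is left out of the weighted average rather than scored as a mismatch. The filter runs before the more expensive string comparison, which is where the efficiency gain described in the abstract would come from.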

Key words

data cleaning / similar and duplicated records / SNM algorithm / dynamic fault-tolerance / field matching (string match)

Classification

Information technology and security science

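Cite this article

Liu Yasi, Cheng Li, Li Xiao. Improved SNM algorithm based on length filtering and dynamic fault-tolerance [J]. Application Research of Computers, 2017, 34(1): 147-150, 155, 5.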

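Funding

Xinjiang Uygur Autonomous Region Young Scientific and Technological Innovation Talent Training Project (2014721033); Urumqi High-tech Zone Development Support Fund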

Application Research of Computers (计算机应用研究)

Open access (OA); indexed in: PKU Core Journals (北大核心), CSCD, CSTPCD

ISSN 1001-3695
