
Verification method for string similarity joins based on bi-directional filtering

Huang Ying (黄樱), Song Chunhua (宋春花), Niu Baoning (牛保宁)

Computer Engineering and Applications (计算机工程与应用), 2017, Vol. 53, Issue (9): 72-79, 8. DOI: 10.3778/j.issn.1002-8331.1512-0309

Huang Ying¹, Song Chunhua¹, Niu Baoning¹

Author information

  • 1. College of Computer Science, Taiyuan University of Technology, Taiyuan 030024, China

Abstract

A string similarity join finds similar string pairs between two sets of strings, and it plays an important role in many real-world applications. Various algorithms have been proposed to address its efficiency. Partition-based filter-verification methods such as Pass-Join are promising: they quickly identify possibly similar string pairs (the candidate set) by searching for the partitioned segments of one string in another string, processing strings in order of increasing length, and then verify similarity using edit distance. Motivated by the observation that filtering in descending order of string length is more effective than filtering in ascending order, a novel bi-directional filtering-verification mechanism is proposed. At the filtering stage, it pipelines the results of length-descending filtering into length-ascending filtering to further reduce the size of the candidate set. At the verification stage, it uses the two pairs of matched substrings found by the bi-directional filtering to partition each target string pair into several short substring pairs, which accelerates verification. Experimental results show that the proposed bi-directional filtering-verification algorithm outperforms the original algorithm on real-world datasets.
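The partition-based filter-then-verify idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is neither the authors' bi-directional algorithm nor the Pass-Join implementation: it only shows the underlying pigeonhole argument (if two strings are within edit distance `tau`, then at least one of the `tau + 1` segments of one string occurs as a substring of the other), followed by edit-distance verification. The function names `edit_distance`, `even_partition`, `is_candidate`, and `similarity_join` are hypothetical, and the nested loop stands in for the inverted index and length ordering that a real implementation would use.

```python
from __future__ import annotations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def even_partition(s: str, tau: int) -> list[str]:
    """Split s into tau + 1 contiguous segments of near-equal length."""
    k = tau + 1
    base, extra = divmod(len(s), k)
    segments, start = [], 0
    for i in range(k):
        # The last `extra` segments receive one additional character.
        length = base + (1 if i >= k - extra else 0)
        segments.append(s[start:start + length])
        start += length
    return segments


def is_candidate(r: str, s: str, tau: int) -> bool:
    """Filter step: length check, then look for any segment of s inside r."""
    if abs(len(r) - len(s)) > tau:
        return False
    return any(seg in r for seg in even_partition(s, tau))


def similarity_join(left: list[str], right: list[str], tau: int) -> list[tuple[str, str]]:
    """Return all pairs (r, s) with edit distance at most tau (illustrative, quadratic)."""
    results = []
    for r in left:
        for s in right:
            # Verification runs only on pairs that survive the filter.
            if is_candidate(r, s, tau) and edit_distance(r, s) <= tau:
                results.append((r, s))
    return results


if __name__ == "__main__":
    print(similarity_join(["kitten"], ["sitting", "mitten", "banana"], tau=1))
```

With `tau=1`, only the pair `("kitten", "mitten")` passes both the filter and the verification; the other two strings are rejected by the filter before any edit distance is computed.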

Key words

string similarity joins/bi-directional filtering-verification mechanism/filter-verification framework

Classification

Information Technology and Security Science

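Cite this article

Huang Ying, Song Chunhua, Niu Baoning. Verification method for string similarity joins based on bi-directional filtering [J]. Computer Engineering and Applications, 2017, 53(9): 72-79, 8.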

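Funding

National Science and Technology Support Program of China, project topic (No. 2012BAH04F02)

Science and Technology Activities Program for Returned Overseas Scholars, Ministry of Human Resources and Social Security (No. 2011-508)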

Computer Engineering and Applications

OA/Peking University Core/CSCD/CSTPCD

ISSN 1002-8331
