| 注册
首页|期刊导航|电子学报|M2LSH:基于LSH的高维数据近似最近邻查找算法

M2LSH:基于LSH的高维数据近似最近邻查找算法

李灿 钱江波 董一鸿 陈华辉

电子学报2017,Vol.45Issue(6):1431-1442,12.
电子学报2017,Vol.45Issue(6):1431-1442,12.DOI:10.3969/j.issn.0372-2112.2017.06.022

M2LSH:基于LSH的高维数据近似最近邻查找算法

M2LSH:An LSH Based Technique for Approximate Nearest Neighbor Searching on High Dimensional Data

李灿 1钱江波 1董一鸿 1陈华辉1

作者信息

  • 1. 宁波大学信息科学与工程学院,浙江宁波 315211
  • 折叠

摘要

Abstract

The LSH (Locality Sensitive Hashing) method and its variants represent the state-of-the-art indexing techniques to process ANN (Approximate Nearest Neighbor) searches in a high dimensional data space.Although the existing LSH based techniques can efficiently deal with uniform distributed data,it is noticed from the principle of their design schemes that these techniques cannot handle non-uniform distributed data well.To tackle the above problem,this paper presents a new LSH based technique,called M2LSH (2 Layers Merging LSH),to efficiently process ANN searches on non-uniform distributed data.The key idea is as follows.First,data objects are stored in a hash table with multiple buckets (i.e.,the first layer),each of which corresponds to a combined hash vector used as its hash number.The count of data objects in each hash bucket is recorded.Secondly,using the hash functions for a second layer hash table,the bucket numbers from the first layer are projected into a one-dimensional space.In this space,some adjacent hash buckets from the first layer are merged so as to make the data objects uniformly distributed in the merged buckets at the second layer.Therefore,the M2LSH can not only improve the searching efficiency but also increase the accuracy of the search results.This paper also provides a detailed theoretical accuracy analysis for the M2LSH technique.Experiments using synthetic and real data show that the theoretical estimates are quite accurate,and the proposed M2LSH technique can efficiently process ANN searches with low false positive and negative rates.

关键词

近似最近邻/KNN查询/局部敏感哈希/高维数据

Key words

approximate nearest neighbor (ANN)/k-nearest neighbor (KNN) search/locality sensitive hashing (LSH)/high dimensionality

分类

信息技术与安全科学

引用本文复制引用

李灿,钱江波,董一鸿,陈华辉..M2LSH:基于LSH的高维数据近似最近邻查找算法[J].电子学报,2017,45(6):1431-1442,12.

基金项目

国家自然科学基金(No.61472194,No.61572266) (No.61472194,No.61572266)

浙江省自然科学基金(No.LY13F020040,No.LY16F020003) (No.LY13F020040,No.LY16F020003)

宁波市自然科学基金(No.2014A610023,No.2015A610119) (No.2014A610023,No.2015A610119)

电子学报

OA北大核心CSCDCSTPCD

0372-2112

访问量0
|
下载量0
段落导航相关论文