Computer Engineering & Science, 2024, Vol. 46, Issue (7): 1193-1201, 9. DOI: 10.3969/j.issn.1007-130X.2024.07.007
Learning indexing method for massive high-dimensional data based on partitioned hierarchical graph
(Original Chinese title: 基于分区层次图的海量高维数据学习索引构建方法)
Hua Yuelin (华悦琳) 1, Zhou Xiaolei (周晓磊) 2, Fan Qiang (范强) 2, Wang Fangxiao (王芳潇) 2, Yan Hao (严浩) 2
Author information
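- 1. School of Computer Science and School of Cyberspace Security, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu; The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, Jiangsu; Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, Hunan
- 2. The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, Jiangsu; Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, Hunan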
Abstract
Learning to index is the key to solving the problem of approximate nearest neighbor search in massive high-dimensional data. However, existing learning-to-index techniques are limited to individual partitions and rely on the construction of a neighborhood graph. As the dimensionality and scale of the data grow, such indexes struggle to judge data near partition boundaries accurately, which increases construction time complexity and limits scalability. To address these problems, a learning-to-index method based on partitioned hierarchical graphs, PBO-HNSW, is proposed. The method redistributes partition boundary data and constructs distributed graph index structures in parallel, which effectively addresses the challenges faced by approximate nearest neighbor search. Experimental results show that PBO-HNSW achieves millisecond-level index construction on massive high-dimensional datasets with millions of points. At a recall of 0.93, the construction time of PBO-HNSW is only 36.4% of that of the baseline methods.
Keywords
approximate nearest neighbor search / learning to index / hierarchical navigable small world (HNSW) / partition learning / index structure
Classification
Information technology and security science
Cite this article
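Hua Yuelin, Zhou Xiaolei, Fan Qiang, Wang Fangxiao, Yan Hao. Learning indexing method for massive high-dimensional data based on partitioned hierarchical graph [J]. Computer Engineering & Science, 2024, 46(7): 1193-1201, 9.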