Computer & Digital Engineering (计算机与数字工程), 2025, Vol. 53, Issue 3: 718-724, 7. DOI: 10.3969/j.issn.1672-9722.2025.03.019
HSIT:一种针对海量数据的分布式相似性查询索引
HSIT: Distributed Similarity Query Index for Massive Data
Abstract
Similarity queries are widely used in fields such as information retrieval, biology, and network security to analyze associations between data. Traditional similarity query methods typically compute the similarity between the query point and every record in the database, so the amount of computation grows linearly with the amount of data. To improve the efficiency of distributed similarity queries over massive data, an HBase similarity index tree (HSIT) is proposed. The algorithm builds the similarity index tree dynamically as data is stored. The HSIT index partitions data according to a similarity threshold and stores similar data in adjacent regions of HBase. When a user issues a similarity query, the query node can quickly locate the similar regions through HSIT. The index prunes efficiently, so that pairwise similarity computation is needed only within the similar regions. Similarity queries were run on datasets growing exponentially from 20,000 to 1.28 million records. The experimental results show that the HSIT algorithm is more efficient than the DSCS-LTS algorithm.
Keywords
massive data / distributed / similarity query / index / HBase
Classification
Astronomy and Earth Sciences
Cite this article
姚回, 刘文. HSIT: Distributed Similarity Query Index for Massive Data [J]. Computer & Digital Engineering, 2025, 53(3): 718-724, 7.
Funding
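Supported by:
National Natural Science Foundation of China (Grant No. 61962058)
Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2019D01A30)
Joint Laboratory of Data Engineering and Digital Mine project (Grant No. 2019QX0035)
Xinjiang Uygur Autonomous Region University Scientific Research Program, Natural Science Youth Project (Grant No. XJEDU2018Y056)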