HSIT: A Distributed Similarity Query Index for Massive Data

Computer & Digital Engineering (计算机与数字工程), 2025, Vol. 53, Issue 3: 718-724 (7 pages). DOI: 10.3969/j.issn.1672-9722.2025.03.019

HSIT: Distributed Similarity Query Index for Massive Data

Yao Hui (姚回) ¹, Liu Wen (刘文) ²

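Author information

  • 1. School of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054
  • 2. School of Control Engineering, Xinjiang Institute of Engineering, Urumqi 830023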

Abstract

Similarity queries are widely used in fields such as information retrieval, biology, and network security to analyze associations between data. Traditional similarity query methods typically compare the query point against every record in the database, so the amount of computation grows linearly with the amount of data. To improve the efficiency of distributed similarity queries over massive data, an HBase similarity index tree (HSIT) is proposed. The algorithm builds the similarity index tree dynamically as data is stored. The HSIT index partitions data according to a similarity threshold and stores similar data in adjacent regions of HBase. When a user issues a similarity query, the query node can quickly locate the similar regions through HSIT. The index prunes efficiently, so pairwise similarity calculations are needed only within the similar regions. Similarity queries were run on datasets growing exponentially from 20,000 to 1.28 million records. The experimental results show that HSIT is more efficient than the DSCS-LTS algorithm.
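
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes (threshold-based partitioning, adjacent storage, and pruning), not the authors' HSIT algorithm. It assumes a metric distance (Euclidean here), uses a single flat level of regions instead of a tree, and lets an in-memory dict stand in for HBase. The names `ThresholdIndex`, `Region`, and the row-key layout are hypothetical. The region-id prefix in the row key is meant to mimic how HBase keeps rows sorted by key, so rows of one region would sit next to each other.

```python
import math
from dataclasses import dataclass, field


def distance(a, b):
    """Euclidean distance; any metric obeying the triangle inequality works."""
    return math.dist(a, b)


@dataclass
class Region:
    pivot: tuple                               # first vector assigned to the region
    rows: dict = field(default_factory=dict)   # row key -> vector


class ThresholdIndex:
    """Hypothetical one-level stand-in for a similarity index tree."""

    def __init__(self, threshold):
        self.threshold = threshold   # max distance from a region's pivot
        self.regions = []

    def insert(self, vector):
        # Reuse the first region whose pivot is within the threshold,
        # otherwise open a new region with this vector as its pivot.
        for region_id, region in enumerate(self.regions):
            if distance(vector, region.pivot) <= self.threshold:
                break
        else:
            region_id = len(self.regions)
            region = Region(pivot=vector)
            self.regions.append(region)
        # Region-id prefix keeps similar rows adjacent in key order.
        row_key = f"{region_id:04d}-{len(region.rows):08d}"
        region.rows[row_key] = vector
        return row_key

    def query(self, point, radius):
        hits, compared = [], 0
        for region in self.regions:
            # Every member lies within `threshold` of the pivot, so by the
            # triangle inequality no member can be within `radius` of `point`.
            if distance(point, region.pivot) > self.threshold + radius:
                continue
            for row_key, vector in region.rows.items():
                compared += 1
                if distance(point, vector) <= radius:
                    hits.append(row_key)
        return hits, compared


index = ThresholdIndex(threshold=1.0)
for v in [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.2, 10.0)]:
    index.insert(v)

hits, compared = index.query((0.1, 0.0), radius=0.5)
print(hits, compared)  # two hits from region 0; region 1 is pruned unscanned
```

In this toy run only two of the four stored vectors are compared with the query point, because the far region is skipped after a single pivot comparison. A real index tree would apply the same test at each level rather than over a flat list of regions.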

Key words

massive data / distributed / similarity query / index / HBase

Classification

Astronomy and Earth Sciences

Cite this article

Yao Hui, Liu Wen. HSIT: A Distributed Similarity Query Index for Massive Data [J]. Computer & Digital Engineering, 2025, 53(3): 718-724, 7.

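Funding

National Natural Science Foundation of China (Grant No. 61962058)

Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2019D01A30)

Joint Laboratory of Data Engineering and Digital Mine Project (Grant No. 2019QX0035)

Xinjiang Uygur Autonomous Region University Scientific Research Program, Natural Science Youth Project (Grant No. XJEDU2018Y056)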

Computer & Digital Engineering (计算机与数字工程)

ISSN 1672-9722
