A Large-Scale Vector Space Similarity Retrieval Framework Based on Prefix Pruning

Liu Jianbo (刘健博), Deng Lingfeng (邓凌风), Li Wenhai (李文海), Tian Ye (田野)

Software Guide (软件导刊), 2024, Vol. 23, Issue 6: 92-97 (6 pages). DOI: 10.11907/rjdk.241015

Liu Jianbo (刘健博)¹, Deng Lingfeng (邓凌风)², Li Wenhai (李文海)², Tian Ye (田野)³

Author information

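  • 1. Wuhan Shubo Technology Co., Ltd. (武汉数博科技有限责任公司), Wuhan, Hubei 430205
  • 2. School of Computer Science, Wuhan University, Wuhan, Hubei 430072
  • 3. School of Software Engineering, Hubei Open University, Wuhan, Hubei 430074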

Abstract

To address weight-based similarity queries over large-scale text collections, an efficient retrieval framework supporting prefix pruning is proposed. First, we define similarity and its weighted prefix under the vector space model and theoretically prove the correctness of weighted prefix pruning. Then, for large-scale text queries, we propose a new inverted index structure that uses the index leaf nodes to maintain the prefix weights of the records, and we construct efficient similarity retrieval algorithms on top of the index. Finally, we verify that the method effectively supports large-scale weighted similarity retrieval; the results show that its query efficiency is more than 5 times higher than that of Lucene's subsumption verification strategy.
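The general idea of prefix pruning over an inverted index can be illustrated with a minimal sketch. This is not the index structure proposed in the paper (which maintains prefix weights in index leaf nodes); it is a simplified one-sided filter, and every name in it (`normalize`, `build_index`, `query_prefix`, `search`) is hypothetical. It assumes unit-length sparse vectors with cosine similarity: if the query terms outside the prefix have Euclidean norm below the threshold, then by the Cauchy-Schwarz inequality any record sharing no term with the prefix cannot reach the threshold, so only records in the prefix terms' posting lists need to be verified.

```python
import math
from collections import defaultdict

def normalize(vec):
    """Scale a sparse {term: weight} vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()}

def build_index(docs):
    """Plain inverted index (an assumption, not the paper's structure): term -> doc ids."""
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        for term in vec:
            index[term].append(doc_id)
    return index

def query_prefix(q, threshold):
    """Return leading query terms such that the remaining suffix has norm < threshold."""
    terms = sorted(q, key=q.get, reverse=True)  # heaviest terms first
    suffix_sq = 0.0
    cut = len(terms)
    # Grow the suffix from the lightest term while its squared norm stays below threshold^2.
    while cut > 0 and suffix_sq + q[terms[cut - 1]] ** 2 < threshold ** 2:
        suffix_sq += q[terms[cut - 1]] ** 2
        cut -= 1
    return terms[:cut]

def search(index, docs, q, threshold):
    """Cosine search: generate candidates from prefix postings, then verify each one."""
    q = normalize(q)
    candidates = set()
    for term in query_prefix(q, threshold):
        candidates.update(index.get(term, []))
    results = {}
    for doc_id in candidates:
        sim = sum(w * docs[doc_id].get(t, 0.0) for t, w in q.items())
        if sim >= threshold:
            results[doc_id] = sim
    return results

# Toy collection; documents are stored already normalized.
docs = {
    "d1": normalize({"vector": 2.0, "index": 1.0}),
    "d2": normalize({"vector": 1.0, "prune": 2.0}),
    "d3": normalize({"lucene": 3.0}),
}
index = build_index(docs)
query = {"vector": 2.0, "index": 1.0, "lucene": 0.1}
hits = search(index, docs, query, threshold=0.8)
```

In this toy run the posting list for `lucene` is never scanned, because that term falls in the query suffix; `d3` is therefore pruned without computing its similarity. Sorting the query terms by descending weight is only a heuristic for keeping the prefix short; the pruning bound holds for any term order.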

Keywords

prefix-based pruning; TF/IDF; vector space model; inverted index; information retrieval; database

Classification

Information Technology and Security Science

Cite this article

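Liu Jianbo, Deng Lingfeng, Li Wenhai, Tian Ye. A Large-Scale Vector Space Similarity Retrieval Framework Based on Prefix Pruning[J]. Software Guide, 2024, 23(6): 92-97.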

Funding

Wuhan Key Research and Development Program Project (2023010402040006)

Software Guide (软件导刊)

ISSN 1672-7800
