A Partition Inverted Index Compression Algorithm Based on CRF (基于CRF的分区倒排索引压缩算法)

Wang Zichen (王子琛), Qu Youli (瞿有利)

Computer and Modernization (计算机与现代化), 2024, Issue 2: 36-42, 55 (8 pages). DOI: 10.3969/j.issn.1006-2475.2024.02.006

A Partition Inverted Index Compression Algorithm Based on CRF

Wang Zichen (1), Qu Youli (1)

Author information

  • 1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Abstract

The inverted index is the core data structure of large search engines; it is essentially a collection of integer sequences stored in an inverted table. Compressing the inverted index effectively reduces the space it occupies and improves keyword retrieval efficiency. The partition inverted index compression algorithm based on conditional random fields (CRF) focuses on the universe partition scheme. The algorithm pre-partitions each sequence and uses a conditional random field to label and reorganize the pre-partitions, which effectively reduces compression time. It then applies an encoding method chosen according to each partition's type, further reducing the space occupied after compression. Compared with other inverted index compression algorithms, this algorithm outperforms some current universe partition algorithms in compression ratio and matches other universe partition algorithms in decompression time, achieving a good balance between time and space.
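
To make the idea of universe partitioning with per-partition encoding concrete, here is a minimal sketch. It is not the authors' algorithm: the CRF labeling step is replaced by a plain density threshold, and the chunk size, the threshold, and all function names are assumptions chosen for illustration. Input is assumed to be a sorted list of distinct non-negative document IDs.

```python
from typing import Dict, List, Tuple

CHUNK_BITS = 8                      # assumed universe size per partition: 2**8 = 256 values
CHUNK_SIZE = 1 << CHUNK_BITS
DENSE_THRESHOLD = CHUNK_SIZE // 8   # assumed cutoff: a 32-byte bitmap wins above 32 entries


def universe_partition(postings: List[int]) -> Dict[int, List[int]]:
    """Group sorted doc IDs by their high bits; keep only the low bits per chunk."""
    chunks: Dict[int, List[int]] = {}
    for doc_id in postings:
        chunks.setdefault(doc_id >> CHUNK_BITS, []).append(doc_id & (CHUNK_SIZE - 1))
    return chunks


def encode_chunk(lows: List[int]) -> Tuple[str, bytes]:
    """Pick an encoding by chunk type (stand-in for the CRF label in the paper)."""
    if len(lows) > DENSE_THRESHOLD:
        bitmap = bytearray(CHUNK_SIZE // 8)
        for v in lows:
            bitmap[v >> 3] |= 1 << (v & 7)
        return "dense", bytes(bitmap)
    # Sparse chunk: each low value fits in one byte because CHUNK_BITS == 8.
    return "sparse", bytes(lows)


def decode_chunk(kind: str, payload: bytes) -> List[int]:
    """Recover the low bits of a chunk from either encoding."""
    if kind == "dense":
        return [i for i in range(CHUNK_SIZE) if (payload[i >> 3] >> (i & 7)) & 1]
    return list(payload)


def compress(postings: List[int]) -> Dict[int, Tuple[str, bytes]]:
    """Partition the posting list, then encode each chunk independently."""
    return {high: encode_chunk(lows) for high, lows in universe_partition(postings).items()}


def decompress(index: Dict[int, Tuple[str, bytes]]) -> List[int]:
    """Rebuild the sorted posting list by re-attaching each chunk's high bits."""
    out: List[int] = []
    for high in sorted(index):
        kind, payload = index[high]
        out.extend((high << CHUNK_BITS) | low for low in decode_chunk(kind, payload))
    return out


postings = list(range(100)) + [300, 1000, 70000]
index = compress(postings)
restored = decompress(index)
```

In this sketch a chunk holding more than 32 IDs is stored as a fixed 32-byte bitmap, while a sparser chunk stores one byte per ID. A learned labeler such as a CRF would take the place of the `DENSE_THRESHOLD` rule when deciding each partition's type.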

Key words

inverted index / data compression / universe partition / conditional random field / search engine

Classification

Information Technology and Security Science

Cite this article

Wang Zichen, Qu Youli. A Partition Inverted Index Compression Algorithm Based on CRF [J]. Computer and Modernization, 2024, (2): 36-42, 55.

Funding

National Natural Science Foundation of China (61976015)

Journal: Computer and Modernization (计算机与现代化), OA, CSTPCD, ISSN 1006-2475
