Inverted Index Compression Algorithms Based on 64-bit Architecture

ZHANG Xudong¹, SUN Zhiming², LIU Yaning¹, SHAN Dongdong¹, YAN Hongfei¹

Computer Engineering, 2014, Issue (2): 71-76, 6. DOI: 10.3969/j.issn.1000-3428.2014.02.016

Author information

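  • 1. Institute of Network and Information Systems, Peking University, Beijing 100871, China
  • 2. Information Center, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China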

Abstract

In 64-bit CPU architectures, the word length grows from 32 bits to 64 bits, so the CPU can process 64 bits of data at a time. To date, few studies have examined how 64-bit systems affect the compression and decompression of inverted indexes, the primary data structure in search engines. Some posting-list compression algorithms work well on 32-bit machines but are inefficient on 64-bit machines. This paper proposes three word-aligned compression algorithms for 64-bit systems: SimpleX64-16, SimpleX64-32, and SimpleX64-64. Each algorithm uses a larger set of modes, and each mode is optimized individually. Experiments on inverted indexes built from GOV2 and ClueWeb09B show that, on 64-bit machines, these algorithms improve the compression ratio by 2.5% and the decompression speed by 14.5% compared with traditional 32-bit word-aligned compression algorithms.
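The abstract does not give the SimpleX64 mode tables, so the sketch below only illustrates the general word-aligned idea under stated assumptions: a posting list is stored as d-gaps (differences between successive document IDs), and each 64-bit word carries a 4-bit selector plus 60 payload bits split into equal-width slots, in the style of the published Simple-8b scheme without its run-length selectors. The mode table `MODES` and the helpers `to_gaps`, `from_gaps`, `encode`, and `decode` are hypothetical names chosen for illustration; this is not the authors' code or their mode design.

```python
from itertools import accumulate
from typing import List, Tuple

# Hypothetical mode table (slots per word, bits per slot), modeled on
# Simple-8b minus its run-length selectors. NOT the SimpleX64 tables.
MODES: List[Tuple[int, int]] = [
    (60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7),
    (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60),
]
SELECTOR_BITS = 4                  # low 4 bits of each word pick a mode
PAYLOAD_BITS = 64 - SELECTOR_BITS  # 60 bits left for packed values


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Sorted document IDs -> d-gaps (the first ID is kept as-is)."""
    return [cur - prev for cur, prev in zip(doc_ids, [0] + doc_ids[:-1])]


def from_gaps(gaps: List[int]) -> List[int]:
    """d-gaps -> document IDs via a running (prefix) sum."""
    return list(accumulate(gaps))


def encode(values: List[int]) -> List[int]:
    """Greedily pack non-negative ints (< 2**60) into 64-bit words."""
    words: List[int] = []
    i = 0
    while i < len(values):
        # Try the densest mode first; a chunk can only be short at the tail.
        for selector, (count, bits) in enumerate(MODES):
            chunk = values[i:i + count]
            if all(v < (1 << bits) for v in chunk):
                break
        else:
            raise ValueError(f"value at index {i} needs more than {PAYLOAD_BITS} bits")
        word = selector
        for slot, v in enumerate(chunk):
            word |= v << (SELECTOR_BITS + slot * bits)
        words.append(word)  # unused tail slots stay zero
        i += len(chunk)
    return words


def decode(words: List[int], n: int) -> List[int]:
    """Unpack the first n values from words produced by encode()."""
    out: List[int] = []
    for word in words:
        count, bits = MODES[word & ((1 << SELECTOR_BITS) - 1)]
        mask = (1 << bits) - 1
        payload = word >> SELECTOR_BITS
        out.extend((payload >> (slot * bits)) & mask for slot in range(count))
    return out[:n]  # drop zero padding from a partially filled last word
```

For example, the document IDs `[3, 7, 8, 15, 100, 101, 5000]` become the gaps `[3, 4, 1, 7, 85, 1, 4899]`, which `encode` packs into two words: the first six gaps share one word at 10 bits each, and the last gap goes into a second word with 15-bit slots. Trying the densest mode first is only one simple selection policy; it is not necessarily how SimpleX64 chooses modes.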

Key words

inverted index/index compression/64-bit architecture/search engine/information retrieval

Classification

Information Technology and Security Science

Cite this article

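ZHANG Xudong, SUN Zhiming, LIU Yaning, SHAN Dongdong, YAN Hongfei. Inverted Index Compression Algorithms Based on 64-bit Architecture[J]. Computer Engineering, 2014, (2): 71-76, 6.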

Funding

National Natural Science Foundation of China (61272340, 61073082); Renren Games Fund (QXWJ-YX-201206017)

Computer Engineering

OA / PKU Core / CSCD / CSTPCD

ISSN 1000-3428