Computer Engineering (计算机工程), Issue (2): 71-76, 6. DOI: 10.3969/j.issn.1000-3428.2014.02.016
Inverted Index Compression Algorithms Based on 64-bit Architecture
Abstract
On a 64-bit CPU architecture, the word length grows from 32 bits to 64 bits, so the CPU can process 64 bits of data at a time. To date, few studies have examined how 64-bit systems affect the compression and decompression of the inverted index, the primary data structure in search engines. Some posting-list compression algorithms work well on 32-bit machines but are inefficient on 64-bit machines. This paper proposes three word-aligned compression algorithms for 64-bit systems: SimpleX64-16, SimpleX64-32 and SimpleX64-64. Each algorithm adopts more modes and optimizes each mode. Experiments on inverted indexes built from GOV2 and ClueWeb09B show that, on 64-bit machines, these algorithms improve the compression ratio by 2.5% and the decompression rate by 14.5% compared with traditional 32-bit word-aligned compression algorithms.
Keywords
inverted index/index compression/64-bit architecture/search engine/information retrieval
Classification
Information Technology and Security Science
Cite this article
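ZHANG Xudong, SUN Zhiming, LIU Yaning, SHAN Dongdong, YAN Hongfei. Inverted Index Compression Algorithms Based on 64-bit Architecture [J]. Computer Engineering, 2014, (2): 71-76, 6.
Funding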
National Natural Science Foundation of China (61272340, 61073082); Renren Games Fund (QXWJ-YX-201206017)
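
Illustrative sketch

The abstract does not spell out the modes of SimpleX64-16, SimpleX64-32 or SimpleX64-64, so the following is only a minimal sketch of the general idea of word-aligned compression in a 64-bit word, not the authors' algorithms. It assumes a hypothetical layout of a 4-bit selector plus a 60-bit payload, similar in spirit to the well-known Simple-8b scheme. The mode table `MODES`, the function names `encode` and `decode`, and the sample gap list are all assumptions made for illustration. The input would typically be the gaps between consecutive document IDs in a posting list.

```python
# Hypothetical mode table (not taken from the paper):
# each entry is (values per word, bits per value), with count * bits <= 60.
MODES = [(60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7),
         (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60)]
PAYLOAD_BITS = 60  # the top 4 bits of each 64-bit word hold the mode selector


def encode(values):
    """Greedily pack integers in [0, 2**60) into a list of 64-bit words."""
    words = []
    i = 0
    while i < len(values):
        # Try the densest mode first; take the first one whose slot width
        # is large enough for every value in the next chunk.
        for selector, (count, bits) in enumerate(MODES):
            chunk = values[i:i + count]
            if all(0 <= v < (1 << bits) for v in chunk):
                break
        else:
            raise ValueError("values must be integers in [0, 2**60)")
        word = selector << PAYLOAD_BITS
        for slot, v in enumerate(chunk):
            word |= v << (slot * bits)
        words.append(word)
        i += len(chunk)  # the final chunk may be short; unused slots stay 0
    return words


def decode(words, n):
    """Unpack the first n values from words produced by encode()."""
    out = []
    for word in words:
        count, bits = MODES[word >> PAYLOAD_BITS]
        mask = (1 << bits) - 1
        for slot in range(count):
            out.append((word >> (slot * bits)) & mask)
    return out[:n]  # drop the zero padding of the last word


# Round trip on a made-up list of docID gaps.
gaps = [3, 1, 1, 7, 2, 1000, 1, 1, 1]
words = encode(gaps)
assert decode(words, len(gaps)) == gaps
```

In this sketch the decoder needs the number of values `n` because the last word may be padded with zeros. A real index would usually store that count (the posting-list length) alongside the compressed words.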