HBase Secondary Index Method Based on Coprocessor

Guo Hong, Zhou Jianqian, Zhang Yingying, Guo Kun

Computer Engineering and Applications, 2019, Vol. 55, Issue 21: 86-91, 6. DOI: 10.3778/j.issn.1002-8331.1807-0289


Guo Hong¹, Zhou Jianqian², Zhang Yingying³, Guo Kun¹

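Author information

  • 1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116
  • 2. Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350116
  • 3. Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education, Fuzhou 350116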

Abstract

In the era of big data, unstructured data is growing much faster than structured data, and HBase is widely used to store it at scale. HBase's built-in index is defined on row keys, so queries by row key are very fast. However, conditional queries on other fields require a full table scan, and HBase's performance then degrades sharply, which prevents it from being applied directly to real-time scenarios. To solve this problem, a coprocessor-based HBase secondary indexing method is proposed. The method uses HBase coprocessors to build indices that map frequently queried fields to row keys. The indices are scanned in parallel to obtain the row keys, which are then used to locate the records quickly. Regions are pre-partitioned when a table is created, and a hash value is added to each row key when data is inserted. This both improves insertion speed and avoids hotspots. The index data and the main data are guaranteed to reside in the same region, which saves one RPC request per query. Experiments on simulated data sets show that the proposed method is competitive: it runs faster than both HBase's filter query and an ElasticSearch-based secondary index, and it also consumes less space than the ElasticSearch-based secondary index.
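
The ideas in the abstract (a hash prefix on row keys, index entries kept in the same region as the rows they point to, and a parallel scan of the per-region indices) can be illustrated with a small in-memory sketch. This is not the authors' implementation and does not use the HBase API: `Region`, `Table`, `salt`, `NUM_REGIONS`, `INDEXED_FIELDS`, and the `NN|row_id` key layout are all hypothetical names and choices made for illustration, and the `put` method only stands in for what a coprocessor hook might do on write.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_REGIONS = 4             # assumed number of pre-partitioned regions
INDEXED_FIELDS = ("city",)  # hypothetical frequently queried field


def salt(row_id: str) -> int:
    """Deterministic hash prefix that selects the region for a row."""
    digest = hashlib.md5(row_id.encode("utf-8")).digest()
    return digest[0] % NUM_REGIONS


class Region:
    """Stands in for one region holding main rows and their local index."""

    def __init__(self):
        self.rows = {}   # salted row key -> record
        self.index = {}  # (field, value) -> set of salted row keys

    def put(self, row_key, record):
        # Drop index entries of the record being overwritten, if any.
        old = self.rows.get(row_key)
        if old is not None:
            for field in INDEXED_FIELDS:
                if field in old:
                    self.index.get((field, old[field]), set()).discard(row_key)
        self.rows[row_key] = record
        # Maintain the index next to the data, as a write hook might.
        for field in INDEXED_FIELDS:
            if field in record:
                self.index.setdefault((field, record[field]), set()).add(row_key)

    def lookup(self, field, value):
        # Index and data share a region, so no second hop is needed.
        keys = sorted(self.index.get((field, value), ()))
        return [(key, self.rows[key]) for key in keys]


class Table:
    def __init__(self):
        self.regions = [Region() for _ in range(NUM_REGIONS)]

    def put(self, row_id, record):
        prefix = salt(row_id)
        row_key = f"{prefix:02d}|{row_id}"  # hash prefix spreads writes
        self.regions[prefix].put(row_key, record)
        return row_key

    def query(self, field, value):
        # Scan every region's local index in parallel, then merge the hits.
        with ThreadPoolExecutor(max_workers=NUM_REGIONS) as pool:
            parts = pool.map(lambda region: region.lookup(field, value),
                             self.regions)
            return [hit for part in parts for hit in part]


if __name__ == "__main__":
    table = Table()
    table.put("u1", {"city": "Fuzhou"})
    table.put("u2", {"city": "Xiamen"})
    table.put("u3", {"city": "Fuzhou"})
    for row_key, record in table.query("city", "Fuzhou"):
        print(row_key, record)
```

Because each index entry lives in the region chosen by the same hash prefix as its row, a lookup resolves the matching row keys and the records in one place, which is the property the abstract credits with saving one RPC request per query.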

Key words

HBase/secondary index/coprocessor/ElasticSearch

Classification

Information Technology and Security Science

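Cite this article

Guo Hong, Zhou Jianqian, Zhang Yingying, Guo Kun. HBase Secondary Index Method Based on Coprocessor[J]. Computer Engineering and Applications, 2019, 55(21): 86-91, 6.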

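Funding

National Natural Science Foundation of China (No. 61300104, No. 61300103, No. 61672158)

Fujian Provincial Outstanding Youth Science Fund for Universities (No. JA12016)

Program for New Century Excellent Talents in Fujian Province Universities (No. JA13021)

Fujian Provincial Science Fund for Distinguished Young Scholars (No. 2014J06017, No. 2015J06014)

Fujian Provincial Science and Technology Innovation Platform Program (No. 2009J1007, No. 2014H2005)

Natural Science Foundation of Fujian Province (No. 2013J01230, No. 2014J01232)

Fujian Provincial University-Industry Cooperation Project (No. 2014H6014, No. 2017H6008)

Haixi Government Affairs Big Data Application Collaborative Innovation Center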

Computer Engineering and Applications

OA / Peking University Core Journals / CSCD / CSTPCD

ISSN 1002-8331
