Computer Engineering and Applications (计算机工程与应用), 2016, Vol. 52, Issue (13): 25-31, 7. DOI: 10.3778/j.issn.1002-8331.1601-0357
Research on storage and query of large-scale multidimensional data (海量多维数据的存储与查询研究)
Abstract
OLAP (Online Analytical Processing) systems built on data warehouses are the most popular tools for analyzing large-scale multidimensional data. With the development of information technology, data volumes grow rapidly and data structures become increasingly complicated, so the performance of OLAP systems has dropped severely and no longer meets daily data analysis needs. This paper proposes new methods for storing large-scale multidimensional data and performing aggregation queries with Hadoop, a parallel computing system. It implements a new column-store format, HCFile (HDFS column file), and proposes a new storage solution based on it. This solution improves the efficiency of aggregation and scales well. In addition, the paper leverages the hierarchy schema to build a dimension hierarchy index and uses MapReduce to perform efficient aggregation queries. Comparison experiments with Hive show that the proposed storage solution and aggregation query method effectively improve the efficiency of large-scale multidimensional data analysis.
Key words
large-scale multidimensional data / Hadoop / data index / aggregation query
Classification
Information technology and security science
Cite this article
Song Aibo, Wan Yutong, Gong Huan, Xue Yingying (宋爱波, 万雨桐, 贡欢, 薛荧荧). Research on storage and query of large-scale multidimensional data [J]. Computer Engineering and Applications, 2016, 52(13): 25-31, 7.
Funding
National Natural Science Foundation of China (No. 61370207, No. 61572128)
Science and Technology Project of State Grid Corporation of China Headquarters.