Parallel Processing Algorithms for Facts Based on Hadoop Platform

Sun Li, He Gang, Li Jiyun

Computer Engineering, Issue (3): 59-62, 81, 5. DOI: 10.3969/j.issn.1000-3428.2014.03.012

Parallel Processing Algorithms for Facts Based on Hadoop Platform

Sun Li ¹, He Gang ¹, Li Jiyun ¹

Author information

1. School of Computer Science and Technology, Donghua University, Shanghai 201620

Abstract

Traditional Extract, Transform, Load (ETL) tools process the massive fact data in a data warehouse inefficiently. To address this, two algorithms for processing facts in parallel are designed and implemented on the Hadoop platform. They approach the problem from two perspectives, surrogate key lookup for the fact table and aggregation of fact data at different granularities: the first is a multi-way parallel lookup algorithm over slowly changing dimensions, and the second is an algorithm for aggregating fact data at different granularities. The first algorithm takes both slowly changing dimensions and big dimensions into account. To avoid running out of memory, it uses the distributed cache to copy small dimensions into the memory of every data node, and it performs the multi-way lookup of dimension keys in the map stage to avoid the network delay caused by data transmission. The second algorithm adds a merge stage after the reduce stage, which allows fact data to be aggregated effectively at different granularities. Experimental results show that both algorithms are more efficient than the Hive data warehouse at processing fact data in parallel.
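The two ideas in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation and does not use Hadoop: the dimension tables, the record layout, and the function names (`lookup_surrogate_key`, `map_fact`, `reduce_by_day`, `merge_to_month`) are all hypothetical. It assumes that each small dimension is held in memory as a list of versions sorted by validity start date (a type 2 slowly changing dimension), and that the merge stage rolls finer-grained totals up to a coarser granularity. Both are assumptions about details the abstract does not give.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical in-memory copies of two small dimensions, standing in for
# tables shipped to each node through the distributed cache.
# natural key -> list of (valid_from, surrogate_key), sorted by valid_from.
PRODUCT_DIM = {
    "P1": [("2013-01-01", 101), ("2013-06-01", 102)],
    "P2": [("2013-01-01", 201)],
}
STORE_DIM = {
    "S1": [("2013-01-01", 11)],
    "S2": [("2013-03-01", 21)],
}


def lookup_surrogate_key(dim, natural_key, fact_date):
    """Return the surrogate key of the version valid on fact_date, or None."""
    versions = dim.get(natural_key, [])
    starts = [valid_from for valid_from, _ in versions]
    # ISO dates compare correctly as strings; pick the last version that
    # started on or before the fact date.
    i = bisect_right(starts, fact_date) - 1
    return versions[i][1] if i >= 0 else None


def map_fact(record):
    """Map-side multi-way lookup: resolve all dimension keys of one fact."""
    product, store, date, amount = record
    return (
        lookup_surrogate_key(PRODUCT_DIM, product, date),
        lookup_surrogate_key(STORE_DIM, store, date),
        date,
        amount,
    )


def reduce_by_day(mapped_facts):
    """Reduce stage (assumed): total amount per (product key, day)."""
    totals = defaultdict(float)
    for product_sk, _store_sk, date, amount in mapped_facts:
        totals[(product_sk, date)] += amount
    return dict(totals)


def merge_to_month(daily_totals):
    """Merge stage (assumed): roll daily totals up to (product key, month)."""
    totals = defaultdict(float)
    for (product_sk, date), amount in daily_totals.items():
        totals[(product_sk, date[:7])] += amount
    return dict(totals)


if __name__ == "__main__":
    facts = [
        ("P1", "S1", "2013-07-15", 10.0),
        ("P1", "S1", "2013-07-20", 5.0),
        ("P2", "S1", "2013-07-15", 2.0),
    ]
    mapped = [map_fact(f) for f in facts]
    daily = reduce_by_day(mapped)
    print(daily)
    print(merge_to_month(daily))
```

Because every dimension is resolved inside `map_fact`, no fact record has to be shuffled across the network just to find its keys, which is the motivation the abstract gives for doing the lookup in the map stage. A dimension too large for memory would need a different strategy, which this sketch does not attempt.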

Key words

MapReduce model/dimension/fact/surrogate key/parallel lookup/aggregation

Classification

Information technology and security science

Cite this article

Sun Li, He Gang, Li Jiyun. Parallel Processing Algorithms for Facts Based on Hadoop Platform[J]. Computer Engineering, 2014, (3): 59-62, 81, 5.

Computer Engineering

OA, Peking University Core, CSCD, CSTPCD

ISSN: 1000-3428
