Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2011, Vol. 5, Issue 2: 161-169 (9 pages). DOI: 10.3778/j.issn.1673-9418.2011.02.006
Development Method of MapReduce Oriented Data Flow Processing (面向MapReduce的数据处理流程开发方法)
Abstract
In the age of information explosion, data flow processing is widespread and has taken on new characteristics, notably massive data volumes and parallel execution. More and more users choose MapReduce to process their data because it is simple and delivers high capability at low cost. However, MapReduce does not directly support complex data flow processing that involves multiple steps, multiple branches, and multiple data sets. This paper proposes a model-driven development method for data flow processing based on MapReduce. It first defines the logical and physical models of the data flow, together with a component model. It then designs model transformation and code generation algorithms, and uses them to generate MapReduce program code that implements the function defined by the logical model and runs on the Hadoop platform. A development tool, CloudDataFlow, is implemented based on this method. Experiments show that, compared with similar systems, it offers higher performance, extensibility, and usability.
Keywords
MapReduce / data processing flow / model-driven / Hadoop platform
Classification
Information Technology and Security Science
Cite this article
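YI Xiaohua, LIU Jie, YE Dan. Development Method of MapReduce Oriented Data Flow Processing [J]. Journal of Frontiers of Computer Science and Technology, 2011, 5(2): 161-169, 9.
Funding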
The National Science and Technology Major Project of China (Core Electronic Devices, High-end Generic Chips and Basic Software) under Grant No. 2009ZX01043-003-002
The National Science and Technology Support Program of China under Grant No. 2009BAG18B01,2009BADA9B02