Tobacco Science & Technology (烟草科技), 2016, Vol. 49, Issue (4): 96-102, 7. DOI: 10.16135/j.issn1002-0861.20160416
Design and implementation of a DW2.0-based massive tobacco data analysis system
Xu Jian (许建) 1, Xiao Yingbin (肖迎宾) 1, Xing Yang (邢阳) 1, Shen Yi (沈毅) 1, Zhuang Wenjie (庄文杰) 1
Author information
- 1. Information Center, China Tobacco Jiangsu Industrial Limited Corporation, 406-3 Zhongshan North Road, Nanjing 210011
Abstract
To address the limitations of traditional data warehouse systems in data storage, processing, and presentation, a massive tobacco data analysis system was designed around the practical needs of China Tobacco Jiangsu Industrial Limited Corporation. Drawing on DW2.0 theory and big data technology, an integrated and coordinated data warehouse was built by introducing a distributed processing architecture and fusing the traditional data warehouse with Hadoop. Data lifecycle management gave the system its high responsiveness. Unstructured data were processed with Hadoop HBase, and the Hadoop MapReduce parallel computing framework served as the communication layer, scheduling and coordinating computation and communication among the cluster nodes. Test results indicated that, compared with traditional methods, the response time of the new system improved by 30% and 80% when the data magnitude reached 100 million and 1 billion, respectively. The system effectively raised the application level of the data warehouse.
Key words
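Tobacco/Massive data/Data warehouse/DW2.0 theory/Hadoop architecture/Data lifecycle
Classification
Light industry and textiles
Cite this article
Xu Jian, Xiao Yingbin, Xing Yang, Shen Yi, Zhuang Wenjie. Design and implementation of DW2.0-based massive tobacco data analysis system [J]. Tobacco Science & Technology, 2016, 49(4): 96-102, 7.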