Computer Engineering, Issue (10): 43-46,51,5. DOI: 10.3969/j.issn.1000-3428.2014.10.009
实时数据仓库中一种改进的数据流更新算法
An Improved Data Stream Update Algorithm in Real-time Data Warehouse
Abstract
To integrate data efficiently in a data warehouse when the data distribution is skewed, this paper proposes an improved data stream update algorithm, Extended Hybrid Join (EH-JOIN). The algorithm improves on the traditional Hash join method: by using an index structure and keeping part of the master data in memory, it adapts to commonly skewed data and greatly reduces disk I/O cost. Experimental results show that, when the relation set keeps an appropriate size, the service rate of the proposed algorithm is 96% and 80% higher than that of the MESHJOIN and R-MESHJOIN algorithms, respectively; as the memory size varies, it is 57% and 48% higher, respectively.
Keywords
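real-time data warehouse / data transformation / data stream update / stream-based join / Hash index / skewed distribution
Classification
Information technology and security science
Cite this article
Pan Zhengbing, Dai Muhong. An Improved Data Stream Update Algorithm in Real-time Data Warehouse [J]. Computer Engineering, 2014, (10): 43-46,51,5.
Funding
Supported by the Natural Science Foundation of Hunan Province (2011FJ3034).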
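Illustrative sketch
The abstract describes joining a data stream against disk-resident master data through an index while keeping part of the master data in memory, so that skewed (frequently repeated) keys avoid disk I/O. The Python sketch below illustrates that general idea only; it is not the paper's EH-JOIN implementation. The names `DiskMasterData` and `hybrid_join`, the frequency-based cache replacement rule, and the use of a dictionary to simulate the indexed disk relation are all assumptions made for illustration.

```python
from collections import Counter


class DiskMasterData:
    """Hypothetical stand-in for the disk-resident master relation.

    A dict plays the role of the index; `reads` counts simulated disk I/Os.
    """

    def __init__(self, rows):
        self._index = dict(rows)  # key -> master row
        self.reads = 0

    def lookup(self, key):
        self.reads += 1  # every indexed lookup costs one simulated disk read
        return self._index.get(key)


def hybrid_join(stream, disk, cache_size):
    """Inner-join stream tuples (key, payload) with master data.

    Frequently seen keys are kept in an in-memory cache (assumed policy:
    replace the least frequent cached key when a hotter key arrives).
    """
    cache = {}        # memory-resident part of the master data
    freq = Counter()  # key frequencies observed in the stream so far
    out = []
    for key, payload in stream:
        freq[key] += 1
        if key in cache:
            row = cache[key]  # served from memory, no disk I/O
        else:
            row = disk.lookup(key)
            if row is not None:
                if len(cache) < cache_size:
                    cache[key] = row
                else:
                    coldest = min(cache, key=freq.__getitem__)
                    if freq[key] > freq[coldest]:
                        del cache[coldest]
                        cache[key] = row
        if row is not None:
            out.append((key, payload, row))
    return out
```

With a skewed stream in which one key dominates, most tuples are answered from the cache, so the number of simulated disk reads stays far below the number of stream tuples. Under a uniform key distribution the cache would help much less, which is why this kind of design targets skewed data.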