Computer Engineering, 2019, Vol. 45, Issue (3): 20-25, 31, 7. DOI: 10.19678/j.issn.1000-3428.0052626
分布式环境下时态大数据的连接操作研究
Research on Join Operation of Temporal Big Data in Distributed Environment
Abstract
Distributed systems are a natural fit for join operations over temporal big data, but existing distributed systems do not natively support temporal join queries and cannot meet the low-latency, high-throughput requirements of temporal big data processing. This paper therefore proposes a Spark-based in-memory scheme with a two-level index. A global index prunes the distributed partitions, and a local temporal index serves queries within each partition, improving data retrieval efficiency. A partition method tailored to temporal data is designed to make global pruning more effective. Experimental results on real and synthetic datasets show that the scheme significantly improves the processing efficiency of temporal join operations.
Key words
temporal big data / distributed memory computing / temporal join / two-level index / partition method / Spark framework
Classification
Information Technology and Security Science
Cite this article
ZHANG Wei, WANG Zhijie. Research on Join Operation of Temporal Big Data in Distributed Environment[J]. Computer Engineering, 2019, 45(3): 20-25, 31, 7.
Funding
National Natural Science Foundation of China (U1636210, 61729202)
Guangdong Provincial Science and Technology Planning Project (2015A030401057, 2016B030307002)
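The two-level idea summarized in the abstract (a global index that prunes partitions, then a local temporal index inside each surviving partition) can be illustrated with a small single-process sketch. This is not the authors' implementation and does not use Spark. The partitioning rule (equal-sized chunks ordered by start time), the local index (a sorted list of start times searched with `bisect`), and all names such as `Interval`, `Partition`, `build_partitions`, and `temporal_join` are assumptions made only for illustration. Intervals are treated as closed, and two records join when their intervals overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    key: str
    start: int
    end: int  # closed interval [start, end]


class Partition:
    """One partition with a hypothetical local temporal index."""

    def __init__(self, records):
        # Local index: records sorted by start time.
        self.records = sorted(records, key=lambda r: r.start)
        self.starts = [r.start for r in self.records]
        # Bounds that the global index keeps for this partition.
        self.min_start = self.starts[0]
        self.max_end = max(r.end for r in self.records)

    def overlapping(self, q):
        # Only records with start <= q.end can overlap q;
        # bisect finds the end of that prefix in O(log n).
        hi = bisect_right(self.starts, q.end)
        return [r for r in self.records[:hi] if r.end >= q.start]


def build_partitions(records, num_partitions):
    """Assumed partitioning: equal-sized chunks in start-time order."""
    ordered = sorted(records, key=lambda r: r.start)
    size = max(1, -(-len(ordered) // num_partitions))  # ceiling division
    return [Partition(ordered[i:i + size])
            for i in range(0, len(ordered), size)]


def temporal_join(left, partitions):
    """Return (left key, right key) pairs whose intervals overlap."""
    result = []
    for l in left:
        for p in partitions:
            # Global pruning: skip partitions whose time bounds
            # cannot intersect the probe interval.
            if p.max_end < l.start or p.min_start > l.end:
                continue
            result.extend((l.key, r.key) for r in p.overlapping(l))
    return result
```

In this sketch the global index is just the per-partition `(min_start, max_end)` pair. Ordering records by start time before chunking keeps those bounds narrow, which is what lets the pruning step skip partitions.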