
Research on Join Operation of Temporal Big Data in Distributed Environment

Zhang Wei, Wang Zhijie

Computer Engineering, 2019, Vol. 45, Issue 3: 20-25, 31, 7. DOI: 10.19678/j.issn.1000-3428.0052626


Zhang Wei (1), Wang Zhijie (2)

Author information

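  • 1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240
  • 2. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006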

Abstract

Distributed systems are a natural fit for join operations over temporal big data, but existing distributed systems do not natively support temporal join queries and cannot meet the low-latency, high-throughput requirements of temporal big data processing. This paper therefore proposes a Spark-based in-memory solution built on a two-level index. A global index prunes the distributed partitions, and a local temporal index is then queried within each remaining partition to improve data retrieval efficiency. A partitioning method designed for temporal data further optimizes global pruning. Experimental results on real and synthetic datasets show that the scheme significantly improves the processing efficiency of temporal join operations.
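
The two-level idea in the abstract can be illustrated with a small single-machine sketch. It is not the paper's Spark implementation: the `Partition` class, the sort-and-chunk partitioner `build_partitions`, and the sorted-by-start local index are all assumptions made for illustration, since the abstract does not describe the actual index structures or partitioning method. Intervals are treated as closed, and two intervals join when they overlap.

```python
from bisect import bisect_right

# A record is (start, end, payload) and covers the closed interval [start, end].


class Partition:
    """Hypothetical partition holding a local index sorted by start time."""

    def __init__(self, records):
        self.records = sorted(records)
        self.starts = [r[0] for r in self.records]
        # Time range of the partition, used as its global index entry.
        self.lo = self.starts[0]
        self.hi = max(r[1] for r in self.records)

    def overlapping(self, start, end):
        # Local index: only records whose start is <= end can overlap.
        cut = bisect_right(self.starts, end)
        return [r for r in self.records[:cut] if r[1] >= start]


def build_partitions(records, size):
    """Assumed partitioner: sort by start time, cut into fixed-size chunks."""
    ordered = sorted(records)
    return [Partition(ordered[i:i + size]) for i in range(0, len(ordered), size)]


def temporal_join(left, partitions):
    """Pair each left interval with every overlapping right interval."""
    out = []
    for l_start, l_end, l_key in left:
        for p in partitions:
            # Global index: skip partitions whose time range cannot overlap.
            if p.hi < l_start or p.lo > l_end:
                continue
            out.extend((l_key, r[2]) for r in p.overlapping(l_start, l_end))
    return out


right = [(1, 3, "r1"), (2, 8, "r2"), (10, 12, "r3"), (11, 15, "r4")]
left = [(4, 5, "a"), (12, 13, "b")]
print(temporal_join(left, build_partitions(right, 2)))
# prints [('a', 'r2'), ('b', 'r3'), ('b', 'r4')]
```

In this example each left interval touches only one of the two partitions, which is the effect global pruning aims for; partitions built from time-sorted data have narrow time ranges, so more of them can be skipped.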

Key words

temporal big data/distributed memory computing/temporal join/two-level index/partition method/Spark framework

Classification

Information Technology and Security Science

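Cite this article

Zhang Wei, Wang Zhijie. Research on Join Operation of Temporal Big Data in Distributed Environment[J]. Computer Engineering, 2019, 45(3): 20-25, 31, 7.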

Funding

National Natural Science Foundation of China (U1636210, 61729202)

Science and Technology Planning Project of Guangdong Province (2015A030401057, 2016B030307002)

Computer Engineering

OA / Peking University Core Journals / CSCD / CSTPCD

ISSN 1000-3428
