Self-Adaptive Cache Management Strategy for the Parallel Computing Framework Spark

Bian Chen, Yu Jiong, Ying Changtian, Xiu Weirong

Acta Electronica Sinica (电子学报), 2017, Vol. 45, Issue 2: pp. 278-284 (7 pages). DOI: 10.3969/j.issn.0372-2112.2017.02.003

Self-Adaptive Strategy for Cache Management in Spark

Bian Chen¹, Yu Jiong², Ying Changtian¹, Xiu Weirong¹

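Author information

1. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046
2. Urumqi Vocational University, School of Information Engineering, Urumqi, Xinjiang 830002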

Abstract

As a parallel computing framework, Spark lacks a good strategy for selecting valuable RDDs to cache in limited memory. When memory is full, Spark discards the least recently used RDD and ignores other factors such as computation cost. This paper proposes a self-adaptive cache management strategy (SACM), which comprises an automatic selection algorithm (Selection), a parallel cache cleanup algorithm (PCC), and a lowest weight replacement algorithm (LWR). The Selection algorithm identifies valuable RDDs and caches their partitions to speed up data-intensive computations. PCC cleans up valueless RDDs asynchronously to improve memory utilization. LWR jointly considers an RDD's usage frequency, its computation cost, and its size. Experimental results show that Spark with the Selection algorithm computes faster than unmodified Spark, that the parallel cleanup algorithm improves memory utilization, and that LWR performs better when memory is limited.
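The contrast between least-recently-used eviction and a weight-based policy such as LWR can be illustrated with a minimal Python sketch. The abstract does not give the paper's weight formula, so the `weight` function below (usage frequency times computation cost, divided by size) is an assumption chosen only to combine the three factors the abstract names. The `CachedRDD` class, its field names, and the sample values are likewise hypothetical and are not part of Spark's API.

```python
from dataclasses import dataclass


@dataclass
class CachedRDD:
    name: str
    use_count: int       # how often later stages reuse this RDD
    compute_cost: float  # time to recompute it, in seconds (assumed unit)
    size_mb: float       # memory footprint in MB
    last_used: int       # logical timestamp of the most recent access


def weight(rdd):
    # Assumed formula, not the paper's: frequently used, expensive,
    # small RDDs get a high weight and are therefore kept.
    return rdd.use_count * rdd.compute_cost / rdd.size_mb


def pick_victims(cache, needed_mb, key):
    # Walk the cache in ascending `key` order and collect entries
    # until at least `needed_mb` of memory would be freed.
    victims, freed = [], 0.0
    for rdd in sorted(cache, key=key):
        if freed >= needed_mb:
            break
        victims.append(rdd)
        freed += rdd.size_mb
    return victims


def evict_lwr(cache, needed_mb):
    # Lowest-weight-first eviction.
    return pick_victims(cache, needed_mb, key=weight)


def evict_lru(cache, needed_mb):
    # Least-recently-used eviction, for comparison.
    return pick_victims(cache, needed_mb, key=lambda r: r.last_used)


cache = [
    CachedRDD("parsed_logs", use_count=5, compute_cost=40.0, size_mb=200.0, last_used=1),
    CachedRDD("joined", use_count=1, compute_cost=5.0, size_mb=300.0, last_used=3),
    CachedRDD("features", use_count=3, compute_cost=20.0, size_mb=100.0, last_used=2),
]

print([r.name for r in evict_lwr(cache, 250)])  # ['joined']
print([r.name for r in evict_lru(cache, 250)])  # ['parsed_logs', 'features']
```

With these sample values, the weight-based policy frees the required 250 MB by dropping one large, cheap, rarely used RDD, while the recency-based policy drops two RDDs that are costly to recompute and reused often.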

Key words

parallel computing/cache management strategy/Spark/resilient distributed datasets

Classification

Information Technology and Security Science

Cite this article

Bian Chen, Yu Jiong, Ying Changtian, Xiu Weirong. Self-Adaptive Strategy for Cache Management in Spark [J]. Acta Electronica Sinica, 2017, 45(2): 278-284.

Funding

National Natural Science Foundation of China (No. 61262088, No. 61462079)

Acta Electronica Sinica

Indexing: OA, Peking University Core Journals, CSCD, CSTPCD

ISSN: 0372-2112
