Journal of Southeast University (Natural Science Edition), 2017, Vol. 47, No. 2: 231-235. DOI: 10.3969/j.issn.1001-0505.2017.02.006
并行计算框架Spark的自动检查点策略
Automatic checkpoint strategy for the Spark parallel computing framework
Abstract
The existing Spark checkpoint mechanism requires the programmer to select checkpoints based on experience, which carries a degree of risk and randomness and can result in a large recovery overhead. To address this problem, the characteristics of resilient distributed datasets (RDDs) were analyzed, and the weight generation (WG) algorithm and the checkpoint automatic selection (CAS) algorithm were proposed. First, the WG algorithm analyzes the directed acyclic graph (DAG) of a job and obtains the lineage length and the operation complexity of each RDD to compute its weight. Second, the CAS algorithm selects the RDD with the maximum weight and sets a checkpoint on it asynchronously to enable fast recovery. The experimental results show that, compared with the original Spark, the CAS algorithm increases the execution time and the checkpoint size for the different datasets, with the increase on Wiki-Talk being the most pronounced. For single-node failure recovery, the datasets have a smaller recovery overhead after checkpoints are set by the CAS algorithm. Therefore, the strategy can efficiently decrease the recovery overhead of jobs at the cost of a slight extra overhead.
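The WG/CAS idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the DAG model, the operation-complexity scores, and the weight formula (lineage length multiplied by operation complexity) are assumptions made for demonstration.

```python
# Sketch of WG/CAS: weight each RDD in a job's DAG by its lineage length
# and operation complexity, then pick the RDD with the maximum weight as
# the checkpoint candidate. Complexity scores below are illustrative.
OP_COMPLEXITY = {"map": 1, "filter": 1, "flatMap": 2,
                 "reduceByKey": 3, "groupByKey": 4, "join": 4}

class RDDNode:
    def __init__(self, name, op, parents=()):
        self.name = name
        self.op = op                  # transformation that produced this RDD
        self.parents = list(parents)  # parent RDDs in the lineage DAG

    def lineage_length(self):
        # Longest chain of transformations from any input RDD to this one.
        if not self.parents:
            return 0
        return 1 + max(p.lineage_length() for p in self.parents)

def weight(rdd):
    # WG sketch: weight grows with both recomputation depth and cost.
    return rdd.lineage_length() * OP_COMPLEXITY.get(rdd.op, 1)

def select_checkpoint(rdds):
    # CAS sketch: choose the RDD whose loss would be costliest to recompute.
    return max(rdds, key=weight)

# Example DAG for a word-count-style job:
# textFile -> flatMap -> map -> reduceByKey
src = RDDNode("lines", "textFile")
words = RDDNode("words", "flatMap", [src])
pairs = RDDNode("pairs", "map", [words])
counts = RDDNode("counts", "reduceByKey", [pairs])

best = select_checkpoint([src, words, pairs, counts])
print(best.name)  # → counts (deepest lineage, costly transformation)
```

In an actual Spark job, the selected RDD would then be checkpointed via the standard `rdd.checkpoint()` API; the paper's contribution is performing this selection automatically and writing the checkpoint asynchronously rather than blocking the job.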
Keywords
automatic checkpoint / resilient distributed dataset (RDD) weight / Spark / recovery time
Classification
Information technology and security science
Citation
英昌甜, 于炯, 卞琛, 鲁亮, 钱育蓉. 并行计算框架Spark的自动检查点策略[J]. 东南大学学报(自然科学版), 2017, 47(2): 231-235.
Funding
Supported by the National Natural Science Foundation of China (61462079, 61262088, 61562086, 61363083, 61562078) and the Scientific Research Program of the Higher Education Institutions of Xinjiang Uygur Autonomous Region (XJEDU2016S106).