Chinese High Technology Letters (高技术通讯), 2025, Vol. 35, Issue 10: 1037-1050 (14 pages). DOI: 10.3772/j.issn.1002-0470.2025.10.001
Reuse-based job merging execution optimization techniques
Abstract
With the rise of cloud computing and big data analytics, large-scale job services running in distributed clusters often exhibit significant job overlap. In large-scale data processing applications, effectively identifying and reusing overlapping computation is crucial to mitigating the job delays and memory overhead caused by redundant data and computation. To address this challenge, this paper proposes MergeLap, a reuse-based job-merging execution system. MergeLap employs a job structure signature mechanism and a cost-model-based common substructure selection strategy to efficiently identify and search for maximal common substructures. It caches these substructures in a chain cache structure, which stores intermediate results for fast indexing while reducing memory consumption. Experimental results show that the proposed approach effectively reduces job execution time and improves memory usage efficiency. Compared with native SparkSQL, MergeLap reduces the running time of batch jobs across multiple workloads by up to 46.5% and decreases cache usage by up to 60.7%.
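The abstract names MergeLap's job structure signature mechanism but does not describe its signature format or search procedure. As a rough illustration of the general idea only, the Python sketch below assumes a hypothetical tree-shaped logical plan (`PlanNode`) and uses a Merkle-style hash of each subtree as its structural signature. Subtrees whose signatures occur in at least two jobs are candidate common substructures, and a top-down walk keeps only the largest shared subtree on each root-to-leaf path, which approximates "maximal". The names `PlanNode`, `signature`, and `maximal_shared_subtrees` are illustrative assumptions, not MergeLap's API, and the sketch leaves out the cost model and the chain cache entirely.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical logical-plan operator (illustrative only, not MergeLap's plan type)."""
    op: str  # operator name together with its arguments, e.g. "Filter(age>30)"
    children: tuple[PlanNode, ...] = ()


def signature(node: PlanNode, memo: dict[int, str]) -> str:
    """Merkle-style structural signature: hash of the operator plus its children's signatures.

    Subtrees with the same operators in the same shape get the same signature,
    no matter which job they came from. Children are hashed in order.
    """
    if id(node) not in memo:
        child_sigs = ",".join(signature(c, memo) for c in node.children)
        digest = hashlib.sha256(f"{node.op}[{child_sigs}]".encode()).hexdigest()
        memo[id(node)] = digest[:16]
    return memo[id(node)]


def maximal_shared_subtrees(jobs: list[PlanNode]) -> dict[str, tuple[PlanNode, set[int]]]:
    """Return subtrees that occur in at least two jobs and have no shared ancestor.

    Maps signature -> (one representative subtree, indices of the jobs containing it).
    """
    memo: dict[int, str] = {}
    owners: dict[str, set[int]] = defaultdict(set)

    # Pass 1: record which jobs contain each subtree signature.
    def collect(node: PlanNode, job: int) -> None:
        owners[signature(node, memo)].add(job)
        for child in node.children:
            collect(child, job)

    for job, root in enumerate(jobs):
        collect(root, job)

    result: dict[str, tuple[PlanNode, set[int]]] = {}

    # Pass 2: walk top-down; the first shared node on a path is kept and
    # its descendants are not reported separately.
    def pick(node: PlanNode) -> None:
        sig = signature(node, memo)
        if len(owners[sig]) >= 2:
            result.setdefault(sig, (node, owners[sig]))
            return
        for child in node.children:
            pick(child)

    for root in jobs:
        pick(root)
    return result


if __name__ == "__main__":
    users = PlanNode("Scan(users)")
    job_a = PlanNode("Aggregate(count)", (PlanNode("Filter(age>30)", (users,)),))
    job_b = PlanNode("Project(name)", (PlanNode("Filter(age>30)", (PlanNode("Scan(users)"),)),))
    job_c = PlanNode("Project(id)", (PlanNode("Scan(orders)"),))

    for sig, (node, jobs) in maximal_shared_subtrees([job_a, job_b, job_c]).items():
        print(node.op, sorted(jobs))  # prints: Filter(age>30) [0, 1]
```

Here jobs 0 and 1 share the `Filter(age>30)` over `Scan(users)` subtree, so it is reported once, while the shared `Scan(users)` leaf is suppressed because its parent is already shared. Because children are hashed in order, `Join(a, b)` and `Join(b, a)` would receive different signatures; a production system would likely need to normalize commutative operators before hashing.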
Keywords
data reuse/computation reuse/common substructure/job merging/cost model
Cite this article
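Zhang Jindong, Tan Guangming. Reuse-based job merging execution optimization techniques[J]. Chinese High Technology Letters, 2025, 35(10): 1037-1050.
Funding
Supported by the National Natural Science Foundation of China (62032023).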