深圳大学学报(理工版) (Journal of Shenzhen University Science and Engineering), 2025, Vol. 42, Issue 3: 317-325, 9. DOI: 10.3724/SP.J.1249.2025.03317
并发式Spark消息分发器
Concurrent Spark message distributor
Abstract
In the Spark big data computing framework, the driver employs an iterative message distribution mechanism that incurs considerable task submission overhead, delays task initiation, restricts execution concurrency, and causes idle waiting among executors, ultimately leading to inefficient utilization of computing resources. To address these issues, we propose an efficient and lightweight concurrent Spark message distributor based on a thread pool scheduling strategy. Compared with Spark's original distributor, the proposed design is better suited to scheduling fine-grained, high-overhead tasks. It parses metadata containing key executor information to extract the task list and the executor identifier corresponding to each task, then initializes a thread pool to launch an asynchronous computation for each task, thereby enabling true concurrent task distribution. This approach significantly reduces dispatch latency while ensuring system stability and reliable task execution. Experimental evaluations conducted in a virtualized cluster environment demonstrate the superiority of the proposed distributor over the original Spark mechanism. Results show that, with memory usage held constant, the concurrent distributor reduces task execution time by about 9% and increases central processing unit (CPU) utilization by about 5%. The proposed concurrent Spark message distributor effectively mitigates the high overhead and inefficient use of computing resources associated with traditional message distribution methods in fine-grained task scenarios.
Key words
parallel processing/big data computing/Spark communication mechanism/message distribution/fine-grained tasks/thread pool scheduling
Classification
Information technology and security science
Cite this article
何玉林, 林泽杰, 徐毓阳, 成英超, 黄哲学. 并发式Spark消息分发器[J]. 深圳大学学报(理工版), 2025, 42(3): 317-325, 9.
Funding
Natural Science Foundation of Guangdong Province (2023A1515011667)
Science and Technology Major Project of Shenzhen (KJZD20230923114809020)
Basic Research Foundation of Shenzhen (JCYJ20210324093609026)
Guangdong Basic and Applied Basic Research Foundation, Guangdong-Shenzhen Joint Fund Key Project (2023B1515120020)
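
The dispatch scheme summarized in the abstract (parse executor metadata into a task list with executor identifiers, then hand each task to a thread pool for asynchronous launch) can be illustrated with a minimal Python sketch. This is not the authors' implementation and not Spark's API: the metadata layout, `TaskAssignment`, `parse_assignments`, `dispatch_concurrently`, and `fake_send` are all hypothetical names chosen for illustration, and the real Spark driver is written in Scala.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TaskAssignment:
    """Hypothetical pairing of one task with the executor that should run it."""
    task_id: int
    executor_id: str


def parse_assignments(metadata: Dict[str, List[int]]) -> List[TaskAssignment]:
    """Flatten an assumed {executor_id: [task_id, ...]} mapping into a task list."""
    return [
        TaskAssignment(task_id, executor_id)
        for executor_id, task_ids in metadata.items()
        for task_id in task_ids
    ]


def dispatch_concurrently(
    assignments: List[TaskAssignment],
    send: Callable[[TaskAssignment], str],
    max_workers: int = 4,
) -> Dict[int, str]:
    """Submit one asynchronous send per task instead of sending them one by one."""
    results: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(send, a): a for a in assignments}
        for future in as_completed(futures):
            assignment = futures[future]
            # future.result() re-raises any exception raised inside send().
            results[assignment.task_id] = future.result()
    return results


def fake_send(assignment: TaskAssignment) -> str:
    """Stand-in for the launch-task message; a real driver would use RPC here."""
    return f"task {assignment.task_id} -> {assignment.executor_id}"


metadata = {"exec-0": [0, 1], "exec-1": [2]}  # assumed metadata shape
assignments = parse_assignments(metadata)
launched = dispatch_concurrently(assignments, fake_send)
```

The iterative baseline would call `send` for each assignment in a plain loop, so a slow send delays every later task. With the pool, sends overlap up to `max_workers`, which is the property the abstract credits for the lower dispatch latency.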