Concurrent Spark message distributor

He Yulin, Lin Zejie, Xu Yuyang, Cheng Yingchao, Huang Zhexue

Journal of Shenzhen University Science and Engineering, 2025, Vol. 42, Issue 3: 317-325 (9 pages). DOI: 10.3724/SP.J.1249.2025.03317

He Yulin 1, Lin Zejie 1, Xu Yuyang 2, Cheng Yingchao 2, Huang Zhexue 1

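Author information

  • 1. Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen 518107, Guangdong, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong, China
  • 2. Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen 518107, Guangdong, China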
Abstract

In the Spark big data computing framework, the driver employs an iterative message distribution mechanism that incurs considerable task submission overhead, delays task initiation, restricts execution concurrency, and causes idle waiting among executors, ultimately leading to inefficient utilization of computing resources. To address these issues, we propose an efficient and lightweight concurrent Spark message distributor based on a thread pool scheduling strategy. In contrast to Spark's original distributor, the proposed design is better suited to scheduling fine-grained, high-overhead tasks. It parses metadata containing key executor information to extract the task list and the executor identifier corresponding to each task, then initializes a thread pool to launch an asynchronous computation for each task, thereby enabling true concurrent task distribution. This approach significantly reduces dispatch latency while ensuring system stability and reliable task execution. Experimental evaluations conducted in a virtualized cluster environment demonstrate the superiority of the proposed distributor over the original Spark mechanism. Results show that, with memory usage held constant, the concurrent distributor reduces task execution time by about 9% and increases central processing unit (CPU) utilization by about 5%. The proposed concurrent Spark message distributor effectively mitigates the high overhead and computational resource inefficiency associated with traditional message distribution methods in fine-grained task scenarios.
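The dispatch pattern described in the abstract (parse metadata, pair each task with an executor identifier, then hand each launch to a thread pool) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use any Spark API: the metadata layout, the function names, and the `send_launch_message` stand-in are all assumptions made for illustration, and the per-task overhead is simulated with a short sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical metadata layout: each entry pairs a task with its target executor.
METADATA = {
    "tasks": [
        {"task_id": 0, "executor_id": "exec-1"},
        {"task_id": 1, "executor_id": "exec-2"},
        {"task_id": 2, "executor_id": "exec-1"},
        {"task_id": 3, "executor_id": "exec-2"},
    ]
}


def parse_metadata(metadata):
    """Extract (task_id, executor_id) pairs from the assumed metadata layout."""
    return [(t["task_id"], t["executor_id"]) for t in metadata["tasks"]]


def send_launch_message(task_id, executor_id, delay=0.01):
    """Stand-in for serializing a task and sending it to an executor."""
    time.sleep(delay)  # simulated per-task submission overhead
    return (task_id, executor_id, threading.current_thread().name)


def dispatch_iteratively(assignments):
    """Baseline: send launch messages one after another in a single loop."""
    return [send_launch_message(t, e) for t, e in assignments]


def dispatch_concurrently(assignments, max_workers=4):
    """Submit every launch message to a thread pool, then collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send_launch_message, t, e) for t, e in assignments]
        # Collecting in submission order keeps results aligned with assignments.
        return [f.result() for f in futures]
```

Calling `dispatch_concurrently(parse_metadata(METADATA))` returns one tuple per task in submission order. Because the simulated sends overlap across worker threads, total dispatch time in this toy setting is roughly the overhead of the slowest batch rather than the sum over all tasks, which is the effect the abstract attributes to concurrent distribution.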

Key words

parallel processing/big data computing/Spark communication mechanism/message distribution/fine-grained tasks/thread pool scheduling

Classification

Information technology and security science

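Cite this article

He Yulin, Lin Zejie, Xu Yuyang, Cheng Yingchao, Huang Zhexue. Concurrent Spark message distributor[J]. Journal of Shenzhen University Science and Engineering, 2025, 42(3): 317-325.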

Funding

Natural Science Foundation of Guangdong Province (2023A1515011667)

Science and Technology Major Project of Shenzhen (KJZD20230923114809020)

Basic Research Foundation of Shenzhen (JCYJ20210324093609026)

Guangdong Basic and Applied Basic Research Foundation (2023B1515120020)

Journal of Shenzhen University Science and Engineering

Open access; Peking University Core Journal

ISSN 1000-2618
