
A GPU-sharing-based scheduling framework for accelerating deep learning training tasks

林辰汐 (Lin Chenxi) 1, 李嘉伦 (Li Jialun) 2, 莫萱 (Mo Xuan) 1, 周杰英 (Zhou Jieying) 1, 吴维刚 (Wu Weigang) 1

Computer Engineering & Science, 2026, Vol. 48, Issue 3: 389-397, 9. DOI: 10.3969/j.issn.1007-130X.2026.03.002

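Author information

  • 1. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, Guangdong, China
  • 2. School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China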

Abstract

Deep learning (DL) is increasingly applied across a wide range of business scenarios. How to use GPU cluster resources efficiently for DL training and reduce task completion time has drawn sustained attention from both industry and academia. A single DL training task often cannot fully use the computational resources of a GPU, so the exclusive GPU allocation performed by traditional schedulers leads to low resource utilization. This paper proposes G-Share, a GPU-sharing-based task scheduling framework that allows multiple DL tasks to train on the same GPU simultaneously, enabling co-location scheduling. Task scheduling and resource allocation are performed with awareness of the interference between co-located tasks, with the aim of raising GPU utilization and thereby accelerating task execution. Specifically, G-Share first characterizes the mutual interference between tasks through offline modeling and online updates, and then models the GPU-sharing-based scheduling problem as a minimum-weight matching problem on a weighted bipartite graph. Solving this problem yields the resource allocation, and a dynamic task scheduling mechanism combined with time-slicing is used to track changes in the optimal co-location combinations of tasks in online scenarios. Experiments on DL task workload data from SenseTime demonstrate that G-Share reduces the average task completion time by 20.6% compared with baseline methods.
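The matching formulation mentioned in the abstract can be illustrated with a toy example. The sketch below is not G-Share's solver: it assumes, purely for illustration, that one side of the bipartite graph holds waiting tasks, the other side holds GPUs that each already run one task, and the edge weight is an estimated interference cost (slowdown in percent) for co-locating the pair. The cost values, the function name `min_weight_matching`, and the brute-force search are all hypothetical; a real scheduler would use a polynomial-time assignment algorithm such as the Hungarian method.

```python
from itertools import permutations


def min_weight_matching(cost):
    """Brute-force minimum-weight perfect matching on a square cost matrix.

    cost[i][j] is the weight of the edge between left node i and right node j.
    Returns (total_cost, assignment), where assignment[i] is the right node
    matched to left node i. Only practical for very small instances.
    """
    n = len(cost)
    best_total = float("inf")
    best_perm = None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical interference costs (assumption, not data from the paper):
# interference[i][j] = estimated slowdown in percent when waiting task i
# is co-located with the task already running on GPU j.
interference = [
    [30, 10, 50],
    [20, 40, 15],
    [5, 35, 45],
]

total, assignment = min_weight_matching(interference)
for task, gpu in enumerate(assignment):
    print(f"task {task} -> GPU {gpu} (cost {interference[task][gpu]})")
print("total interference cost:", total)
```

In this toy instance the cheapest matching places task 0 on GPU 1, task 1 on GPU 2, and task 2 on GPU 0, for a total cost of 30. Greedily giving each task its individually cheapest GPU happens to agree here, but in general it can conflict, which is why the problem is posed as a matching.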

Key words

cloud computing / deep learning / resource scheduling / GPU sharing / task interference

Classification

Information technology and security science

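Cite this article

Lin Chenxi, Li Jialun, Mo Xuan, Zhou Jieying, Wu Weigang. A GPU-sharing-based scheduling framework for accelerating deep learning training tasks[J]. Computer Engineering & Science, 2026, 48(3): 389-397, 9.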

Funding

Natural Science Foundation of Guangdong Province (2025A1515011663)

Computer Engineering & Science

ISSN 1007-130X
