Research on Heterogeneous Computing Scheduling Strategy for Kubeflow
Indexing: Open Access (OA); Peking University Core Journals (北大核心); CSTPCD
Kubeflow combines machine learning with cloud computing technology, integrating a large number of machine learning tools and providing a feasible path to production-grade machine learning platforms. Machine learning typically relies on dedicated processors such as Graphics Processing Units (GPUs) to accelerate training and inference. As cloud computing clusters are resized dynamically, nodes with different computing architectures can flexibly join or leave the cluster, and the traditional round-robin scheduling strategy can no longer handle heterogeneous computing power scheduling under such dynamic adjustment. To optimize the allocation of heterogeneous computing power on the Kubeflow platform, improve resource utilization, and achieve load balancing, a cloud-based Central Processing Unit (CPU)-GPU heterogeneous computing power scheduling strategy is proposed. The strategy uses two quantified decision indicators, load-balance degree and priority, allocates GPU memory at fine granularity, and mounts the computing resources to the corresponding Pods, thereby achieving fine-grained scheduling of computing power. A resource weight matrix is designed from the computing resources of each cluster node, and an improved genetic algorithm is used to obtain the optimal Pod deployment plan, guaranteeing the execution of multiple tasks. Experimental results show that the proposed strategy supports parallel tasks well; when resource requests overflow, tasks are scheduled and executed by priority while an optimal load distribution is achieved. Compared with the platform's native strategy, resource granularity is refined by an order of magnitude, and cluster load balancing is also significantly improved.
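The mechanism summarized above (a per-node resource weight matrix and an improved genetic algorithm that searches for a Pod placement balancing load and respecting priority) is described here only at the abstract level. The sketch below is a minimal, self-contained illustration of that general approach, not the paper's implementation: the node capacities, Pod requests, fitness weighting, and GA operators (truncation selection, one-point crossover, random-reset mutation) are all assumptions made for the example.

```python
# Minimal sketch (not the paper's implementation): a genetic algorithm that
# assigns Pods to heterogeneous nodes so that CPU/GPU-memory utilization stays
# balanced. Node capacities, Pod requests, and fitness weights are illustrative.
import random

# Per-node capacities: (CPU cores, GPU memory in MiB); heterogeneous cluster (assumed).
NODES = [(16, 0), (32, 16384), (8, 32768)]
# Per-Pod requests: (CPU cores, GPU memory in MiB, priority 1..10) (assumed).
PODS = [(2, 2048, 5), (4, 4096, 8), (1, 1024, 3), (8, 8192, 9), (2, 0, 1)]

POP_SIZE, GENERATIONS, MUTATION_RATE = 40, 200, 0.1


def fitness(assignment):
    """Higher is better: feasible, balanced placement of high-priority Pods."""
    cpu_used = [0.0] * len(NODES)
    gpu_used = [0.0] * len(NODES)
    score = 0.0
    for (cpu, gpu, prio), node in zip(PODS, assignment):
        cpu_cap, gpu_cap = NODES[node]
        if cpu_used[node] + cpu > cpu_cap or gpu_used[node] + gpu > gpu_cap:
            score -= prio          # infeasible placement: penalty weighted by priority
            continue
        cpu_used[node] += cpu
        gpu_used[node] += gpu
        score += prio              # reward placing high-priority Pods
    # Load-balance term: penalize the spread of CPU utilization across nodes.
    utils = [u / c for u, (c, _) in zip(cpu_used, NODES)]
    mean = sum(utils) / len(utils)
    variance = sum((u - mean) ** 2 for u in utils) / len(utils)
    return score - 10.0 * variance


def evolve():
    # Chromosome: list mapping each Pod index to a node index.
    pop = [[random.randrange(len(NODES)) for _ in PODS] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: POP_SIZE // 2]               # truncation selection
        children = []
        while len(survivors) + len(children) < POP_SIZE:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(PODS))       # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < MUTATION_RATE:        # random-reset mutation
                child[random.randrange(len(PODS))] = random.randrange(len(NODES))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("Pod -> node assignment:", best, "fitness:", fitness(best))
```

In this toy fitness function, the priority terms reward feasible placement of high-priority Pods and the variance term penalizes uneven CPU utilization, which is one simple way to quantify a load-balance degree; the paper's actual indicators and resource weight matrix may differ.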
孙毅;王会梅;鲜明;向航
College of Electronic Science and Technology, National University of Defense Technology, Changsha 410000, Hunan, China
Computer and Automation
cloud computing; machine learning; heterogeneous computing power; resource scheduling; genetic algorithm
《计算机工程》 (Computer Engineering), 2024, No. 2
pp. 25-32 (8 pages)
Supported by a national ministry-level fund.