大数据 (Big Data Research), 2024, Vol. 10, Issue 1: 1-8. DOI: 10.11959/j.issn.2096-0271.2024016
Four issues to consider in building a computer system supporting large model training
Zheng Weimin (郑纬民)¹
Author information
- 1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract
There are three types of computer systems that support large model training. Among them, systems based on domestic AI chips have a weak software ecosystem. To change this situation, it is necessary to develop 10 key software components, such as AI compilers and parallel acceleration. Moreover, systems based on supercomputers require good software-hardware co-design to better serve large model training. This article proposes a 4-point balanced design for building large model infrastructure to ensure system performance, reliability, and scalability.
Keywords
large model training / computer system / supercomputing system / large model infrastructure
Classification
Information technology and safety science
Cite this article
Zheng Weimin. Four issues to consider in building a computer system supporting large model training[J]. 大数据 (Big Data Research), 2024, 10(1): 1-8.