Checkpoint Accessing Performance Optimization Method for the Deep Learning Model Training Process

Teng Yun¹, Zhang Guangyan², Sun Dawei¹, Tian Haidong³, Chang Rui³

Big Data Research, 2026, Vol. 12, Issue 2, pp. 75-84. DOI: 10.11959/j.issn.2096-0271.2026029

Affiliations

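  • 1. School of Artificial Intelligence, China University of Geosciences (Beijing), Beijing 100083
  • 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084
  • 3. ZTE Corporation, Nanjing, Jiangsu 210012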

Abstract

As LLMs become more widely used and continue to grow in scale, LLM training faces problems such as high failure rates and poor checkpoint access performance. This paper reviews the strengths and weaknesses of existing methods for optimizing checkpoint access performance and introduces a new method for doing so. The method builds on an observation about the data patterns in checkpoints: model weights change very little between adjacent checkpoints, which makes them well suited to delta compression. The proposed method implements delta compression across multiple interconnected training nodes and is evaluated on real checkpoints generated during deep learning model training. The results show that, during training, delta compression compresses most checkpoints well. The paper further introduces dynamic intervals into delta compression to balance compression ratio against storage overhead, and analyzes the characteristics of momentum data. The analysis of existing methods and the proposed optimization of checkpoint access performance offer insights for accelerating LLM training.
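The abstract does not specify the authors' delta encoding, so the following is only a minimal sketch of the general idea of delta compression between adjacent checkpoints, under stated assumptions: the weights are flattened into one float32 list, the delta is a byte-wise XOR of the two checkpoints (one common lossless choice; the paper may use a different scheme), and zlib is the back-end compressor. The function names `to_float32_bytes`, `xor_bytes`, `delta_encode` and `delta_decode` are hypothetical. The sketch does not model the multi-node implementation, the dynamic intervals, or momentum data.

```python
import array
import random
import zlib


def to_float32_bytes(weights):
    """Pack a flat list of weights as float32 (assumed checkpoint dtype)."""
    return array.array("f", weights).tobytes()


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have the same number of weights")
    return bytes(x ^ y for x, y in zip(a, b))


def delta_encode(prev_weights, cur_weights, level=6):
    """Compress cur_weights relative to prev_weights (the base checkpoint)."""
    delta = xor_bytes(to_float32_bytes(prev_weights), to_float32_bytes(cur_weights))
    # Unchanged bytes XOR to 0. For small updates the sign/exponent and high
    # mantissa bytes of each float32 usually stay the same, so the delta
    # contains many zero bytes and compresses better than the raw weights.
    return zlib.compress(delta, level)


def delta_decode(prev_weights, blob):
    """Reconstruct the newer checkpoint from the base and a compressed delta."""
    cur_bytes = xor_bytes(to_float32_bytes(prev_weights), zlib.decompress(blob))
    out = array.array("f")
    out.frombytes(cur_bytes)
    return out.tolist()


# Demo on synthetic data (hypothetical stand-in for two adjacent checkpoints).
rng = random.Random(0)
prev = [rng.gauss(0.0, 0.02) for _ in range(50_000)]
# Simulate one training step: tiny relative updates to every weight.
cur = [w * (1.0 + rng.uniform(-1e-4, 1e-4)) for w in prev]

full = zlib.compress(to_float32_bytes(cur), 6)  # compressing cur on its own
delta = delta_encode(prev, cur)                 # compressing cur against prev
print("full:", len(full), "bytes; delta:", len(delta), "bytes")
```

XOR is exactly reversible, so decoding restores the newer checkpoint bit for bit, whereas a floating-point subtraction followed by an addition can lose precision. The cost is that every delta depends on its base checkpoint, so a long chain of deltas must be anchored by periodic full checkpoints; the abstract describes its dynamic intervals as balancing compression ratio against storage overhead, but how that interval is chosen is not given here.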

Keywords

LLM/checkpoint/data compression/performance improvement

Classification

Information Technology and Security Science

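Cite this article

Teng Yun, Zhang Guangyan, Sun Dawei, Tian Haidong, Chang Rui. Checkpoint accessing performance optimization method for the deep learning model training process[J]. Big Data Research, 2026, 12(2): 75-84.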

Funding

The National Natural Science Foundation of China (No. 62025203)

Big Data Research

ISSN 2096-0271
