
DeepSeek-R1是怎样炼成的?

张慧敏 (Zhang Huimin)

深圳大学学报(理工版) (Journal of Shenzhen University, Science and Engineering), 2025, Vol. 42, Issue 2: 226-232, 7. DOI: 10.3724/SP.J.1249.2025.02226

How DeepSeek-R1 was created?


Abstract

This article summarizes the innovations and optimizations in the DeepSeek series of models for large-scale training. The breakthroughs of DeepSeek are primarily reflected in model and algorithm innovations, software-hardware collaborative optimization, and the improvement of overall training efficiency. DeepSeek-V3 adopts a mixture-of-experts (MoE) architecture, achieving efficient utilization of computing resources through fine-grained design and a shared-expert strategy. The sparse activation mechanism and lossless load balancing strategy in the MoE architecture significantly enhance the efficiency and performance of model training, especially when handling large-scale data and complex tasks. The innovative multi-head latent attention (MLA) mechanism reduces memory usage and accelerates inference, thus lowering training and inference costs. In DeepSeek-V3's training, the introduction of multi-token prediction (MTP) and 8-bit floating-point (FP8) mixed-precision training improves the model's contextual understanding and training efficiency, while optimized parallel thread execution (PTX) code significantly enhances the computational efficiency of graphics processing units (GPUs). In training the DeepSeek-R1-Zero model, group relative policy optimization (GRPO) is used for pure reinforcement learning, bypassing the traditional supervised fine-tuning and human-feedback stages and leading to a significant improvement in reasoning capability. Overall, the DeepSeek series of models has achieved significant advantages in the field of artificial intelligence through multiple innovations, setting a new industry benchmark.
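The abstract notes that DeepSeek-R1-Zero is trained with group relative policy optimization (GRPO), which dispenses with a learned value critic by normalizing each sampled response's reward against the statistics of its own group. As an illustrative sketch only (the function name, reward values, and rule-based reward described in the comments are hypothetical, not taken from the paper), the group-relative advantage step can be written as:

```python
# Sketch of GRPO's group-relative advantage computation: for each prompt,
# sample a group of responses, score them, and normalize every reward
# against the group's own mean and standard deviation. No separate value
# (critic) network is needed; the group itself provides the baseline.

from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward within its group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one prompt, scored by a rule-based
# reward (say, 1.0 if the final answer is correct, else 0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_relative_advantages(rewards)
print(advs)  # roughly [0.87, -0.87, -0.87, 0.87]
```

The resulting advantages then weight a clipped policy-gradient update, as in PPO, but with this per-group baseline replacing the critic's value estimates.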

Key words

artificial intelligence / DeepSeek / large language model / mixture of experts architecture / multi-head latent attention mechanism / multi-token prediction / mixed-precision training / group relative policy optimization

Classification

Information Technology and Security Science

Cite this article

张慧敏 (Zhang Huimin). DeepSeek-R1是怎样炼成的? (How DeepSeek-R1 was created?) [J]. 深圳大学学报(理工版) (Journal of Shenzhen University, Science and Engineering), 2025, 42(2): 226-232, 7.

深圳大学学报(理工版) (Journal of Shenzhen University, Science and Engineering), ISSN 1000-2618, open access (OA), PKU Core Journal (北大核心).