Review of Server Memory Reliability Technology (服务器内存可靠性技术研究综述)

李道童 李盛新 王兵 姚藩益 芦飞 艾山彬 张炳会 孙秀强 王若琳

计算机工程与应用 (Computer Engineering and Applications), 2025, Vol. 61, Issue 15: 72-92, 21. DOI: 10.3778/j.issn.1002-8331.2409-0342

Review of Server Memory Reliability Technology

李道童 1, 李盛新 1, 王兵 1, 姚藩益 1, 芦飞 1, 艾山彬 1, 张炳会 1, 孙秀强 1, 王若琳 2

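Author information

  • 1. Hardware R&D Department I, Inspur Electronic Information Industry Co., Ltd., Jinan 250000, China
  • 2. School of Computer Science, Heze University, Heze, Shandong 274000, China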

Abstract

Memory, as a core component in servers, has experienced continuous iterations in technology and significant improvements in performance, yet its reliability issues have emerged as a critical factor that cannot be overlooked in influencing the overall stability of servers. This paper reviews the evolution of memory technology, its structural characteristics, and the direct impact of its development on server performance. It delves into the diversity and deep complexity of memory failure modes. Furthermore, the article comprehensively explores the latest technological advancements in fault detection and handling, with particular emphasis on the pivotal role of error correction codes (ECC) and RAS (reliability, availability, serviceability) technologies for memory. It also focuses on the forefront exploration of memory risk cell prediction technologies, particularly memory failure prediction methods that integrate deterministic rules or machine learning algorithms. Based on this foundation, the paper conducts a systematic analysis of the core challenges facing the current memory reliability field and offers a forward-looking outlook on future research directions, encompassing precise prediction of memory aging, real-time monitoring of health status, and the profound application of machine learning in predictive analysis. Ultimately, the paper underscores the necessity of concurrently enhancing the stability and reliability of server memory while pursuing its ultimate performance, to accommodate the ever-growing demands on server performance. This provides invaluable practical guidance and theoretical references for the future development of memory RAS technologies.

Key words

server memory / fault tolerance / fault detection / health monitoring / machine learning

Classification

Information Technology and Security Science

Cite this article

李道童, 李盛新, 王兵, 姚藩益, 芦飞, 艾山彬, 张炳会, 孙秀强, 王若琳. Review of Server Memory Reliability Technology [J]. 计算机工程与应用 (Computer Engineering and Applications), 2025, 61(15): 72-92, 21.

Funding

Natural Science Foundation of Shandong Province (ZR2019LZH006).

计算机工程与应用 (Computer Engineering and Applications)

Open access (OA); Peking University Core Journal (北大核心)

ISSN 1002-8331
