
Application Research of Computers (计算机应用研究), 2024, Vol. 41, Issue 4: 1015-1021, 7. DOI: 10.19734/j.issn.1001-3695.2023.08.0395

Lockup fault monitoring and diagnosis based on Sunway NMII

Gao Chen (郜晨) 1, He Sheng (何升) 1, Hang Xiaoqian (杭骁骞) 1

Author information

  • 1. Wuxi Institute of Advanced Technology, Wuxi, Jiangsu 214026, China

Abstract

The non-maskable inter-processor interrupt (NMII) of the domestic Sunway processor must be initiated by one of the other cores, which makes the general lockup fault monitoring algorithm of Linux difficult to apply. In severe cases, an undetected lockup jeopardizes data processing in critical areas. This paper designs a lockup fault monitoring and diagnosis system for the Sunway architecture to solve this problem. The system sends NMII requests along a chain structure and combines timer events with a kernel thread to check lockup timestamps, which provides soft lockup and hard lockup monitoring of a single core. Building on a fault tolerance mechanism, it adopts a master-slave structure to monitor the state of all cores. When the master core fails, the system applies fault tolerance measures and migrates the master core, which provides multi-core lockup monitoring. It also defines a task model based on NMII that outputs diagnostic information for the faulty cores, extending the application scenarios of NMII. The test results show that the proposed algorithm accurately detects lockup faults and diagnoses them in real time under both low and high fault risk, and that it meets the reliability and real-time requirements of lockup fault monitoring and diagnosis on the Sunway platform.
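
The timestamp check and the chain of checkers described in the abstract can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the authors' kernel implementation: the names `CoreState`, `classify`, `chain_target`, and `scan`, the two threshold values, and the rule that each core inspects the next online core are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Fault(Enum):
    NONE = "none"
    SOFT_LOCKUP = "soft lockup"
    HARD_LOCKUP = "hard lockup"


@dataclass
class CoreState:
    timer_ts: float   # last time this core's timer event ran (seconds)
    thread_ts: float  # last time this core's watchdog thread ran (seconds)


# Hypothetical thresholds in seconds, chosen only for illustration.
HARD_THRESHOLD = 10.0
SOFT_THRESHOLD = 20.0


def classify(core: CoreState, now: float) -> Fault:
    """Classify one core from the staleness of its two timestamps."""
    if now - core.timer_ts > HARD_THRESHOLD:
        # Timer events stopped: the core no longer services interrupts.
        return Fault.HARD_LOCKUP
    if now - core.thread_ts > SOFT_THRESHOLD:
        # Timer still fires, but the watchdog thread never gets scheduled.
        return Fault.SOFT_LOCKUP
    return Fault.NONE


def chain_target(core_id: int, online: list[int]) -> int:
    """Return the core that core_id checks: the next online core, wrapping."""
    idx = online.index(core_id)
    return online[(idx + 1) % len(online)]


def scan(cores: dict[int, CoreState], now: float) -> dict[int, Fault]:
    """Let every core inspect its chain successor and collect the verdicts."""
    online = sorted(cores)
    report: dict[int, Fault] = {}
    for cid in online:
        target = chain_target(cid, online)
        report[target] = classify(cores[target], now)
    return report
```

Calling `scan` with a dictionary of per-core timestamps returns one verdict per core. The simulation lets every core act as a checker even if it is itself stuck; a real design has to handle a failed checker, which is the situation the abstract's master-core migration is meant to cover.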

Key words

Sunway processor / non-maskable interrupt (NMI) / operating system / lockup / fault diagnosis / watchdog

Classification

Information technology and security science

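Cite this article

Gao Chen, He Sheng, Hang Xiaoqian. Lockup fault monitoring and diagnosis based on Sunway NMII [J]. Application Research of Computers, 2024, 41(4): 1015-1021, 7.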

Funding

Key Support Project of the Ministry of Science and Technology (GG20210701)

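Application Research of Computers

Open access; indexed in the Peking University Core Journals list (北大核心) and CSTPCD

ISSN 1001-3695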
