A log co-analysis based failure prediction method for large-scale cluster systems

Fu Xiaoyu (付晓毓), Ren Rui (任睿), Zhan Jianfeng (詹剑锋), Sun Ninghui (孙凝晖)

Chinese High Technology Letters (高技术通讯), 2016, Vol. 26, Issue 6, pp. 519-527 (9 pages). DOI: 10.3772/j.issn.1002-0470.2016.06.001

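Author affiliations

  • 1. State Key Laboratory of Computer Architecture, Beijing 100190
  • 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
  • 3. Graduate University of Chinese Academy of Sciences, Beijing 100049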

Abstract

This study addresses failure prediction for large-scale cluster supercomputers. Existing prediction methods analyze only a single log, the system log; they require complex data mining techniques, yet their prediction recall is generally low. To address this problem, the study presents an effective failure prediction method based on the co-analysis of system logs and job logs, which record information about the running workload. The method works in three steps: first, the two raw logs are preprocessed and filtered to produce a fine-grained two-dimensional event sequence and a job sequence; second, three failure symptoms are extracted from the job log records that precede failure events; finally, these symptoms are used to predict failures. Experiments on real logs from a BlueGene/P system show that the proposed method predicts failures with higher precision and higher recall.
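The abstract reports results as precision and recall but does not state how a warning is matched to a failure. As an illustration only, the following minimal sketch assumes a common convention: a warning counts as correct if a failure follows it within a fixed prediction window, and a failure counts as predicted if some warning preceded it within that window. The function name `evaluate_predictions`, the plain-timestamp representation, the `window` parameter, and the matching rule are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def evaluate_predictions(
    warnings: List[float], failures: List[float], window: float
) -> Tuple[float, float]:
    """Return (precision, recall) for a list of failure warnings.

    Hypothetical matching rule (an assumption, not the paper's definition):
    a warning issued at time w is correct if some failure occurs in
    (w, w + window]; a failure at time f counts as predicted if some
    warning was issued in [f - window, f).
    """
    correct_warnings = sum(
        1 for w in warnings if any(w < f <= w + window for f in failures)
    )
    predicted_failures = sum(
        1 for f in failures if any(w < f <= w + window for w in warnings)
    )
    # Precision: fraction of warnings that were followed by a real failure.
    precision = correct_warnings / len(warnings) if warnings else 0.0
    # Recall: fraction of real failures that were preceded by a warning.
    recall = predicted_failures / len(failures) if failures else 0.0
    return precision, recall


# Usage example with made-up timestamps (in minutes).
p, r = evaluate_predictions([10, 50, 90], [15, 18, 95, 200], window=10)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.75
```

The two counts use different denominators because one warning can cover several failures: here the warning at 10 precedes both failures at 15 and 18, while the warning at 50 is a false alarm and the failure at 200 is missed.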

Key words

large-scale cluster system/system log/job log/log analysis/failure prediction

Cite this article

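Fu Xiaoyu, Ren Rui, Zhan Jianfeng, Sun Ninghui. A log co-analysis based failure prediction method for large-scale cluster systems[J]. Chinese High Technology Letters (高技术通讯), 2016, 26(6): 519-527.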

Funding

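Supported by the National High Technology Research and Development Program of China (863 Program, 2015AA015308) and the National Basic Research Program of China (973 Program, 2014CB340402).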

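Journal: Chinese High Technology Letters (高技术通讯), ISSN 1002-0470. Open access; indexed in the Peking University Core Journals list and CSTPCD.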
