Exploration of the LightGBM model and model interpretability methods in predicting occupational injury severity

MO Youhua, ZHANG Peng, GU Yishuo, ZHU Xiaojun, FAN Jingguang

Journal of Environmental and Occupational Medicine, 2025, Vol. 42, Issue 2: 157-164 (8 pages). DOI: 10.11836/JEOM24317

Exploration of predicting occupational injury severity based on LightGBM model and model interpretability method

MO Youhua¹, ZHANG Peng², GU Yishuo², ZHU Xiaojun², FAN Jingguang²

Author information

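  • 1. National Center for Occupational Safety and Health, NHC, and NHC Key Laboratory for Engineering Control of Dust Hazard, Beijing 102308; School of Public Health, Guangdong Pharmaceutical University, Guangzhou, Guangdong 510240
  • 2. National Center for Occupational Safety and Health, NHC, and NHC Key Laboratory for Engineering Control of Dust Hazard, Beijing 102308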

Abstract

[Background] Light gradient boosting machine (LightGBM) has become a popular choice for prediction models because of its efficiency and speed. However, the "black box" nature of machine learning models limits their interpretability. Few studies to date have assessed the severity of occupational injuries using a LightGBM model together with model interpretability methods.

[Objective] To evaluate the application value of the LightGBM model and model interpretability methods in predicting occupational injury severity.

[Methods] The Mine Safety and Health Administration (MSHA) occupational injury dataset for mining industry workers from 1983 to 2022 was used. Injury severity (fatal occupational injury versus permanent total or partial disability) was the outcome variable. The predictor variables were month of occurrence, age, sex, time of accident, shift start time, accident time interval from shift start, total experience, total mining experience, experience at this mine, cause of injury, accident type, activity at the time of injury, source of injury, injured body part, work environment type, product category, and nature of injury. The feature set was screened using least absolute shrinkage and selection operator (Lasso) regression. A LightGBM model was then used to predict occupational injury severity, with the area under the receiver operating characteristic curve (AUC) as the primary evaluation metric; the closer the AUC is to 1, the better the model's predictive performance. Model interpretability was evaluated using Shapley additive explanations (SHAP).
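The abstract does not say which Lasso formulation was used or how the penalty was chosen. As a minimal sketch of how Lasso screens features, the code below assumes a linear (squared-error) Lasso on standardized predictors, fitted by cyclic coordinate descent, with synthetic data standing in for the MSHA records; features whose coefficients are shrunk exactly to zero are dropped. The penalty `lam=0.1` and every function name here are hypothetical. For a binary outcome such as fatal vs. non-fatal injury, an L1-penalized logistic regression with a cross-validated penalty would be the more usual choice.

```python
import random


def standardize(col):
    """Center a column to mean 0 and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]


def soft_threshold(z, lam):
    """Lasso proximal step: shrink z toward 0 by lam; exactly 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    columns: standardized feature columns; y: centered outcome (no intercept).
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # current residual y - Xb (b starts at 0)
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # Covariance of x_j with the partial residual that excludes feature j
            rho = sum(xj[i] * (resid[i] + xj[i] * beta[j]) for i in range(n)) / n
            scale = sum(v * v for v in xj) / n  # equals 1 for standardized columns
            new_b = soft_threshold(rho, lam) / scale
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta  # keep resid = y - Xb up to date
                beta[j] = new_b
    return beta


# Hypothetical synthetic data: only features 0 and 1 affect the outcome
random.seed(0)
n = 200
columns = [standardize([random.gauss(0, 1) for _ in range(n)]) for _ in range(3)]
y_raw = [2.0 * columns[0][i] - 1.5 * columns[1][i] + random.gauss(0, 0.1) for i in range(n)]
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]  # center the outcome

beta = lasso_coordinate_descent(columns, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", [round(b, 3) for b in beta])
print("selected feature indices:", selected)
```

The noise feature (index 2) ends with a coefficient of exactly zero and is screened out, while the two informative features survive with coefficients shrunk slightly toward zero. The retained indices would then form the reduced feature set passed to the downstream model.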
[Results] Lasso regression identified 7 key influencing factors: accident time interval from shift start, experience at this mine, cause of injury, accident type, injured body part, nature of injury, and work environment type. The LightGBM model built on the Lasso-selected features showed good predictive performance, with an AUC of 0.9941 (95%CI: 0.9917, 0.9966), accuracy of 0.9743, specificity of 0.9781, and sensitivity of 0.9640. The predicted probabilities of fatal occupational injury agreed closely with the observed probabilities. SHAP analysis of feature importance showed that injured body part and nature of injury were the two main features driving the model's predictions, while the other features had relatively small effects. SHAP values for injured body part were widely distributed and strongly influenced the predicted risk of a fatal outcome, particularly for head and neck injuries and injuries to multiple body parts. Nature of injury also influenced the model in different directions, with suffocation/drowning, crushing, and multiple injuries contributing most to the risk of fatal occupational injury.

[Conclusion] The LightGBM model can efficiently process large-scale data and deliver high-precision predictions. Model interpretability analysis helps identify and analyze the key risk factors for fatal occupational injuries among mining workers more accurately, and further reveals the complex interactions among these factors. This supports better-targeted preventive interventions and protective measures, as well as optimal allocation of resources for workers.
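The SHAP results above attribute each prediction to individual features using Shapley values. With only a few features these can be computed exactly by enumerating every coalition of features; the sketch below does so for a hypothetical toy risk score, not the authors' fitted LightGBM model, using the interventional definition in which "absent" features are filled in from background rows. The feature names, coefficients, background rows and the single case `x` are all invented for illustration. For tree ensembles such as LightGBM, the `shap` package computes these attributions far more efficiently (TreeSHAP) instead of enumerating coalitions.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names and toy model; NOT the authors' fitted LightGBM model
FEATURES = ["head_or_neck_injury", "crushing_injury", "years_at_this_mine"]


def toy_risk_score(x):
    """Toy stand-in for a model output on the log-odds scale, with one interaction term."""
    head, crush, years = x
    return -3.0 + 2.0 * head + 1.5 * crush + 1.0 * head * crush - 0.05 * years


def coalition_value(model, x, subset, background):
    """v(S): features in S take values from x, the rest from each background row; averaged."""
    total = 0.0
    for row in background:
        z = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)


def exact_shapley(model, x, background):
    """Exact Shapley values by enumerating every coalition (cost grows as 2^p)."""
    p = len(x)
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for coalition in combinations(others, size):
                s = set(coalition)
                gain = coalition_value(model, x, s | {j}, background) - coalition_value(model, x, s, background)
                phi[j] += weight * gain
    return phi


background = [[0, 0, 10], [1, 0, 5], [0, 1, 20], [0, 0, 2]]  # invented reference rows
x = [1, 1, 3]  # invented case: head/neck injury, crushing injury, 3 years at this mine
phi = exact_shapley(toy_risk_score, x, background)
base_value = coalition_value(toy_risk_score, x, set(), background)

for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.4f}")
# Efficiency property: base value + sum of SHAP values reproduces the model output for x
print(f"base {base_value:+.4f} + contributions = {base_value + sum(phi):+.4f}; model = {toy_risk_score(x):+.4f}")
```

For this toy case the attributions are +2.0, +1.625 and +0.3125; they sum to the gap between the model output (1.35) and the background average (-2.5875). The interaction term `head * crush` is shared between the first two features, and the experience feature contributes positively because 3 years is below the background mean of 9.25 years while the toy model lowers risk as experience grows.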

Key words

occupational injury/light gradient boosting machine/prediction model/model interpretability/Shapley additive explanations

Classification

Preventive medicine

Cite this article

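MO Youhua, ZHANG Peng, GU Yishuo, ZHU Xiaojun, FAN Jingguang. Exploration of predicting occupational injury severity based on LightGBM model and model interpretability method[J]. Journal of Environmental and Occupational Medicine, 2025, 42(2): 157-164.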

Journal of Environmental and Occupational Medicine

Open access | Peking University Core Journal

ISSN 2095-9982
