Journal of Cancer Control and Treatment (肿瘤预防与治疗), 2026, Vol. 39, Issue 5: 349-358 (10 pages). DOI: 10.3969/j.issn.1674-0904.2026.05.002
Comparison of the Efficacy of Multiple Machine Learning Models for Prediction of Chronic Liver Disease and Hepatocellular Carcinoma and Screening of the Optimal Model
Abstract
Objective: Current diagnostic methods for chronic liver disease and hepatocellular carcinoma, which rely on serological assays, imaging tests and liver biopsies, are characterized by low sensitivity, invasiveness and susceptibility to subjective factors. Thus, there is an urgent need to develop a non-invasive, accurate, and highly sensitive detection method for chronic liver disease and hepatocellular carcinoma. The purpose of this study is to explore the optimal machine learning-based model for non-invasive prediction of chronic liver disease and hepatocellular carcinoma.
Methods: A total of 24,215 patients with chronic liver disease, described by 50 feature variables, were enrolled from the First Affiliated Hospital of Xinjiang Medical University between 2013 and 2023. Four methods, namely autoencoder, matrix completion, within-cluster mean imputation and missForest, were adopted for missing data imputation. Subsequently, 36 key features screened by univariate statistical tests, random forest, recursive feature elimination and correlation analysis were taken as the input of three models: random forest, logistic regression and XGBoost. Model performance was evaluated by accuracy, recall, ROC curves and AUC values. Cross-validation was used to assess the generalization ability of the models. Finally, SHapley Additive exPlanations (SHAP) was applied to interpret and analyze the important features.
Results: The combination of the matrix completion imputation method and the XGBoost model achieved the optimal performance, with an accuracy of 0.769. Specifically, the AUC values for fatty liver, hepatitis, liver cirrhosis, and liver cancer were 0.93, 0.87, 0.94, and 0.92, respectively. Furthermore, SHAP analysis revealed that platelet count, serum cholinesterase, and quantitative hepatitis B surface antigen were among the key features for predicting chronic liver disease.
Conclusion: The XGBoost machine learning model based on the matrix completion imputation method is the optimal model for predicting chronic liver disease. With its interpretability enhanced by SHAP, the model can identify critical features related to chronic liver disease, providing a reference for early diagnosis.
Keywords
Chronic liver disease / Hepatocellular carcinoma / XGBoost / Model prediction / SHAP analysis
Classification
Medicine and Health
Cite this article
张韬,买热比娅·马合木提,高颖..多机器学习模型在慢性肝病和肝癌预测中的效能比较及最优模型筛选[J].肿瘤预防与治疗,2026,39(5):349-358,10.
Funding
This study was supported by a grant from the Science and Technology Committee of Xinjiang Uygur Autonomous Region, under the Regional Collaborative Innovation Special Project, Science and Technology Aid to Xinjiang Program (No. 2022E02115).
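Note on the per-class AUC values: the abstract reports a separate AUC for each of fatty liver, hepatitis, liver cirrhosis and liver cancer, but does not state how these were derived from a four-class model. A common convention, assumed here, is one-vs-rest: each class in turn is treated as the positive class and all others as negative. The sketch below illustrates that convention in plain Python. The function names and the toy probabilities are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    y_true: Sequence[str],
    proba: Sequence[Sequence[float]],
    classes: List[str],
) -> Dict[str, float]:
    """Per-class AUC: class k is positive, every other class is negative."""
    result = {}
    for k, name in enumerate(classes):
        labels = [1 if y == name else 0 for y in y_true]
        scores = [row[k] for row in proba]  # predicted probability of class k
        result[name] = binary_auc(labels, scores)
    return result


# Hypothetical toy data: six patients, predicted probabilities per class.
CLASSES = ["fatty liver", "hepatitis", "liver cirrhosis", "liver cancer"]
y_true = ["fatty liver", "hepatitis", "liver cirrhosis",
          "liver cancer", "hepatitis", "fatty liver"]
proba = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.10, 0.10, 0.30, 0.50],
    [0.40, 0.30, 0.20, 0.10],
    [0.30, 0.40, 0.20, 0.10],
]

aucs = one_vs_rest_auc(y_true, proba, CLASSES)
for name in CLASSES:
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

The pairwise loop is quadratic in the number of samples, so for a cohort of the size reported here a rank-based or library implementation would be the practical choice; the loop is used only because it makes the definition explicit.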