
Establishing and evaluating a risk prediction model for colonoscopy bowel preparation failure based on automated machine learning
OA; CSTPCD


Objective: Machine learning (ML) is widely used in medical modeling and has strong learning and generalization capabilities. This study used automated ML (AutoML), combined with patients' general information and clinical conditions, to assess the risk of bowel preparation failure early, before colonoscopy.

Methods: Clinical data of patients who underwent colonoscopy at Hospital 1 and Hospital 2 from January 2022 to January 2023 were analyzed retrospectively. A Boston bowel preparation scale (BBPS) score of ≤5 was defined as bowel preparation failure, and a score of >5 as adequate. The patients were randomly divided into a training set (n=303) and a validation set (n=76) at an 8:2 ratio. A least absolute shrinkage and selection operator (LASSO) logistic regression (LR) model was used for feature selection, a nomogram scoring system was constructed, and models were built with AutoML based on five algorithms. Model performance was evaluated with receiver operating characteristic (ROC) curves, calibration curves, LR-based decision curve analysis (DCA), SHAP plots, and force plots.

Results: Of the 379 patients, 105 (27.7%) had bowel preparation failure (BBPS ≤5). LASSO with 5-fold cross-validation reduced the 21 study variables to 10, from which a nomogram scoring system was constructed; calibration curves indicated that the LASSO model was reliable. Using the H2O platform and five algorithms [gradient boosting machine (GBM), deep learning (DL), generalized linear model (GLM), Stacked Ensemble, and distributed random forest (DRF)], 67 models were developed. Stacked Ensemble performed best, with an area under the curve (AUC) of 0.871, a LogLoss of 0.403, and a root mean square error (RMSE) of 0.354, outperforming the other models and the traditional LR model. Variable importance plots showed that the interval between finishing the laxative and the examination, constipation, whether the laxative was taken in full, age, and accompaniment by a family member had an important influence on the prediction of bowel preparation failure. Finally, SHAP plots and force plots showed how the variables were distributed across the binary classification outcomes and how each variable affected the prediction.

Conclusion: The AutoML model based on the Stacked Ensemble algorithm has clear clinical utility for early prediction of bowel preparation failure risk. The study also constructed a nomogram scoring tool for clinical use.
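
To make the abstract's numbers concrete, the following is a minimal sketch of the labelling rule (BBPS ≤5 counts as failure), an 8:2 random split, and the three metrics reported for the best model (AUC, LogLoss, RMSE). It is not the authors' pipeline, which used the H2O platform: all function names here are hypothetical, the probabilities are invented toy values rather than study data, and RMSE is assumed to be computed between the predicted probability and the 0/1 label.

```python
import math
import random

BBPS_FAILURE_MAX = 5  # threshold stated in the abstract: BBPS <= 5 is failure


def label_failure(bbps_total):
    """Return 1 (preparation failed) if the total BBPS is <= 5, else 0."""
    return 1 if bbps_total <= BBPS_FAILURE_MAX else 0


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle 0..n-1 and cut at round(n * train_fraction). The seed is an assumption."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the 0/1 labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def rmse(y_true, y_prob):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))


if __name__ == "__main__":
    train, val = split_indices(379)
    print(len(train), len(val))  # 303 76, matching the reported set sizes

    # Toy example only: four invented patients, not data from the study.
    y_true = [label_failure(s) for s in (4, 8, 5, 7)]  # -> [1, 0, 1, 0]
    y_prob = [0.9, 0.2, 0.6, 0.7]
    print(auc(y_true, y_prob), log_loss(y_true, y_prob), rmse(y_true, y_prob))
```

With 379 patients, rounding 379 × 0.8 gives 303 for training and leaves 76 for validation, which is consistent with the set sizes reported above; 105 failures out of 379 likewise gives the stated 27.7%.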

Wang Ganhong; Chen Jian; Shen Zhijia; Xi Meijuan; Zhou Yanting

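Department of Gastroenterology, Changshu Hospital of Traditional Chinese Medicine (New District Hospital), Changshu, Jiangsu 215500
Department of Gastroenterology, Changshu Hospital Affiliated to Soochow University (Changshu No.1 People's Hospital), Changshu, Jiangsu 215500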

Clinical medicine

Boston bowel preparation scale (BBPS); colonoscopy; automated machine learning (AutoML); predictive model; nomogram

China Journal of Endoscopy (《中国内镜杂志》), 2024 (005)

Pages 36-47 (12 pages)

Science and Technology Program of the Changshu Municipal Health Commission (No: CSWS202316)

DOI: 10.12235/E20230422
