现代医药卫生, 2025, Vol. 41, Issue (10): 2353-2357, 2361, 6. DOI: 10.3969/j.issn.1009-5519.2025.10.018
基于机器学习构建妊娠地中海贫血智能诊断的算法模型
Machine learning-based algorithmic model for intelligent diagnosis of gestational thalassemia
Abstract
Objective To construct a machine learning-based intelligent diagnostic model for gestational thalassemia and to optimize screening strategies.
Methods In this retrospective cohort study, data were collected from 4,715 pregnant women at People's Hospital of Chongqing Liangjiang New Area from January 2018 to December 2020, including 338 α-thalassemia cases (7.17%), 286 β-thalassemia cases (6.07%), and 4,091 normal controls (86.76%). The data comprised complete blood count, blood type, and genetic test results. Key features were selected using LASSO regression, and the dataset was then split by stratified random sampling into a training set (n=3,772) and a test set (n=943) at an 8:2 ratio. To address class imbalance, SMOTE oversampling was combined with a cost-sensitive learning strategy, and thalassemia diagnosis models were constructed using six key indicators. The diagnostic performance of seven classic machine learning methods, namely Extreme Gradient Boosting (XGBoost), Decision Tree (DT), K-Nearest Neighbor (KNN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM), was systematically compared. The area under the receiver operating characteristic (ROC) curve (AUC), F1-score, sensitivity, and specificity were used for model evaluation.
Results Feature selection identified red blood cell count (β=-0.21), hemoglobin (β=0.28), hematocrit (β=-0.62), platelet count (β=-0.48), mean platelet volume (β=0.36), and plateletcrit (β=0.12) as key predictors. The AUCs of all seven algorithms were greater than 0.88. Judged jointly on sensitivity, specificity, positive predictive value, negative predictive value, and Youden index, the XGBoost model performed best on every indicator, followed by the RF model; the Youden indexes of the other five models were all below 0.7. The XGBoost model achieved an AUC of 0.980 (95% confidence interval 0.967-0.993), an F1-score of 0.938, a sensitivity of 89.3%, and a specificity of 94.0%, significantly better than traditional screening indicators (McNemar test, P<0.05).
Conclusion The XGBoost diagnostic model based on six complete blood count parameters has good clinical applicability; combining synthetic oversampling with cost-sensitive learning effectively addresses the data imbalance. The model provides a high-precision, low-cost solution for prenatal screening of thalassemia.
Keywords
机器学习/妊娠/地中海贫血/诊断/算法模型
Key words
Machine learning/Pregnancy/Thalassemia/Diagnosis/Algorithmic model
Classification
Medicine and health
Cite this article
张琴, 肖爽, 赵庆华. 基于机器学习构建妊娠地中海贫血智能诊断的算法模型[J]. 现代医药卫生, 2025, 41(10): 2353-2357, 2361, 6.
Funding
Chongqing Science and Health Joint Medical Research Project (重庆市科卫联合医学科研项目; 2022MSXM145, 2023MSXM075).
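
The abstract ranks the models on sensitivity, specificity, positive and negative predictive value, F1-score, and Youden index. As a minimal sketch of how these quantities relate to one another, the Python below derives them from a binary confusion matrix. The names `ConfusionCounts`, `screening_metrics`, and `youden_index` are hypothetical, the binary framing (thalassemia versus normal) is an assumption, and the counts in the usage note are illustrative only; the abstract does not report a confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion matrix (assumed framing: thalassemia vs. normal)."""
    tp: int  # carriers correctly flagged
    fn: int  # carriers missed
    fp: int  # normal pregnancies wrongly flagged
    tn: int  # normal pregnancies correctly cleared


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def screening_metrics(c: ConfusionCounts) -> dict:
    """Derive the screening metrics named in the abstract from raw counts."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)  # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)  # negative predictive value
    # F1 is the harmonic mean of precision (PPV) and recall (sensitivity).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "youden": youden_index(sensitivity, specificity),
    }


if __name__ == "__main__":
    # Illustrative counts only; not taken from the study.
    demo = ConfusionCounts(tp=90, fn=10, fp=50, tn=850)
    for name, value in screening_metrics(demo).items():
        print(f"{name}: {value:.3f}")
```

Applied to the sensitivity (89.3%) and specificity (94.0%) reported for XGBoost, `youden_index(0.893, 0.940)` gives roughly 0.833, which is consistent with the statement that only the weaker models fell below 0.7.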