Journal of Tongji University (Medical Science), 2025, Vol. 46, Issue 6: 848-856, 9. DOI: 10.12289/j.issn.2097-4345.25390
Application and performance of machine learning models integrating lymphocyte subsets and clinical features in discriminating NTM-PD, pulmonary tuberculosis, and other lung diseases
Abstract
Objective: To construct diagnostic models from lymphocyte subset counts using several machine learning methods, in order to distinguish non-tuberculous mycobacterial pulmonary disease (NTM-PD), pulmonary tuberculosis (PTB), and other commonly confused pulmonary diseases, and to provide a scientific basis for the early identification of infectious pulmonary diseases.

Methods: Patients diagnosed with active tuberculosis (ATB), NTM-PD, or other pulmonary diseases (including inflammatory and neoplastic conditions) who were admitted to the Department of Tuberculosis at Shanghai Pulmonary Hospital from January to December 2023 were included. Lymphocyte subset counts were measured by flow cytometry. Four machine learning algorithms (multinomial logistic regression, naive Bayes, random forest, and XGBoost) were used for model development. Hyperparameters were tuned by Bayesian optimization with cross-validation. Variables with P < 0.1 in univariate analysis were selected and then refined by correlation analysis and LASSO to form the final model input. The models were evaluated on the test set using the area under the receiver operating characteristic curve (AU-ROC), the average precision of the precision-recall curve (AP-PR), and decision curve analysis (DCA).

Results: A total of 1,383 patients were included: 836 in the ATB group, 254 in the NTM group, and 293 in the OTHER group. Using selected demographic data, comorbidities, and lymphocyte subset indices as input variables and disease category as the outcome variable, four machine learning models were constructed. Among them, the random forest model showed the best predictive performance. The top contributing variables in the models were body mass index (BMI), CD3+ T cells, CD16+56+ NK cells, CD8+ T cells (cytotoxic T cells), age, %CD3+ T cells, CD19+ B cells, CD4+ T cells (helper T cells), gender, anemia, diabetes, leukopenia, hypoproteinemia, and autoimmune disease. Of these, BMI, CD3+ T cells, CD16+56+ NK cells, and CD8+ T cells (cytotoxic T cells) contributed the most.

Conclusion: The machine learning models developed in this study differentiated ATB, NTM-PD, and other pulmonary diseases by integrating lymphocyte subset profiles with clinical features. These models offer a new approach to the early diagnosis and personalized management of pulmonary diseases.

Keywords
active pulmonary tuberculosis / lymphocyte subsets / machine learning
Classification
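Medicine and Health
Cite this article
Wang Lei, Cao Jie, Liu Zhibin, Cheng Liping, Wu Xiaocui, Sun Qin, Sha Wei. Application and performance of machine learning models integrating lymphocyte subsets and clinical features in discriminating NTM-PD, pulmonary tuberculosis, and other lung diseases [J]. Journal of Tongji University (Medical Science), 2025, 46(6): 848-856, 9.
Funding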
Youth Project of the Shanghai Municipal Health Commission (20204Y0325)
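
Illustrative note on the evaluation measures: the abstract names AU-ROC and decision curve analysis but does not say how they were computed for the three-class outcome. The following is a minimal sketch, not the authors' code, assuming a macro-averaged one-vs-rest AU-ROC and the usual net-benefit formula from decision curve analysis. The function names and the six toy patients are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def binary_auroc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(probs: List[Dict[str, float]], labels: List[str]) -> Dict[str, float]:
    """One-vs-rest AUROC per class, plus their unweighted (macro) mean.
    Macro averaging is an assumption; the abstract does not specify it."""
    classes = sorted(set(labels))
    result = {
        c: binary_auroc([p[c] for p in probs], [y == c for y in labels])
        for c in classes
    }
    result["macro"] = sum(result[c] for c in classes) / len(classes)
    return result


def net_benefit(scores: Sequence[float], is_positive: Sequence[bool], threshold: float) -> float:
    """Net benefit at one threshold probability pt, as used in decision
    curve analysis: TP/n - FP/n * pt / (1 - pt)."""
    n = len(scores)
    tp = sum(1 for s, y in zip(scores, is_positive) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, is_positive) if s >= threshold and not y)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical predicted probabilities for six made-up patients.
toy_labels = ["ATB", "ATB", "NTM", "NTM", "OTHER", "OTHER"]
toy_probs = [
    {"ATB": 0.7, "NTM": 0.2, "OTHER": 0.1},
    {"ATB": 0.5, "NTM": 0.3, "OTHER": 0.2},
    {"ATB": 0.3, "NTM": 0.6, "OTHER": 0.1},
    {"ATB": 0.6, "NTM": 0.3, "OTHER": 0.1},
    {"ATB": 0.2, "NTM": 0.2, "OTHER": 0.6},
    {"ATB": 0.3, "NTM": 0.3, "OTHER": 0.4},
]

if __name__ == "__main__":
    print(macro_ovr_auroc(toy_probs, toy_labels))
    atb_scores = [p["ATB"] for p in toy_probs]
    atb_truth = [y == "ATB" for y in toy_labels]
    print(net_benefit(atb_scores, atb_truth, threshold=0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the "treat all" and "treat none" strategies. In practice a library routine such as scikit-learn's `roc_auc_score` would normally replace the quadratic pairwise loop used here.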