实用医学杂志 (The Journal of Practical Medicine), 2026, Vol. 42, Issue 7: 1158-1164, 7. DOI: 10.3969/j.issn.1006-5725.2026.07.006
Constructing a differential diagnosis model for lung cancer and pulmonary tuberculosis using machine learning
Abstract
Objective To develop a machine learning-based predictive model for differentiating lung cancer from pulmonary tuberculosis.

Methods A retrospective analysis was conducted on the clinical data of 585 patients who visited Guangxi Chest Hospital from July 2020 to May 2023. The patients were 14 to 90 years old; 457 were male and 128 were female. Based on the final clinical diagnosis, the 585 cases were divided into a lung cancer group and a pulmonary tuberculosis group, and tumor marker test results were compared between the two groups. Lasso and univariate logistic regression analyses were used to screen feature variables for differentiating lung cancer from pulmonary tuberculosis. A random forest model was constructed and the predictive variables were ranked by importance. A Lasso-logistic regression model was also constructed. The predictive performance of the two models was compared by ROC curve analysis.

Results Serum levels of the tumor markers CA125, CEA, CYFRA21-1, NSE, and SCCA were significantly higher in the lung cancer group than in the pulmonary tuberculosis group (P<0.05). Lasso and univariate logistic regression analyses identified the following feature variables for differentiating lung cancer from pulmonary tuberculosis: sex, age, CA125, CEA, CYFRA21-1, NSE, and SCCA. The random forest model ranked these variables by importance, from highest to lowest, as CYFRA21-1, CEA, SCCA, NSE, CA125, age, and sex. Lasso-logistic regression analysis indicated that CYFRA21-1 level, CEA level, NSE level, and age were independent factors for differentiating lung cancer from pulmonary tuberculosis (P<0.05). For the random forest model, the AUC, sensitivity, specificity, accuracy, and Youden index were 0.938, 90.38%, 87.50%, 0.888, and 0.779, respectively; for the Lasso-logistic regression model, they were 0.958, 86.54%, 92.19%, 0.879, and 0.787, respectively.

Conclusions The tumor markers CA125, CEA, CYFRA21-1, NSE, and SCCA have significant clinical value in the differential diagnosis of lung cancer and pulmonary tuberculosis. The random forest model and the Lasso-logistic regression model developed in this study can effectively discriminate between lung cancer and pulmonary tuberculosis. The Lasso-logistic regression model identified CYFRA21-1 level, CEA level, NSE level, and age as independent factors for differentiating lung cancer from pulmonary tuberculosis.
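The Youden index values reported in the Results can be checked against the reported sensitivity and specificity, since the Youden index is defined as J = sensitivity + specificity - 1. The short Python sketch below performs only that arithmetic check. The function name `youden_index` and the dictionary `reported` are illustrative names, not part of the study, and the inputs are the percentages from the abstract converted to fractions.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1 (inputs as fractions in [0, 1])."""
    return sensitivity + specificity - 1.0


# Sensitivity and specificity as reported in the abstract, converted from percent.
# The dictionary name and layout are illustrative assumptions.
reported = {
    "random forest": (0.9038, 0.8750),
    "Lasso-logistic regression": (0.8654, 0.9219),
}

# Round to three decimals to match the precision used in the abstract.
youden = {
    name: round(youden_index(se, sp), 3)
    for name, (se, sp) in reported.items()
}

for name, j in youden.items():
    print(f"{name}: J = {j:.3f}")
```

Rounded to three decimals, both values agree with the Youden indices given in the abstract (0.779 and 0.787).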
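Key words
machine learning/random forest/differential diagnosis/lung cancer/pulmonary tuberculosis
Classification
Medicine and health
Cite this article
周游, 陈纪飞, 何希, 刘爱梅, 杨小兵, 黄一芳. Constructing a differential diagnosis model for lung cancer and pulmonary tuberculosis using machine learning [J]. 实用医学杂志, 2026, 42(7): 1158-1164, 7.
Funding
Guangxi Science and Technology Major Project (No. 桂科AA22096027)
Guangxi Medical and Health Appropriate Technology Development, Promotion and Application Project (No. S2023050)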