
Constructing a differential diagnosis model for lung cancer and pulmonary tuberculosis using machine learning
(Original title: 运用机器学习构建肺癌与肺结核鉴别诊断模型)

周游 1, 陈纪飞 2, 何希 2, 刘爱梅 2, 杨小兵 2, 黄一芳 3

The Journal of Practical Medicine (实用医学杂志), 2026, Vol. 42, Issue 7: 1158-1164 (7 pages). DOI: 10.3969/j.issn.1006-5725.2026.07.006

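Author information

  • 1. Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University; Guangxi University Key Laboratory of Clinical Laboratory Diagnostics (Nanning, Guangxi 530021); Biobank, Department of Science and Education, Chest Hospital of Guangxi Zhuang Autonomous Region (Liuzhou, Guangxi 545005)
  • 2. Biobank, Department of Science and Education, Chest Hospital of Guangxi Zhuang Autonomous Region (Liuzhou, Guangxi 545005)
  • 3. Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University; Guangxi University Key Laboratory of Clinical Laboratory Diagnostics (Nanning, Guangxi 530021)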

Abstract

Objective To develop a machine learning-based predictive model for differentiating lung cancer from pulmonary tuberculosis.

Methods A retrospective analysis was conducted on the clinical data of 585 patients who visited Guangxi Chest Hospital from July 2020 to May 2023. The patients were aged 14 to 90 years and comprised 457 males and 128 females. Based on the final clinical diagnosis, the 585 cases were divided into a lung cancer group and a pulmonary tuberculosis group, and tumor marker test results were compared between the two groups. Lasso and univariate logistic regression analyses were used to screen feature variables for differentiating lung cancer from pulmonary tuberculosis. A random forest model was constructed and its predictive variables were ranked by importance. A Lasso-logistic regression model was also constructed. The predictive performance of the two models was compared by ROC curve analysis.

Results Serum levels of the tumor markers CA125, CEA, CYFRA21-1, NSE, and SCCA were significantly higher in the lung cancer group than in the pulmonary tuberculosis group (P<0.05). Lasso and univariate logistic regression analyses identified the following feature variables for differentiating lung cancer from pulmonary tuberculosis: sex, age, CA125, CEA, CYFRA21-1, NSE, and SCCA. The random forest model ranked these variables by importance, in descending order, as CYFRA21-1, CEA, SCCA, NSE, CA125, age, and sex. Lasso-logistic regression analysis indicated that CYFRA21-1 level, CEA level, NSE level, and age were independent predictors for differentiating lung cancer from pulmonary tuberculosis (P<0.05). The AUC, sensitivity, specificity, accuracy, and Youden index for the differential diagnosis of lung cancer and pulmonary tuberculosis were 0.938, 90.38%, 87.50%, 0.888, and 0.779 for the random forest model, and 0.958, 86.54%, 92.19%, 0.879, and 0.787 for the Lasso-logistic regression model.

Conclusions The tumor markers CA125, CEA, CYFRA21-1, NSE, and SCCA have significant clinical value in the differential diagnosis of lung cancer and pulmonary tuberculosis. Both the random forest model and the Lasso-logistic regression model developed in this study can effectively discriminate between lung cancer and pulmonary tuberculosis. The Lasso-logistic regression model identified CYFRA21-1 level, CEA level, NSE level, and age as independent predictors for differentiating lung cancer from pulmonary tuberculosis.
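As a reading aid, the Youden index reported for each model can be reproduced from its sensitivity and specificity using the standard definition J = sensitivity + specificity - 1. The sketch below is illustrative only and is not the authors' code; the function name `youden_index` and the dictionary layout are assumptions, while the numbers are the ones stated in the abstract.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Return the Youden index J = sensitivity + specificity - 1.

    Both inputs are fractions in [0, 1], not percentages.
    """
    return sensitivity + specificity - 1.0


# Values as reported in the abstract (percentages converted to fractions).
reported = {
    "random_forest": {"sensitivity": 0.9038, "specificity": 0.8750, "youden": 0.779},
    "lasso_logistic": {"sensitivity": 0.8654, "specificity": 0.9219, "youden": 0.787},
}

for name, metrics in reported.items():
    j = youden_index(metrics["sensitivity"], metrics["specificity"])
    print(f"{name}: J = {j:.3f} (reported {metrics['youden']:.3f})")
```

Rounded to three decimals, both recomputed values agree with the reported Youden indices, so the sensitivity, specificity, and Youden figures in the abstract are internally consistent.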

Key words

machine learning / random forest / differential diagnosis / lung cancer / pulmonary tuberculosis

Classification

Medicine and health

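Cite this article

周游, 陈纪飞, 何希, 刘爱梅, 杨小兵, 黄一芳. Constructing a differential diagnosis model for lung cancer and pulmonary tuberculosis using machine learning [J]. The Journal of Practical Medicine (实用医学杂志), 2026, 42(7): 1158-1164.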

Funding

Guangxi Science and Technology Major Project (No. 桂科AA22096027)

Guangxi Medical and Health Appropriate Technology Development and Promotion Application Project (No. S2023050)

The Journal of Practical Medicine (实用医学杂志)

ISSN 1006-5725
