Chinese Journal of Lung Cancer, 2025, Vol. 28, Issue 10: 738-750, 13. DOI: 10.3779/j.issn.1009-3419.2025.102.36
Application of Explainable Deep Learning in Differentiating Benign from Malignant Pulmonary Space-occupying Lesions and Classifying Pathological Subtypes of Lung Cancer
Abstract
Background and objective: The discrimination between benign and malignant pulmonary space-occupying lesions and the classification of pathological subtypes of lung cancer are critical for clinical decision-making. However, conventional methods often suffer from insufficient utilization of multi-source clinical data and poor interpretability of deep learning models. This study investigates the performance of interpretable deep learning algorithms in diagnosing benign versus malignant pulmonary space-occupying lesions and classifying pathological subtypes of lung cancer, using a hybrid architecture, referred to as TT-ResMLP, that combines Tab-Transformer (designed for tabular data) with a Residual Multi-Layer Perceptron (ResMLP).

Methods: Data including radiological characteristics, medical history, and laboratory findings were collected from 345 patients with pathologically confirmed pulmonary space-occupying lesions. The dataset was randomly split into a development set and a test set at an 8:2 ratio. Stable features were selected using the Spearman correlation test and the Least Absolute Shrinkage and Selection Operator (LASSO). The Synthetic Minority Over-sampling Technique (SMOTE) was employed to balance the samples, and 10-fold cross-validation was used to enhance model generalizability. Models were constructed using the Tab-Transformer algorithm, the ResMLP algorithm, and the TT-ResMLP hybrid. Model performance was evaluated using receiver operating characteristic (ROC) curves, the area under the curve (AUC), accuracy, specificity, sensitivity, and micro-averaged ROC (micro-ROC). SHapley Additive exPlanations (SHAP) analysis was performed based on the optimal model.

Results: In the benign versus malignant diagnosis task, all three models performed well. The Tab-Transformer model demonstrated the best performance on the test set, followed by TT-ResMLP and ResMLP. SHAP analysis of the top-performing Tab-Transformer model ranked feature importance as follows: age, pleural indentation, thrombin time, mean density, and ground-glass opacity. Pleural indentation contributed substantially to malignant diagnosis, and its contribution was further enhanced with increasing age and decreasing thrombin time. In the lung cancer subtype classification task, all three models exhibited excellent performance, with the TT-ResMLP hybrid showing the best overall performance. SHAP analysis further revealed that the Lung Imaging Reporting and Data System (Lung-RADS) category held high importance across all three pathological subtypes. Male gender was positively associated with the prediction of squamous cell carcinoma. Neuron-specific enolase (NSE) played a significant role in predicting small cell carcinoma. For adenocarcinoma, the diagnostic probability was positively correlated with the Lung-RADS category, a relationship more pronounced at lower prothrombin time (PT) values. In contrast, a negative correlation was observed in the squamous cell carcinoma and small cell carcinoma subgroups, although gender and NSE levels could strengthen the contribution to risk prediction in these subgroups. Analysis of feature decision boundaries indicated that the Lung-RADS grade possessed high discriminative power for identifying adenocarcinoma, whereas NSE demonstrated stronger discriminative ability for identifying small cell carcinoma.

Conclusion: The TT-ResMLP hybrid architecture is effective for diagnosing the benign or malignant nature of pulmonary space-occupying lesions and classifying pathological subtypes of lung cancer. The model possesses good interpretability, helping to identify key predictive features and to unravel their interaction mechanisms, thereby providing an effective tool for a deeper understanding of lung cancer biology and for clinical decision support.

Key words
Lung neoplasms / Pulmonary space-occupying lesion / Machine learning / Feature interpretation / Benign-malignant diagnosis / Deep learning

Cite this article
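Haoran LI, Ya LI, Yuanyuan WANG, Yang WANG, Huihui HE, Junya LI, Yanning SU, Fanrui KONG, Xiangli LIU, Liuhui CHENG. Application of Explainable Deep Learning in Differentiating Benign from Malignant Pulmonary Space-occupying Lesions and Classifying Pathological Subtypes of Lung Cancer [J]. Chinese Journal of Lung Cancer, 2025, 28(10): 738-750, 13.

Funding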
This study was supported by grants from the Special Research Project of the National Clinical Research Base for Traditional Chinese Medicine, Henan Health Commission (No.2022JDZX062, to Yuanyuan WANG), the Special Research Projects of Traditional Chinese Medicine in Henan Province (No.2024ZY1004, to Ya LI; No.2025ZY1007, to Xiangli LIU), and the Special Research Project of the National Traditional Chinese Medicine Inheritance and Innovation Center, Henan Health Commission (No.2023ZXZX1038, to Yang WANG).
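
Illustrative note on the evaluation metrics. The Methods name accuracy, sensitivity, specificity, AUC, and micro-averaged ROC as the performance measures. The sketch below shows one common way these quantities are defined, in plain Python. It is not the authors' code: the function names (`binary_metrics`, `auc`, `micro_auc`) and the toy inputs are hypothetical, and the micro-average is assumed to be the usual one-vs-rest pooling over all (sample, class) pairs, which the abstract does not spell out.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for labels coded 1 (malignant) / 0 (benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_auc(labels: Sequence[int], prob_rows: Sequence[Sequence[float]], n_classes: int) -> float:
    """Micro-averaged AUC: pool every (sample, class) pair as one binary problem."""
    flat_true, flat_score = [], []
    for label, probs in zip(labels, prob_rows):
        for k in range(n_classes):
            flat_true.append(1 if label == k else 0)
            flat_score.append(probs[k])
    return auc(flat_true, flat_score)


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the study.
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

For the three-subtype task, `micro_auc` would take integer subtype labels (for example 0, 1, 2 as an assumed coding) and one row of predicted class probabilities per patient. The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of a few hundred patients but would be replaced by a rank-based computation for larger datasets.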