肿瘤预防与治疗 (Journal of Cancer Control and Treatment), 2025, Vol. 38, Issue 4: 312-321 (10 pages). DOI: 10.3969/j.issn.1674-0904.2025.04.009
Prognostic Model for Ovarian Clear Cell Carcinoma Based on Machine Learning Algorithms Using the SEER Database
Abstract
Objective: This study aims to develop a prognostic model for ovarian clear cell carcinoma (OCCC) using machine learning algorithms based on clinical and pathological data from the SEER database, and to evaluate the model's predictive performance in order to provide evidence for clinical treatment and prognosis assessment in OCCC patients.

Methods: Clinical and pathological data from 5,452 OCCC patients (2000-2019) in the SEER database were analyzed to develop a prognostic model using multiple machine learning algorithms. The inclusion criteria were histologically confirmed OCCC, complete clinicopathological data, and diagnosis between January 2000 and December 2019. Patients with missing baseline or follow-up data were excluded. The primary endpoint was death. After data cleaning, 1,091 eligible cases were included in the final analysis. Nine clinical variables were selected as input parameters, with patient mortality status serving as the output parameter. The Kaplan-Meier method was used for univariate survival analysis and Cox proportional hazards regression for multivariate analysis. Prognostic models were constructed using logistic regression, decision trees, support vector machines, random forests, and artificial neural networks. Predictive performance was evaluated using four metrics: sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC). The dataset was randomly split into training and testing sets (8:2 ratio). To address class imbalance, the data were then balanced using the Synthetic Minority Oversampling Technique (SMOTE) for oversampling combined with random undersampling. The significant prognostic factors identified by multivariate Cox analysis were also used to build simplified models for comparative evaluation.

Results: In the univariate Kaplan-Meier analysis, six factors were significantly associated with patient survival: race (P=0.004), tumor laterality (P<0.001), tumor size (T stage) (P<0.001), lymph node metastasis (N stage) (P<0.001), distant metastasis (M stage) (P<0.001), and degree of differentiation (P=0.030). No multicollinearity was observed among the six factors, as all variance inflation factors were below the threshold of 5. Multivariate Cox proportional hazards regression analysis showed that Black patients had a significantly higher risk of death than White patients (HR=2.409, P<0.001); left-sided (HR=0.607, P<0.001) and right-sided tumors (HR=0.564, P=0.002) were associated with reduced risk relative to bilateral tumors; T2 stage (HR=3.060, P<0.001) and T3 stage (HR=4.721, P<0.001) were associated with increased risk compared with T1; N1 stage conferred elevated risk versus N0 (HR=1.636, P<0.001); and M1 stage was linked to higher risk compared with M0 (HR=2.040, P<0.001). Among the five machine learning models evaluated, the random forest model achieved the highest AUC on both the training set (AUC=0.868) and the test set (AUC=0.762), suggesting better predictive capability than the other algorithms and potential clinical utility for predicting prognosis in OCCC patients. T stage was the most influential prognostic factor in OCCC, with the highest feature importance score in all five models.

Conclusion: The random forest model demonstrates robust prognostic predictive performance for OCCC patients, with T stage emerging as the most influential prognostic factor.
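The four evaluation metrics named in the Methods (sensitivity, specificity, accuracy, and AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names `confusion_counts`, `roc_auc`, and `evaluate`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the SEER data. AUC is computed here by pairwise comparison of positive and negative cases, which is one standard way to obtain it.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = died, 0 = alive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; tied pairs count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Compute the four metrics; `threshold` (assumed 0.5) turns scores into labels."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "sensitivity": tp / (tp + fn),  # share of deaths correctly flagged
        "specificity": tn / (tn + fp),  # share of survivors correctly cleared
        "accuracy": (tp + tn) / len(y_true),
        "auc": roc_auc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy data only: 8 hypothetical patients with model-predicted death probabilities.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    print(evaluate(y_true, scores))
```

On this toy input, sensitivity, specificity, and accuracy are each 0.75 and the AUC is 0.9375. Unlike the other three metrics, AUC does not depend on the chosen threshold, which is one reason it is commonly used to compare models, as in the Results above.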
Key words
Ovarian clear cell carcinoma / Machine learning algorithms / Prognostic model / SEER database

Classification
Clinical medicine

Cite this article
Zhang Manlin, Ma Wenxin, Qiu Lin, Liu Guangcong, Yang Zhuo. Prognostic Model for Ovarian Clear Cell Carcinoma Based on Machine Learning Algorithms Using the SEER Database [J]. 肿瘤预防与治疗 (Journal of Cancer Control and Treatment), 2025, 38(4): 312-321.

Funding
This study was supported by the National Natural Science Foundation of China, Young Scientists Fund (No. 82103056); the Department of Science and Technology of Liaoning Province, Science and Technology Plan Joint Program, Technology Research Project (No. 2024JH2/102600176); and the Health Commission of Liaoning Province, "Xingliao Talent Program" Medical Masters Project (No. TXMJ-QN-06).