Computer Engineering, 2024, Vol. 50, Issue 1: 296-305 (10 pages). DOI: 10.19678/j.issn.1000-3428.0066550
Research on Breast Cancer Survival Prediction Based on Multi-Modal Learning
Abstract
Breast cancer is one of the most common cancers. Predicting 5-year survival from patient genomics data is a common task in breast cancer research. Genomics data from breast cancer patients are noisy, heterogeneous, and made up of long sequences, and their positive and negative samples are imbalanced. To address these problems, this paper proposes MLBSP, a multi-modal learning model that predicts 5-year survival for breast cancer prognosis. The model uses a single-modal module to extract effective information from four data modalities: gene expression data, the cumulative number of gene mutations, single nucleotide variations, and copy number variations. To reduce the impact of the heterogeneity of single-modality data on global features, the multi-modal module combines depthwise separable convolution and a multi-head self-attention mechanism to fuse the data features and capture global information from each patient's multi-modal genome data. Focal Loss is used to address the imbalance between positive and negative samples and to guide 5-year survival prediction. Experimental results show that on BRCA Cell, METABRIC, and PanCancer Atlas, which are real data sets from breast cancer patients, the Area Under the Curve (AUC) of MLBSP reaches 91.18%, 71.49%, and 77.37%, respectively. On average, the AUC of MLBSP is 17.69%, 6.51%, and 10.24% higher than the AUCs of XGBoost, random forest, and other mainstream cancer survival prediction methods, respectively. Pathway analysis identified several biomarkers, such as SLC8A3 and TP53, further demonstrating the novelty and effectiveness of multi-modal research.
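The abstract states that Focal Loss is used to counter the imbalance between positive and negative samples. As a minimal sketch, assuming the standard binary form FL(p_t) = -α_t (1 - p_t)^γ log(p_t), the loss could be computed as follows. The function name `focal_loss` and the values `alpha=0.25` and `gamma=2.0` (commonly used defaults) are assumptions for illustration; the abstract does not report MLBSP's actual settings or code.

```python
import math


def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss (hypothetical helper, not MLBSP's code).

    probs:  predicted probabilities of the positive class
    labels: 0/1 ground-truth labels
    alpha:  weight of the positive class; the negative class gets 1 - alpha
            (assumed default, not taken from the paper)
    gamma:  focusing parameter; larger values down-weight easy examples more
            (assumed default, not taken from the paper)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p       # probability given to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified examples,
        # so hard (often minority-class) examples dominate the gradient.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0.0` and `alpha=0.5` the expression reduces to half of the ordinary binary cross-entropy, which makes the effect of the focusing term easy to isolate: a confidently correct prediction such as `focal_loss([0.95], [1])` contributes far less than a poor one such as `focal_loss([0.3], [1])`.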
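The abstract names depthwise separable convolution as part of the multi-modal fusion module but gives no architectural details. The following minimal, framework-free sketch illustrates the general technique on a one-dimensional feature sequence; applying it in 1D, and the function names and shapes used here, are assumptions, not MLBSP's published design. The technique factorizes a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) channel-mixing step.

```python
def depthwise_separable_conv1d(x, depthwise, pointwise):
    """1D depthwise separable convolution, 'valid' padding, stride 1.

    Hypothetical helper for illustration. Like deep-learning libraries,
    it computes cross-correlation (kernels are not flipped).

    x:          input, shape [C_in][L]  (C_in channels, sequence length L)
    depthwise:  one kernel per input channel, shape [C_in][K]
    pointwise:  1x1 mixing weights, shape [C_out][C_in]
    returns:    output, shape [C_out][L - K + 1]
    """
    c_in, length, k = len(x), len(x[0]), len(depthwise[0])
    out_len = length - k + 1
    # Step 1 (depthwise): each channel is filtered only by its own kernel.
    dw = [
        [sum(x[c][i + j] * depthwise[c][j] for j in range(k)) for i in range(out_len)]
        for c in range(c_in)
    ]
    # Step 2 (pointwise): a 1x1 convolution mixes channels at every position.
    return [
        [sum(w[c] * dw[c][i] for c in range(c_in)) for i in range(out_len)]
        for w in pointwise
    ]


def param_counts(c_in, c_out, k):
    """Weight counts (biases ignored) of a standard vs. a separable 1D convolution."""
    standard = c_in * c_out * k
    separable = c_in * k + c_in * c_out
    return standard, separable
```

For example, with illustrative sizes `param_counts(64, 128, 7)` returns `(57344, 8640)`, so the separable form needs about 6.6 times fewer weights. This reduction in parameters is the usual motivation for the factorization.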
Key words
breast cancer/genomics/deep learning/depthwise separable convolution/multi-head self-attention/multi-modal learning
Classification
Information Technology and Security Science
Cite this article
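Cao Guangshuo, Huang Ruizhang, Chen Yanping, Qin Yongbin. Research on Breast Cancer Survival Prediction Based on Multi-Modal Learning[J]. Computer Engineering, 2024, 50(1): 296-305.
Funding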
National Natural Science Foundation of China (62066007).