Chinese Journal of Bioinformatics (生物信息学), 2026, Vol. 24, Issue (1): 44-56, 13. DOI: 10.12113/202409001
Machine learning combined with bioinformatics identifies key genes in multiple sclerosis
Abstract
This study explored the key genes of multiple sclerosis (MS) using bioinformatics and machine learning methods. The MS gene expression profiles GSE21942 and GSE32988 were obtained from the GEO database, with GSE32988 used as the validation dataset. Sample clustering was assessed by principal component analysis (PCA), differentially expressed genes (DEGs) were screened and subjected to GO and KEGG enrichment analysis, and gene modules closely related to MS were identified by weighted gene co-expression network analysis (WGCNA). The intersection of these gene modules and the DEGs yielded the candidate genes. Potential key genes were then screened from the candidate genes with two machine learning algorithms, the least absolute shrinkage and selection operator (LASSO) and random forest (RF). The independent dataset GSE32988 was used to validate the differential expression of the potential key genes, and the key genes were confirmed by receiver operating characteristic (ROC) curve analysis. An MS animal model was used to verify the expression levels of the key genes. The samples in GSE21942 showed good reproducibility and correlation, and a total of 506 DEGs were obtained. Enrichment analysis showed that the DEGs were mainly enriched in biological functions such as B cell activation, glutamic acid (GLU) metabolism, and oxidative stress (OS), and in pathways including EBV infection and the B cell receptor signaling pathway. Machine learning screening of the 29 candidate genes yielded five potential key genes, of which four, GLUD1, VDAC1, DDX3X, and LAMP1, were retained as key genes after validation with GSE32988. RT-qPCR showed that the expression levels of DDX3X, LAMP1, GLUD1, and VDAC1 were consistent with the results of the bioinformatics analysis of the mRNA microarrays. DDX3X, LAMP1, GLUD1, and VDAC1 may therefore become new targets for MS therapy.
Key words
Multiple sclerosis/Bioinformatics/Key genes/Machine learning
Classification
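Medicine and health
Cite this article
Huang Xinmeng, Su Linghao, Yang Yifan, Qiao Wenhui, He Jialin, Zhao Peiyuan, Liu Xihong. Machine learning combined with bioinformatics identifies key genes in multiple sclerosis [J]. Chinese Journal of Bioinformatics (生物信息学), 2026, 24(1): 44-56, 13.
Funding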
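National Natural Science Foundation of China, Young Scientists Fund (No. 82104579)
China Postdoctoral Science Foundation, General Program (No. 2023M731024)
Natural Science Foundation of Henan Province (No. 202300410258)
Training Program for Young Backbone Teachers in Higher Education Institutions of Henan Province (No. 2023GGJS080)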
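
Two steps named in the abstract, intersecting the DEGs with the WGCNA module genes and scoring genes by ROC analysis, can be illustrated with a minimal sketch. This is not the authors' pipeline: the gene names, expression values, and helper functions `candidate_genes` and `roc_auc` are hypothetical, and the AUC is computed with the standard rank (Mann-Whitney) formulation using only the Python standard library.

```python
from typing import List, Sequence, Set


def candidate_genes(degs: Set[str], module_genes: Set[str]) -> List[str]:
    """Genes that are both differentially expressed and in the trait-related module."""
    return sorted(degs & module_genes)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen case (label 1) has a
    higher score than a randomly chosen control (label 0); ties count 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy inputs for illustration only, NOT data from the study.
degs = {"GENE_A", "GENE_B", "GENE_C"}
module_genes = {"GENE_A", "GENE_B", "GENE_D"}
labels = [1, 1, 1, 0, 0, 0]  # 1 = MS sample, 0 = control sample
expression = {
    "GENE_A": [7.9, 8.4, 8.1, 6.8, 7.0, 8.0],
    "GENE_B": [5.1, 5.0, 5.3, 5.2, 4.9, 5.4],
}

candidates = candidate_genes(degs, module_genes)
aucs = {gene: roc_auc(expression[gene], labels) for gene in candidates}
for gene, auc in aucs.items():
    print(f"{gene}: AUC = {auc:.3f}")
```

An AUC near 0.5 indicates that a gene's expression does not separate the two groups. An AUC well below 0.5 indicates separation in the opposite direction (lower expression in cases), so in practice `max(auc, 1 - auc)` is often reported for a single-gene marker.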