Journal of Tianjin University of Science & Technology, 2025, Vol. 40, Issue (6): 1-8, 46, 9. DOI: 10.13364/j.issn.1672-6510.20240105
Key Gene Screening Based on Generative Models and Gene Expression Data
Abstract
Gene expression data can elucidate the pathological mechanisms of diseases under specific conditions and at specific times. However, the "curse of dimensionality", characterised by small sample sizes and high dimensionality, constrains the performance of traditional machine learning classification methods, resulting in low prediction accuracy, an inability to recognise small samples, and poor stability. This article introduces a novel method, CVAE-CWGNA-DAE, which integrates data augmentation and gene selection to address the issues that arise from the "curse of dimensionality". First, to address the small sample size of gene expression data, a data augmentation method is proposed that combines a conditional variational autoencoder with a conditional Wasserstein generative adversarial network with gradient penalty. A comparison with existing methods demonstrates the superiority of this approach in terms of classification performance and stability. Second, to address the high dimensionality of gene expression data and to verify the effectiveness of the generated data, the article employs a gene selection method based on a denoising autoencoder and SVM-RFE. The results show that using the augmented dataset for gene selection improves the accuracy of the selected genes across five distinct classification tasks. These results demonstrate the effectiveness of the proposed method in addressing the "curse of dimensionality" and in achieving significant improvements in gene selection.
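The SVM-RFE step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the subgradient-descent trainer, the hyperparameters (`lam`, `lr`, `epochs`), the function names, and the toy data are all assumptions made for illustration, and the denoising autoencoder stage is omitted. The sketch only shows the general idea of recursive feature elimination: fit a linear SVM, drop the feature with the smallest squared weight, and repeat. In practice a library routine, such as scikit-learn's `RFE` wrapped around a linear SVM, would likely be used instead.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=200):
    """Fit a bias-free linear SVM by full-batch subgradient descent on the
    L2-regularised hinge loss and return the weight vector.
    Hyperparameters are illustrative assumptions, not values from the paper."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, n_keep):
    """Recursively remove the feature with the smallest squared SVM weight
    until n_keep features remain; returns the surviving column indices."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda j: w[j] ** 2)
        del remaining[worst]
    return remaining


# Hypothetical toy data: 40 samples, 10 "genes"; only columns 0 and 1
# carry class signal, the other 8 columns are pure noise.
random.seed(0)
labels = [1 if i % 2 == 0 else -1 for i in range(40)]
data = [
    [3.0 * yi + random.gauss(0, 1), 3.0 * yi + random.gauss(0, 1)]
    + [random.gauss(0, 1) for _ in range(8)]
    for yi in labels
]

selected = svm_rfe(data, labels, n_keep=2)
print(selected)
```

On real expression data the number of genes is far larger than the number of samples, so several features are usually removed per iteration rather than one at a time.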
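Key words
gene expression / curse of dimensionality / data augmentation / gene selection / autoencoder / generative adversarial network
Classification
Information Technology and Security Science
Cite this article
余钱, 李雨蒙, 罗军伟, 董浩帆, 李玉, 吴信. Key Gene Screening Based on Generative Models and Gene Expression Data [J]. Journal of Tianjin University of Science & Technology, 2025, 40(6): 1-8, 46, 9.
Funding
National Natural Science Foundation of China (62372156)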