A dimensionality reduction algorithm scAC for single-cell RNA-seq data based on categorical autoencoders
Open access; Peking University Core Journal (北大核心); CSTPCD
Single-cell RNA sequencing (scRNA-seq) enables researchers to measure transcriptome-wide gene expression at single-cell resolution and is progressively transforming our understanding of cell biology and human disease. However, the high variability, sparsity, and dimensionality of scRNA-seq data seriously hinder downstream analysis, which makes dimensionality reduction crucial for visualizing and analyzing high-dimensional scRNA-seq data. Existing single-cell dimensionality reduction algorithms do not adequately account for relationships between cells, nor do they jointly optimize dimensionality reduction and clustering. To overcome these limitations, this study uses machine learning to develop an autoencoder-based dimensionality reduction algorithm for scRNA-seq data. Because most existing algorithms do not use pseudo-labels to supervise encoder training, and therefore lose intercellular signals during dimensionality reduction, this paper proposes scAC, a cell dimensionality reduction algorithm based on a categorical autoencoder. The algorithm combines the categorical autoencoder with deep embedded clustering to generate a low-dimensional representation of the gene expression matrix. Experimental results show that, compared with six other benchmark algorithms, scAC performs competitively across a range of downstream scRNA-seq analysis tasks.
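The abstract does not give the details of scAC, so the following is only a minimal sketch of the standard deep embedded clustering (DEC) components it names, not the authors' implementation. It assumes a latent embedding `z` already produced by some encoder (here replaced by toy random data) and shows the Student's t soft assignment, the sharpened target distribution, the KL clustering loss, and one plausible way to derive pseudo-labels (taking the most likely cluster per cell). All function names, the toy data, and the argmax pseudo-label rule are illustrative assumptions.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q[i, j] of cell i to cluster j (standard DEC)."""
    # Squared Euclidean distance between every embedding and every center.
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize each row so the assignments of one cell sum to 1.
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p: square q, divide by soft cluster size, renormalize per cell."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), the clustering loss that DEC minimizes."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(0)
# Toy stand-in for encoder output (assumption): two separated groups
# of 50 cells each in an 8-dimensional latent space.
z = np.vstack([rng.normal(-3.0, 0.5, (50, 8)),
               rng.normal(3.0, 0.5, (50, 8))])
# Toy centers taken from the group means; in practice k-means on z is common.
centers = np.vstack([z[:50].mean(axis=0), z[50:].mean(axis=0)])

q = soft_assign(z, centers)
p = target_distribution(q)
# Assumed pseudo-label rule: the most probable cluster for each cell.
pseudo_labels = q.argmax(axis=1)
loss = kl_clustering_loss(p, q)
```

In a full model, `loss` would be minimized jointly with the autoencoder's reconstruction loss, and `pseudo_labels` could serve as targets for a classification head on the encoder; how scAC actually combines these terms is not stated in the abstract.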
Tang Yongxuan (唐勇轩); Liang Xiao (梁潇); Luo Jiawei (骆嘉伟)
College of Information Science and Engineering, Hunan University, Changsha, 410012
Computer Science and Automation
classification autoencoder; cell dimension reduction; deep embedding clustering; scRNA-seq; machine learning
Journal of Nanjing University (Natural Science), 2024, No. 6
pp. 920-929 (10 pages)
National Natural Science Foundation of China (62032007, 62372165)