Abstract
Cell type annotation is an essential task in single-cell RNA sequencing (scRNA-seq) analysis. In this paper, we propose scTransformer, a Transformer-based tool for cell type identification in scRNA-seq data, designed to overcome the performance degradation and high computational complexity that arise when handling sparse data. The model comprises four modules: Gene Embedding, Position Encoding, Transformer Encoder, and Classification Layer. The Gene Embedding module converts K highly variable genes (HVGs; K = 2,000) into N sub-vectors. Unassigned rate, F1 score, accuracy, kappa score, and AUR are used as evaluation metrics to systematically assess the model against nine other tools. In the intra-dataset test, scTransformer achieves 96.59% accuracy, higher than the other tools, with an unassigned rate of 0.18%. Its average F1 score is 93.46%, lower than those of CHETAH, Clustifyr, and SciBet, probably because of sample imbalance. In the cross-platform test within the same tissue and the test across completely different tissues (pancreas, blood), scTransformer achieves the best accuracy, F1 score, and kappa coefficient (>0.99). In mouse brain, pancreas, and lung tissues, scTransformer's AUR and unassigned rate are second only to those of Seurat and Clustifyr. The scTransformer source code and data are available at https://github.com/nanjingyuanbao/scTransformer. In conclusion, this paper presents and systematically evaluates a new Transformer-based cell type annotation tool.
Key words
Cell type annotation/Transformer/scRNA-seq
Classification
Bioengineering