Military Medical Sciences (军事医学), 2024, Vol. 48, Issue 3: 213-218, 6. DOI: 10.7644/j.issn.1674-9960.2024.03.008
一种基于卷积神经网络的大肠杆菌和志贺菌基因组鉴别方法
A convolutional neural network-based method for differentiating between Escherichia coli and Shigella genomes
Abstract
Objective To differentiate between bacteria that are genetically highly similar, such as Escherichia coli and Shigella spp., using deep learning techniques, in order to support clinical diagnosis and epidemic prevention. Methods A convolutional neural network (CNN) was built on transfer learning from a large-scale pre-trained protein language model to enable rapid and accurate identification of bacterial strains at the genus level. To validate the model, whole-genome data on the relevant bacteria were retrieved from the National Center for Biotechnology Information (NCBI) in the United States, and the whole-genome protein sequences of the highly similar Escherichia coli and Shigella spp. strains were selected as experimental samples. Results In classification experiments on 2960 strains with high assembly quality and 4945 strains with low assembly quality, the method achieved genus-level classification accuracies of 97.13% and 95.56%, respectively, outperforming other currently available methods. Conclusion This study demonstrates the reliability and potential of deep learning-based methods for differentiating bacterial types. By integrating self-supervised pre-training with transfer learning, the approach can capture high-dimensional feature differences that are difficult for humans to discern or analyze statistically. The method is also broadly applicable, because it places lower demands on the assembly completeness of the bacterial genome sequences used.
Key words
Escherichia coli / Shigella / bacterial identification / whole genome protein / convolutional neural network
Classification
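Medicine and Health
Cite this article
Meng Renjie, Luo Nan, Jin Yuan, Yue Junjie, Wang Boqian, Gao Yuanming. A convolutional neural network-based method for differentiating between Escherichia coli and Shigella genomes [J]. Military Medical Sciences, 2024, 48(3): 213-218, 6.
Funding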
National Natural Science Foundation of China (82003519, 32070025, 62102439)
Research Project of the State Key Laboratory of Pathogen and Biosecurity (SKLPBS1807, SKLPBS2214)
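Illustrative sketch
The Methods in the abstract describe a CNN that classifies a genome from features supplied by a pre-trained protein language model. The abstract does not give the architecture, so the following is only a minimal sketch under stated assumptions: each genome is represented as a sequence of fixed-length per-protein embedding vectors (here random placeholders), a single 1D convolution with ReLU and global max pooling summarizes them, and a logistic output gives a score for one of the two genera. All function names, dimensions, and the untrained random weights are hypothetical and are not taken from the paper.

```python
import math
import random


def conv1d_relu(seq, kernels, bias):
    """Valid 1D convolution over a sequence of embedding vectors, then ReLU.

    seq:     list of T vectors, each of length D (one vector per protein)
    kernels: list of F filters, each a list of K weight vectors of length D
    bias:    list of F bias terms
    returns: list of (T - K + 1) rows, each with F activations
    """
    width = len(kernels[0])
    out = []
    for t in range(len(seq) - width + 1):
        row = []
        for f, kern in enumerate(kernels):
            s = bias[f]
            for j in range(width):
                s += sum(w * x for w, x in zip(kern[j], seq[t + j]))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def global_max_pool(feature_map):
    """Keep the strongest response of each filter across all positions."""
    return [max(col) for col in zip(*feature_map)]


def init_params(dim, n_filters, width, seed=0):
    """Random, untrained weights (assumption: stands in for a trained model)."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.5, 0.5)

    return {
        "kernels": [[[u() for _ in range(dim)] for _ in range(width)]
                    for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "out_weights": [u() for _ in range(n_filters)],
        "out_bias": 0.0,
    }


def predict_proba(seq, params):
    """Score in (0, 1) for the positive class (which genus that is, is an assumption)."""
    pooled = global_max_pool(
        conv1d_relu(seq, params["kernels"], params["conv_bias"]))
    logit = params["out_bias"] + sum(
        w * h for w, h in zip(params["out_weights"], pooled))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Placeholder "genome": 10 proteins, each with a 4-dimensional embedding.
    rng = random.Random(2)
    genome = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
    params = init_params(dim=4, n_filters=3, width=2, seed=1)
    print(predict_proba(genome, params))
```

Because the weights are random, the printed score carries no biological meaning; in a real pipeline the embeddings would come from the pre-trained protein language model and the convolutional and output weights would be fitted on labeled genomes. Global max pooling is used here so that genomes with different numbers of proteins map to a fixed-length vector, which is one plausible way to tolerate incomplete assemblies.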