Multi-Classification Prediction of Coronavirus Hosts Based on Spike Protein Sequences and Machine Learning Methods

Zhao Jian, Wang Zhibo, Xie Zhai, Zhang Li, Liu Hongsheng

Journal of Liaoning University (Natural Science Edition), 2023, Vol. 50, Issue 4: 312-317 (6 pages).

Zhao Jian 1, Wang Zhibo 1, Xie Zhai 1, Zhang Li 1, Liu Hongsheng 2

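Author information

  • 1. School of Life Sciences, Liaoning University, Shenyang, Liaoning 110036
  • 2. School of Pharmacy, Liaoning University, Shenyang, Liaoning 110036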

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the global COVID-19 pandemic that began in late 2019, with the coronavirus jumping between species to infect multiple mammals, including humans. Rapid and accurate prediction of coronavirus hosts is of great significance for future epidemic prevention and control. In this study, spike protein sequences were collected from the NCBI (National Center for Biotechnology Information) Virus database. After redundant sequences were removed with CD-HIT, 3,216 sequences remained and were divided into six classes according to host. Sorted by collection time, they were split into a training set and a test set at an 8:2 ratio. The distribution descriptor (CTDD) and the natural language model Seq2Vec were used to encode features of the spike protein sequences. A variety of machine learning methods were then used to train and evaluate the classification models. Seq2Vec-GCNN was the best model for human hosts, with an accuracy of 99.37%, while CTDD-RF performed best on the other host classes, with accuracies of 95.82% for swine, 95.96% for avian hosts, 98.33% for camels, 92.06% for bats, and 94.01% for other mammals. The results show that it is practical and effective to build coronavirus host classification models from spike protein sequences using machine learning methods.
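
The two data-preparation steps named in the abstract, the chronological 8:2 split and the CTDD encoding, can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the `(collection_date, sequence, host)` record layout, and the choice of a single property (the three-group hydrophobicity partition commonly used for composition/transition/distribution descriptors) are assumptions made for illustration; the abstract does not state which properties or software the study used.

```python
import math
from datetime import date

# Assumption: one physicochemical property with the usual three-group
# hydrophobicity partition. The study's actual property set is not
# given in the abstract.
HYDROPHOBICITY_GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def ctdd_one_property(seq, groups=HYDROPHOBICITY_GROUPS):
    """Distribution descriptor for one property: five values per group.

    For each group, report where (as a percentage of sequence length)
    the first, 25%, 50%, 75% and last residue of that group occurs.
    """
    n = len(seq)
    features = []
    for members in groups.values():
        # 1-based positions of residues that belong to this group
        positions = [i + 1 for i, aa in enumerate(seq) if aa in members]
        if not positions:
            features.extend([0.0] * 5)
            continue
        count = len(positions)
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            # k-th occurrence marking this fraction (at least the first)
            k = max(1, math.floor(frac * count))
            features.append(100.0 * positions[k - 1] / n)
    return features


def chronological_split(records, train_fraction=0.8):
    """Sort (collection_date, sequence, host) records by date and cut
    them so the earliest `train_fraction` form the training set."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical toy records, not data from the study
records = [
    (date(2021, 3, 1), "RACKL", "human"),
    (date(2019, 5, 1), "GASTP", "bat"),
    (date(2020, 1, 1), "CLVIM", "swine"),
    (date(2018, 7, 1), "RKEDQ", "avian"),
    (date(2022, 2, 1), "MFWRK", "camel"),
]
train, test = chronological_split(records)
train_vectors = [ctdd_one_property(seq) for _, seq, _ in train]
```

Splitting by collection time rather than at random means the test set contains only sequences collected after those used for training. The resulting fixed-length vectors could then be passed to any standard classifier, for example scikit-learn's `RandomForestClassifier`, which is presumably what the "RF" in CTDD-RF refers to.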

Key words

machine learning/coronavirus/spike protein

Classification

Information technology and security science

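Cite this article

Zhao Jian, Wang Zhibo, Xie Zhai, Zhang Li, Liu Hongsheng. Multi-classification prediction of coronavirus hosts based on spike protein sequences and machine learning methods [J]. Journal of Liaoning University (Natural Science Edition), 2023, 50(4): 312-317.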

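Funding

Shenyang Young and Middle-aged Science and Technology Innovation Talent Support Program (RC210216)

National Natural Science Foundation of China, Young Scientists Fund (82003655)

General Project of the Education Department of Liaoning Province (LJKZ0088)

Liaoning Province "Xingliao Talents Program" (XLYC2002045)

Liaoning Province Key Research and Development Program (2019JH2/10300041)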

Journal of Liaoning University (Natural Science Edition)

ISSN 1000-5846
