Computer & Digital Engineering (计算机与数字工程), 2025, Vol. 53, Issue 11: 3174-3178, 3219, 6. DOI: 10.3969/j.issn.1672-9722.2025.11.032
Gene Sequence Analysis of COVID-19 Based on Dimension Reduction Combination and Clustering
Abstract
To help deal with COVID-19, which is still raging, many researchers have analyzed the gene sequence of the virus, and cluster analysis is an effective means of doing so. This paper analyzes COVID-19 gene sequence data with a clustering algorithm at its core, and combines stepwise dimension reduction with clustering to address the problem that high-dimensional gene sequence mutation data is not suitable for direct clustering. Several commonly used dimension reduction and clustering algorithms are combined and trained on two common datasets, and the combination of PCA+UMAP dimension reduction with the Birch clustering algorithm is selected as the best. Finally, the COVID-19 gene sequence mutation data is input into the resulting model for classification. Further analysis also confirms that the S protein of COVID-19 has a very high mutation frequency and that the S protein is closely related to the high infectivity of the virus, which is consistent with the conclusions reached by other researchers.
Key words
novel coronavirus/dimension reduction algorithm/clustering algorithm/algorithm combination
Classification
Information Technology and Security Science
Cite this article
汪政林, 郑茜颖, 陈建森, 郑巧. Gene Sequence Analysis of COVID-19 Based on Dimension Reduction Combination and Clustering[J]. Computer & Digital Engineering, 2025, 53(11): 3174-3178, 3219, 6.
Funding
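Supported by the Fujian Provincial Science and Technology Key Industry Guidance Project (No. 2020H0007)
and the Fujian Medical University Emergency Scientific Research Project (No. 2020XJ005).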