Journal of Central China Normal University (Natural Sciences), 2026, Vol. 60, Issue 2: 308-320, 13. DOI: 10.19603/j.cnki.1000-1190.2026.02.013
Evolutionary relationships and genotyping of HPV based on machine learning and amino acid position correlation coefficient method
Abstract
In this study, an alignment-free method based on amino acid positional information, the amino acid correlation coefficient feature vector (ACCFV) method, was proposed for evolutionary analysis and genotyping of human papillomavirus (HPV). Traditional multiple sequence alignment (MSA) methods suffer from low computational efficiency and high memory consumption when processing large-scale datasets. In contrast, the ACCFV method overcomes these limitations by constructing statistical measures of positional correlations between amino acids and converting amino acid sequences into numerical feature vectors. Amino acid sequences of eight HPV proteins (E6, E7, E1, E2, E4, E5, L1, and L2) were selected as target data. After feature extraction using ACCFV, a phylogenetic tree was constructed based on Euclidean distances between feature vectors, and four machine learning models were employed for classification prediction. The results showed that when the delay step size L = 1, the ACCFV method achieved high consistency with the traditional MSA tool MUSCLE in evolutionary analysis, while significantly improving computational efficiency. Moreover, the Random Forest model achieved 100% classification accuracy. Compared to BLAST-Protein, ACCFV maintained 100% accuracy while substantially reducing processing time, and it required no batch operations. This study not only validates the feasibility and effectiveness of the ACCFV method in HPV research but also provides a novel technical approach for molecular epidemiological studies of other viruses.
Keywords
HPV / amino acid sequence / machine learning / evolutionary analysis / subtype classification
Classification
Biological sciences
Cite this article
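Hu Hualin, He Lili, Liu Maoxing. Evolutionary relationships and genotyping of HPV based on machine learning and amino acid position correlation coefficient method [J]. Journal of Central China Normal University (Natural Sciences), 2026, 60(2): 308-320, 13.
Funding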
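National Natural Science Foundation of China (12571522)
Beijing University of Civil Engineering and Architecture High-Level Talent Introduction Funding Program (GDRC20220802)
2024 Beijing Digital Education Research Project, Youth Project (BDEC2024QN081)
Beijing Municipal Education Commission 2024 Scientific Research Program, General Project (KM202410016001)
2024 Beijing Association of Higher Education Project (MS2024130)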