Microbiota analysis based on a Dirichlet multinomial process model combined with K-means
Population typing is an effective way to better understand complex biological problems such as human physical and mental health. Clustering, which groups samples to reduce complexity, is one way to define enterotypes. However, the traditional K-means clustering algorithm offers no way to determine the value of K. This paper improves the traditional K-means algorithm and validates it on public datasets. Experiments show that the improved algorithm solves the problem of choosing K, and that the stability, accuracy and quality of the clustering results are significantly improved. Applied to gut microbiota OTU data, the improved model not only effectively distinguishes the similarities among samples from patients with type 2 diabetes, but also identifies the OTUs that contribute most to the heterogeneity of community structure, offering a new perspective for clinical approaches to type 2 diabetes.
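The abstract does not spell out how the improved algorithm chooses K. As one illustration of how a Dirichlet process prior can remove the need to fix K in advance, the sketch below implements DP-means (Kulis and Jordan, 2012), a published K-means variant derived from a Dirichlet process mixture. It is a stand-in technique, not the authors' method. The function names, the penalty `lam`, the use of squared Euclidean distance, and the toy data are all assumptions made for illustration; real OTU tables would first need a suitable transformation (for example relative abundances).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def dp_means(data, lam, max_iter=100):
    """DP-means (Kulis & Jordan, 2012), used here as an illustrative stand-in.

    A point whose squared distance to every centroid exceeds `lam` opens a
    new cluster, so the number of clusters K is not fixed in advance.
    Returns (centroids, labels).
    """
    centroids = [mean(data)]  # start with one cluster at the global mean
    labels = [0] * len(data)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest centroid, or a new cluster if all are too far.
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                centroids.append(list(x))
                best = len(centroids) - 1
            if best != labels[i]:
                labels[i] = best
                changed = True
        # Update step: recompute centroids and drop clusters that became empty,
        # renumbering the surviving clusters 0..K-1.
        groups = {}
        for i, k in enumerate(labels):
            groups.setdefault(k, []).append(data[i])
        kept = sorted(groups)
        remap = {old: new for new, old in enumerate(kept)}
        centroids = [mean(groups[k]) for k in kept]
        labels = [remap[k] for k in labels]
        if not changed:
            break
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors forming two well-separated groups.
    toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    cents, labs = dp_means(toy, lam=4.0)
    print(len(cents), labs)  # 2 [0, 0, 0, 1, 1, 1]
```

Here `lam` acts as the cost of opening a cluster. A larger value yields fewer clusters, and a very large value collapses everything into one. The choice of K is thus replaced by the choice of a distance threshold, which is one common way Dirichlet process ideas are carried over into K-means.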
Peng Xian (彭显); He Jianfeng (贺建峰)
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650000
Subject: Computer Science and Automation
Keywords: K-means algorithm; Dirichlet process mixture model; microbiota analysis; population typing; clustering
Chinese Journal of Bioinformatics (《生物信息学》), 2024, No. 1, pp. 47-57 (11 pages)