福建电脑 (Fujian Computer), 2024, Vol. 40, Issue (7): 58-62, 5. DOI: 10.16707/j.cnki.fjpc.2024.07.011
Optimizing the K-mer Model for Clustering Biological Sequences
Abstract
K-mer based biological sequence clustering groups sequences by their sequence features, but plain K-mer clustering algorithms run slowly. To address this issue, this article proposes an optimized K-mer model for clustering biological sequences. First, working from the K-mer frequencies of the biological sequences, each character (A, C, G, T) is assigned a two-bit binary number, and the K-mer index is constructed through bit operations. Then, the application of the getKmer function is parallelized using Python's joblib library. Finally, sequence clustering is performed using the K-means algorithm. The experimental results show that, while maintaining accuracy, the optimized K-mer model reduces the clustering time of biological sequences by more than half.
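The bit-operation indexing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' getKmer function: the name `get_kmer_counts`, the specific bit assignment (A=00, C=01, G=10, T=11), and the choice to restart the window at ambiguous bases are all assumptions made for illustration. Each base shifts the running index left by two bits, and a mask keeps only the lowest 2k bits, so the index of every K-mer is obtained without slicing or hashing substrings.

```python
# Two bits per nucleotide, as in the abstract; this particular
# assignment of codes to bases is an assumption.
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def get_kmer_counts(seq, k):
    """Return a list of length 4**k holding the K-mer counts of one sequence.

    Hypothetical helper for illustration only.
    """
    mask = (1 << (2 * k)) - 1      # keeps only the lowest 2k bits
    counts = [0] * (4 ** k)
    index = 0                      # integer index of the current window
    valid = 0                      # consecutive valid bases seen so far
    for base in seq.upper():
        code = ENCODING.get(base)
        if code is None:           # ambiguous base (e.g. N): restart the window
            index, valid = 0, 0
            continue
        # Shift in the new base and drop the base that left the window.
        index = ((index << 2) | code) & mask
        valid += 1
        if valid >= k:
            counts[index] += 1
    return counts
```

Because the function handles one sequence at a time and shares no state, it could be applied across a collection of sequences in parallel, for example with joblib's `Parallel` and `delayed`, and the resulting count vectors could then be passed to a K-means implementation. Which K-means implementation the authors used is not stated in the abstract.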
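Keywords
Biological Sequences/Clustering Algorithm/Bit Operations/Parallelization
Classification
Information Technology and Security Science
Cite this article
Li Li (李莉), Huang Wei (黄伟), Zhao Jiaxu (赵佳旭). Optimizing the K-mer Model for Clustering Biological Sequences [J]. 福建电脑 (Fujian Computer), 2024, 40(7): 58-62, 5.
Funding
This work was supported by a university-level research project of Fuzhou Polytechnic (福州职业技术学院) (No. FZYKJJJYB202304).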