Computer & Digital Engineering (计算机与数字工程), 2025, Vol. 53, Issue 7: 1789-1793, 1799, 6. DOI: 10.3969/j.issn.1672-9722.2025.07.001
Parallel Computation Method for Multi-sequence Alignment of Biological Genes Based on Keyword Tree
Abstract
To address the quadratic time complexity of the traditional star alignment algorithm in bioinformatics multiple sequence alignment, a keyword tree algorithm is introduced to improve the star alignment algorithm. The biological sequences are segmented and keyword trees are generated for the resulting subsequences. A sliding window method is then used to search the keyword tree of each sequence for the sequence that has the most perfectly matched base pairs with the other sequences, and this sequence is set as the central sequence. The central sequence is aligned with the other sequences to obtain the final multiple sequence alignment result. The improved star alignment algorithm is also parallelized using Apache Hadoop YARN to increase the speed of multiple sequence alignment. Experiments show that the improved star alignment algorithm significantly reduces alignment running time and that the Apache Hadoop YARN parallelization performs well: on top of the improved star alignment algorithm, parallel processing further reduces the sequence alignment time.
Key words
keyword tree/star alignment algorithm/bioinformatics/base pair/multithreading
Classification
Information Technology and Security Science
Cite this article
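Xu Shengchao (徐胜超). Parallel Computation Method for Multi-sequence Alignment of Biological Genes Based on Keyword Tree [J]. Computer & Digital Engineering, 2025, 53(7): 1789-1793, 1799, 6.
Funding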
Supported by the General Program of the National Natural Science Foundation of China (Grant No. 61972444).
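
Illustrative sketch
The central-sequence selection outlined in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the segment length `k`, the dictionary-based keyword tree, the scoring rule (count `k` bases for every window that exactly matches a segment), and the function names `build_keyword_tree`, `count_matched_bases`, and `choose_center` are all assumptions made for illustration. The final pairwise alignment against the central sequence and the Hadoop YARN parallelization are not shown.

```python
from typing import List


def build_keyword_tree(seq: str, k: int) -> dict:
    """Build a trie over the non-overlapping length-k segments of seq.

    Assumption: fixed-length, non-overlapping segmentation; the abstract
    does not specify how sequences are segmented.
    """
    root: dict = {}
    for start in range(0, len(seq) - k + 1, k):
        node = root
        for base in seq[start:start + k]:
            node = node.setdefault(base, {})
        node["$"] = True  # marks the end of a complete segment
    return root


def count_matched_bases(tree: dict, other: str, k: int) -> int:
    """Slide a length-k window over `other` and count exactly matched bases."""
    matched = 0
    pos = 0
    while pos + k <= len(other):
        node = tree
        for base in other[pos:pos + k]:
            node = node.get(base)
            if node is None:
                break
        if node is not None and "$" in node:
            matched += k
            pos += k  # skip past the matched segment
        else:
            pos += 1  # shift the window by one base
    return matched


def choose_center(seqs: List[str], k: int = 4) -> int:
    """Return the index of the sequence whose keyword tree matches the others best."""
    best_index, best_score = 0, -1
    for i, seq in enumerate(seqs):
        tree = build_keyword_tree(seq, k)
        score = sum(
            count_matched_bases(tree, other, k)
            for j, other in enumerate(seqs)
            if j != i
        )
        if score > best_score:  # ties keep the earliest candidate
            best_index, best_score = i, score
    return best_index


if __name__ == "__main__":
    sequences = ["ACGTACGT", "ACGTTTTT", "GGGGACGT", "CCCCCCCC"]
    print(choose_center(sequences, k=4))
```

In this sketch each candidate's score is computed independently of the others, so the loop in `choose_center` is a plausible unit of work to distribute across workers; how the paper actually partitions the work on Hadoop YARN is not stated in the abstract.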