Computer Applications and Software, 2017, Vol. 34, Issue (4): pp. 22-27, 98 (7 pages). DOI: 10.3969/j.issn.1000-386x.2017.04.005
Research on Data Compression of Short-Sequence Biological Data Based on Next-Generation Sequencing
Meng Qian¹
Author information
- 1. School of Computer Science and Technology, Fudan University, Shanghai 200433
Abstract
With the development of next-generation sequencing (NGS) technology, the rapid growth of sequence data has placed heavy pressure on data storage and transmission. Data compression is an important way to address this problem, but traditional compression methods do not exploit the characteristics of the data well. Researchers have therefore begun to focus on compression algorithms designed specifically for NGS data. In this paper, we present a comprehensive survey of compression algorithms for the FASTQ and FASTA data produced by NGS. We introduce the features of the FASTQ and FASTA formats and summarize the commonly used methods for sequence data compression. We then evaluate representative compression tools on several data sets that vary in scale, species, and sequencing platform. The goals are to compare their compression performance, validate their characteristics, and give researchers a guide for selecting and improving algorithms. Finally, we discuss open problems and trends in short-sequence data compression algorithms.
Keywords
Data compression / Short-sequence data compression / Next-generation sequencing
Classification
Information Technology and Security Science
Cite this article
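Meng Qian. Research on data compression of short-sequence biological data based on next-generation sequencing[J]. Computer Applications and Software, 2017, 34(4): 22-27, 98.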