Acta Electronica Sinica, Issue (5): 1007-1013, 7. DOI: 10.3969/j.issn.0372-2112.2015.05.026
High-Throughput DNA Sequence Data Compression Method Based on Codebook Index Transformation
Abstract
A novel high-throughput DNA sequence compression method based on codebook index transformation (CITD) is proposed. CITD uses the codebook index transformation (CIT) model to replace the traditional representation of codebook indexes with quaternary values expressed in the four standard base characters, and adopts a simple encoding method to distinguish replaced substrings from non-replaced ones. It then decides whether to apply the Burrows-Wheeler transform (BWT) according to the value of the information entropy, and finally compresses the data with the move-to-front (MTF) transform and Huffman entropy coding. Experimental results on several sequencing data sets show that, in most cases, CITD outperforms the high-throughput DNA sequence compression algorithms cited in this paper.
Keywords
high-throughput DNA sequence / codebook index transformation (CIT) model / Burrows-Wheeler transform (BWT) / move-to-front (MTF) / information entropy / data compression algorithm
Classification
Information Technology and Security Science
Cite this article
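TAN Li, SUN Jifeng. High-Throughput DNA Sequence Data Compression Method Based on Codebook Index Transformation [J]. Acta Electronica Sinica, 2015, (5): 1007-1013, 7.
Funding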
National Natural Science Foundation of China, Young Scientists Fund (No. 61202292); Natural Science Foundation of Guangdong Province ()
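Illustrative note
The back end of the pipeline summarized in the abstract (an entropy check that gates the BWT, followed by MTF) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives neither the entropy threshold nor the direction of the comparison, so `entropy_threshold=1.9` and the "apply BWT when entropy is below the threshold" rule are hypothetical, as are all function names. The CIT substitution step and the final Huffman coder are omitted because the abstract does not describe them in enough detail.

```python
import math
from collections import Counter


def shannon_entropy(data: str) -> float:
    """Zero-order Shannon entropy of the string, in bits per symbol."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def bwt(data: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    Fine for illustration; real tools use suffix arrays instead.
    """
    assert sentinel not in data
    s = data + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def mtf_encode(data: str, alphabet: str) -> list[int]:
    """Move-to-front: emit each symbol's current rank, then move it to rank 0."""
    table = list(alphabet)
    out = []
    for ch in data:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def mtf_decode(codes: list[int], alphabet: str) -> str:
    """Inverse of mtf_encode for the same starting alphabet."""
    table = list(alphabet)
    out = []
    for idx in codes:
        ch = table.pop(idx)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)


def transform(seq: str, entropy_threshold: float = 1.9):
    """Entropy-gated BWT followed by MTF.

    ASSUMPTION: both the threshold value and the 'below threshold' rule
    are placeholders; the abstract does not state the actual criterion.
    """
    use_bwt = shannon_entropy(seq) < entropy_threshold
    staged = bwt(seq) if use_bwt else seq
    return use_bwt, mtf_encode(staged, "$ACGT")
```

The returned flag would have to be stored alongside the MTF ranks so that a decoder knows whether to invert the BWT; the ranks themselves are what an entropy coder such as Huffman coding would then compress.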