Bioinformatics Features Based DNA Sequence Data Compression Algorithm (基于生物信息学特征的DNA序列数据压缩算法)

纪震, 周家锐, 朱泽轩, Q H Wu

Acta Electronica Sinica (电子学报), 2011, Vol. 39, Issue 5: 991-995 (5 pages).

Bioinformatics Features Based DNA Sequence Data Compression Algorithm

纪震¹, 周家锐², 朱泽轩¹, Q H Wu³

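Author information

1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
2. College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, Zhejiang 310027, China
3. Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool L69 3GJ, UK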

Abstract

This paper proposes BioLZMA, a novel DNA sequence data compression algorithm based on bioinformatics features. In BioLZMA, the DNA sequence data is sliced and regrouped into four clusters according to biological meaning: the coding sequence cluster, the intron cluster, the RNA cluster, and the residual cluster. Targeted compression strategies are applied to each cluster during data pre-processing, and the clusters are then compressed separately with the LZMA algorithm. Experimental results on benchmark sequences show that BioLZMA outperforms existing DNA compression algorithms. In particular, on long DNA sequences with significant bioinformatics features, BioLZMA achieves a higher compression ratio with little computation time.
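
The cluster-then-compress idea described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' BioLZMA implementation: the annotation format (`(start, end, label)` tuples with half-open coordinates), the label names, and the helper functions `split_into_clusters` and `compress_clusters` are all assumptions made for the example, and the cluster-specific pre-processing strategies mentioned in the abstract are omitted because the abstract does not specify them. Only Python's standard `lzma` module is used.

```python
import lzma
from collections import defaultdict

# Hypothetical labels mirroring the four clusters named in the abstract.
CLUSTERS = ("coding", "intron", "rna", "residual")


def split_into_clusters(sequence, annotations):
    """Regroup slices of `sequence` by label.

    `annotations` is an assumed format: non-overlapping (start, end, label)
    tuples with half-open [start, end) coordinates. Bases not covered by
    any annotation are collected in the 'residual' cluster.
    """
    clusters = defaultdict(list)
    position = 0
    for start, end, label in sorted(annotations):
        if label not in CLUSTERS or label == "residual":
            raise ValueError(f"unexpected annotation label: {label!r}")
        if start < position:
            raise ValueError("annotations must not overlap")
        if start > position:
            # Gap between annotated regions goes to the residual cluster.
            clusters["residual"].append(sequence[position:start])
        clusters[label].append(sequence[start:end])
        position = end
    if position < len(sequence):
        clusters["residual"].append(sequence[position:])
    return {name: "".join(clusters[name]) for name in CLUSTERS}


def compress_clusters(clusters):
    """Compress each cluster independently with LZMA."""
    return {
        name: lzma.compress(data.encode("ascii"))
        for name, data in clusters.items()
    }


if __name__ == "__main__":
    seq = "ATGGCCGTAAGTTTTTACGT"
    ann = [(0, 6, "coding"), (6, 12, "intron"), (16, 20, "rna")]
    grouped = split_into_clusters(seq, ann)
    packed = compress_clusters(grouped)
    for name in CLUSTERS:
        print(name, grouped[name], len(packed[name]))
```

Note that a real codec would also have to store the annotation layout so that the original base order can be restored after decompression; the sketch leaves that step out.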

Key words

DNA sequence data compression / bioinformatics / sequence regrouping / approximate repeat fragment / Lempel-Ziv-Markov chain algorithm (LZMA)

Classification

Information Technology and Security Science

Cite this article

纪震, 周家锐, 朱泽轩, Q H Wu. Bioinformatics features based DNA sequence data compression algorithm [J]. Acta Electronica Sinica (电子学报), 2011, 39(5): 991-995.

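Funding

National Natural Science Foundation of China (No. 60872125)

Fok Ying Tung Education Foundation, Fund for Young Teachers in Higher Education Institutions, Basic Research Project

Shenzhen Basic Research Project (Distinguished Young Scholars award)

Natural Science Foundation of Guangdong Province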

Acta Electronica Sinica (电子学报)

Indexing: OA, Peking University Core Journals, CSCD, CSTPCD

ISSN: 0372-2112
