Modern Electronics Technique, 2017, Vol. 40, Issue (12): 6-10, 5. DOI: 10.16652/j.issn.1004-373x.2017.12.002
Motif identification algorithm for gene sequences based on emerging substrings mining
Abstract
Recently, the development of chromatin immunoprecipitation techniques has extended the motif identification problem to the genome-wide scale, but traditional motif identification algorithms run too slowly to handle data of this size. To overcome this shortcoming, this paper proposes FastESE, a substituted emerging substring search algorithm for ChIP-seq data. Emerging substrings are found by comparing the test dataset with the control dataset, and the substituted instances of each substring are then searched to build the corresponding position probability matrix. The weighted information content is used to cluster these substrings and, finally, to discover the substituted emerging substrings. The effectiveness of the proposed algorithm was verified on real ChIP-seq data. The experimental results show that FastESE can solve the motif identification problem on ChIP-seq data in a reasonable time.
Keywords
chromatin immunoprecipitation / emerging substring / motif identification / FastESE
Classification
Information technology and security science
Cite this article
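Zhang Yipu (张懿璞), Yan Maode (闫茂德), Hou Jun (侯俊), Kan Danhui (阚丹会). Motif identification algorithm for gene sequences based on emerging substrings mining [J]. Modern Electronics Technique, 2017, 40(12): 6-10, 5.
Funding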
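National Natural Science Foundation of China, Youth Program (61501058)
Natural Science Foundation of Shaanxi Province, Youth Program (2016JQ6075)
Fundamental Research Funds for the Central Universities (310832161008)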