Journal of Yanshan University, 2018, Vol. 42, Issue (1): 59-66, 74, 9. DOI: 10.3969/j.issn.1007-791X.2018.01.010
Feature extraction of DNA sequence based on the base composition and distribution and its applications
Abstract
Discovering latent patterns in biological data through feature extraction is one of the basic problems in bioinformatics. A 24-D feature vector is constructed from base transition probabilities, base contents, and base position ratios, and is applied to compare the complete coding sequences of β-globin genes of 11 species and the whole mitochondrial genomes of 18 eutherian mammals, respectively. The derived phylogenetic trees agree well with the known evolutionary relationships. In addition, the essential genes of 28 bacteria are successfully identified by combining the feature vector with a support vector machine. The average AUC value is 0.808, much higher than that of some other methods. The experimental results demonstrate that the three proposed characteristics can serve as alternative classification features in related bioinformatics research.
Key words
transition probability/feature vector/phylogenetic tree/essential gene/support vector machine
Classification
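Biological sciences
Cite this article
Li Yushuang, Wei Dong, Lü Yanfen. Feature extraction of DNA sequence based on the base composition and distribution and its applications[J]. Journal of Yanshan University, 2018, 42(1): 59-66, 74, 9.
Funding
Young Top-Notch Talent Program of Higher Education Institutions of Hebei Province (BJ2014060)
Yanshan University "Xinrui Project" Talent Support Program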
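The abstract names the three components of the 24-D feature vector but does not give their formulas. The following is a minimal sketch of one plausible reading, not the authors' definitions: 16 first-order transition probabilities P(y | x) between adjacent bases, 4 base contents, and 4 position ratios. The position ratio used here (each base's share of the summed position indices) is purely an assumption for illustration, as are the function name `feature_vector` and the A, C, G, T ordering.

```python
from itertools import product

BASES = "ACGT"  # assumed ordering of the four bases


def feature_vector(seq):
    """Return 24 numbers: 16 transition probabilities, 4 base contents,
    4 position ratios (the last group is an assumed definition)."""
    # Simplification: ambiguous symbols (e.g. N) are dropped before counting.
    seq = [b for b in seq.upper() if b in BASES]
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two unambiguous bases")

    # Per-base counts and sums of 1-based positions.
    counts = {b: 0 for b in BASES}
    pos_sums = {b: 0 for b in BASES}
    for i, b in enumerate(seq, start=1):
        counts[b] += 1
        pos_sums[b] += i

    # Base contents: relative frequency of each base.
    contents = [counts[b] / n for b in BASES]

    # Transition probabilities P(y | x) estimated from adjacent pairs xy.
    pair_counts = {x + y: 0 for x, y in product(BASES, repeat=2)}
    for x, y in zip(seq, seq[1:]):
        pair_counts[x + y] += 1
    transitions = []
    for x in BASES:
        row_total = sum(pair_counts[x + y] for y in BASES)
        for y in BASES:
            p = pair_counts[x + y] / row_total if row_total else 0.0
            transitions.append(p)

    # Position ratios (assumption): share of the total position-index sum
    # n(n+1)/2 that is contributed by each base.
    total_pos = n * (n + 1) / 2
    ratios = [pos_sums[b] / total_pos for b in BASES]

    return transitions + contents + ratios
```

Vectors computed this way for several sequences could then be compared with a distance measure to build a tree, or passed to a support vector machine for essential-gene classification, as the abstract describes; the distance measure and the SVM settings are not stated in this record.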