Feature extraction of DNA sequence based on the base composition and distribution and its applications

Li Yushuang (李玉双), Wei Dong (魏东), Lü Yanfen (吕艳芬)

Journal of Yanshan University (燕山大学学报), 2018, Vol. 42, Issue 1: 59-66, 74 (9 pages). DOI: 10.3969/j.issn.1007-791X.2018.01.010

Author information

1. School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China

Abstract

Uncovering latent patterns in biological data through feature extraction is one of the basic problems in bioinformatics. A 24-dimensional feature vector is constructed from base transition probabilities, base contents, and base position ratios, and is applied to compare the complete coding sequences of the β-globin genes of 11 species and the whole mitochondrial genomes of 18 eutherian mammals, respectively. The derived phylogenetic trees agree well with the evolutionary relationships among these species. In addition, the essential genes of 28 bacteria are successfully identified by combining the feature vector with a support vector machine. The average AUC value is 0.808, considerably higher than that of some other methods. The experimental results demonstrate that the three proposed characteristics can serve as alternative classifiers in related bioinformatics research.
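
The abstract names the three components of the feature vector but does not give their formulas. The following is only an illustrative sketch of one reading that yields 24 dimensions: 16 first-order transition probabilities between adjacent bases, 4 base contents, and 4 position ratios. The function name `feature_vector` and, in particular, the definition of the position ratio (mean 1-based position of a base divided by the sequence length) are assumptions made for illustration and may differ from the definitions in the paper.

```python
from collections import Counter

BASES = "ACGT"


def feature_vector(seq: str) -> list[float]:
    """Return a 24-D vector: 16 transition probabilities, 4 contents, 4 position ratios.

    All three definitions are assumptions for illustration, not the paper's formulas.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2 or any(base not in BASES for base in seq):
        raise ValueError("sequence must have length >= 2 and contain only A, C, G, T")

    # 16 transition probabilities: P(b | a) = count(ab) / count(a as the first base of a pair).
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    transitions = [
        pair_counts[(a, b)] / first_counts[a] if first_counts[a] else 0.0
        for a in BASES
        for b in BASES
    ]

    # 4 base contents: the fraction of the sequence made up of each base.
    counts = Counter(seq)
    contents = [counts[b] / n for b in BASES]

    # 4 position ratios (hypothetical definition): mean 1-based position of each base,
    # divided by the sequence length so that the value lies in (0, 1].
    position_sums = {b: 0 for b in BASES}
    for index, base in enumerate(seq, start=1):
        position_sums[base] += index
    positions = [
        position_sums[b] / (counts[b] * n) if counts[b] else 0.0 for b in BASES
    ]

    return transitions + contents + positions
```

Under these assumptions, two sequences could be compared by a distance between their vectors, and the vectors could be passed as input features to a support vector machine; the abstract does not specify which distance measure or kernel is used.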

Key words

transition probability/feature vector/phylogenetic tree/essential gene/support vector machine

Classification

Biological sciences

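Cite this article

Li Yushuang, Wei Dong, Lü Yanfen. Feature extraction of DNA sequence based on the base composition and distribution and its applications [J]. Journal of Yanshan University, 2018, 42(1): 59-66, 74.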

Funding

Hebei Province Young Top Talent Program for Institutions of Higher Education (BJ2014060)

Yanshan University "Xinrui Project" (新锐工程) Talent Support Program

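Journal of Yanshan University (燕山大学学报)

Open access; indexed in the Peking University Core Journals list (北大核心) and CSTPCD

ISSN: 1007-791X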
