Computer Engineering and Applications (计算机工程与应用), 2012, Vol. 48, Issue 36: 238-244, 7. DOI: 10.3778/j.issn.1002-8331.1106-0045
基于数据划分和集成的方法预测信号肽
Method based on data dividing and integration for predicting signal peptides
Abstract
Signal peptide sequences vary in length and in amino acid composition. Most existing methods for signal peptide prediction handle this variation with scaling windows, which leads to a potential loss of useful information and to imbalanced data. To improve prediction performance on the minority class, the data are preprocessed before traditional probabilistic neural networks are used to build classifiers: the majority class is divided into several groups, and each group is combined with the minority samples to form a data subset, which is used to train a probabilistic neural network classifier. The ensemble system then combines, by voting, the results of a series of classifiers that operate on two different encodings of the protein sequences. Experiments on the widely used Nielsen dataset show the effectiveness of the proposed algorithm.
Key words
signal peptide prediction / imbalanced data sets / clustering dividing / probabilistic neural networks / multiple classifiers combination
Classification
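Information technology and security science
Cite this article
WANG Yi, GUO Gongde, KONG Xiangzeng. Method based on data dividing and integration for predicting signal peptides [J]. Computer Engineering and Applications, 2012, 48(36): 238-244, 7.
Funding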
National Natural Science Foundation of China (No. 61070062)
Major Science and Technology Project of University-Industry Cooperation in Fujian Province (No. 2010H6007)
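
Illustrative sketch
The partition-and-vote scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `sigma` smoothing parameter, and the class labels 0 (majority) and 1 (minority) are all assumptions. The keywords indicate that the paper divides the majority class by clustering; the sketch substitutes a plain round-robin split for brevity. It also votes over a single feature representation, whereas the paper combines classifiers built on two sequence encodings.

```python
import math
from collections import Counter


def split_majority(majority, n_groups):
    """Split majority-class samples into groups.

    Round-robin split used as a stand-in for the clustering step;
    empty groups are dropped.
    """
    groups = [majority[i::n_groups] for i in range(n_groups)]
    return [g for g in groups if g]


def build_subsets(majority, minority, n_groups):
    """Pair every majority group with the full set of minority samples."""
    return [
        {0: group, 1: minority}
        for group in split_majority(majority, n_groups)
    ]


def pnn_predict(x, train, sigma=0.5):
    """Probabilistic neural network decision for one sample.

    `train` maps a class label to its training vectors. The predicted
    class is the one with the largest mean Gaussian kernel response.
    """
    def score(samples):
        total = 0.0
        for s in samples:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
            total += math.exp(-sq_dist / (2 * sigma ** 2))
        return total / len(samples)

    return max(train, key=lambda label: score(train[label]))


def ensemble_predict(x, subsets, sigma=0.5):
    """Majority vote over one PNN classifier per balanced subset."""
    votes = Counter(pnn_predict(x, subset, sigma) for subset in subsets)
    return votes.most_common(1)[0][0]
```

Each subset is far better balanced than the full training set, so the minority class is not swamped when an individual classifier is built. Using an odd number of groups avoids tied votes in the two-class case.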