Prediction of C2H2 zinc finger proteins based on support vector machine
Transcription, the first step in the transmission of genetic information, is regulated by a variety of transcription factors. Transcription factors (TFs) are protein factors that bind to specific nucleotide sites upstream of genes and thereby influence the transcription process. Zinc finger proteins are the most numerous class of transcription factors. Because most zinc finger motifs differ from one another, they probably bind to different sites and perform diverse regulatory functions. C2H2 zinc finger proteins are the most numerous class of zinc finger proteins. In this paper, a data set of C2H2 zinc finger proteins is constructed, and three types of feature information are extracted: amino acid composition, auto-covariance average chemical shift, and dipeptide composition. Zinc finger proteins are then predicted with the support vector machine algorithm, and the highest prediction accuracy under the Jackknife test is 87.86%. Next, the dipeptide composition features are reduced in dimension by several different methods, and the highest accuracy after dimension reduction is 90.21%. Finally, the three types of feature information are fused, and the highest accuracy with the fused features is 92.55%. Predicting zinc finger proteins helps to build a deeper understanding of their structure, function and regulatory mechanisms.
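As a rough illustration of two of the feature types named in the abstract, the sketch below computes the amino acid composition (20 dimensions) and the dipeptide composition (400 dimensions) of a protein sequence, fuses them by simple concatenation, and enumerates Jackknife (leave-one-out) splits. All function names are hypothetical, the average chemical shift feature is omitted, and concatenation is only an assumed form of feature fusion; the abstract does not specify the authors' exact procedure.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes) and all 400 ordered pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return the 20-dimensional vector of residue frequencies."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of adjacent-pair frequencies."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def fused_features(seq):
    """Assumed fusion: concatenate both composition vectors (420 values)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def jackknife_splits(n):
    """Yield (training indices, held-out index) for each of n samples."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i
```

In a Jackknife test each sample is held out once while a classifier is trained on the rest; a support vector machine from any standard library could be trained on the vectors returned by `fused_features`.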
LIU Zhe; LI Fengmin
College of Science, Inner Mongolia Agricultural University, Hohhot 010018, China
Biology
Transcription factors; Zinc finger protein; Feature information; Prediction
Chinese Journal of Bioinformatics (《生物信息学》), 2024 (002)
pp. 140-147 (8 pages)
Natural Science Foundation of Inner Mongolia Autonomous Region (No. 2019MS03015).