Individual Random Walks for Gene-Phenotype Association Analysis
Association analysis between genes and phenotypes is crucial for revealing the inherent genetic associations of organisms. Random walk-based algorithms can fuse multi-omics data, aggregate the label information of first-order or higher-order neighbors, and complete the association information between different nodes in a network. This improves the accuracy of association prediction and helps uncover potential genetic associations between genes and phenotypes. However, existing random walk algorithms usually treat every node equally and ignore the differing importance of nodes, so unimportant nodes propagate excessively and model performance degrades. To address this, we propose iMRW (individual Multiple Random Walks), an individualized random walk algorithm based on multi-omics data fusion. On a multi-omics heterogeneous network built from gene, miRNA and phenotype nodes, iMRW uses the network topology to design an individualized multiple random walk strategy that assigns different walk lengths to nodes of different importance. It then combines Gaussian interaction profile kernel similarity with random walks to complete the information of different nodes and of the associations between them, fuses the multi-source gene-phenotype association matrices, and thereby obtains an accurate gene-phenotype association prediction matrix. Under different experimental settings, iMRW achieves better prediction performance than state-of-the-art algorithms. Case studies on maize photosynthetic capacity and starch content further confirm the practicality and effectiveness of iMRW in identifying potential gene-phenotype associations.
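The abstract names two ingredients: Gaussian interaction profile (GIP) kernel similarity, and random walks whose length depends on node importance. The sketch below illustrates those two ideas under stated assumptions; it is not the authors' iMRW. It walks on a single gene-gene GIP network instead of the gene-miRNA-phenotype heterogeneous network, and the degree-based step-count rule, the restart probability of 0.3 and the 1 to 5 step range are hypothetical choices, as are all function names. The GIP kernel follows its usual definition, K(i, j) = exp(-γ‖A_i − A_j‖²), with γ normalised by the mean squared profile norm.

```python
# Minimal sketch, NOT the published iMRW: single gene network, hypothetical
# importance rule, hypothetical function names and parameters.
import math


def gip_kernel(A, gamma_prime=1.0):
    """GIP kernel between the rows of a 0/1 association matrix A:
    K[i][j] = exp(-gamma * ||A[i] - A[j]||^2), gamma scaled by mean squared row norm."""
    n = len(A)
    mean_sq = sum(sum(x * x for x in row) for row in A) / n
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(A[i], A[j])))
             for j in range(n)] for i in range(n)]


def column_normalize(W):
    """Turn a non-negative matrix into a column-stochastic transition matrix."""
    n = len(W)
    col_sums = [sum(W[i][j] for i in range(n)) for j in range(n)]
    return [[W[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def walk_lengths(W, min_steps=1, max_steps=5):
    """HYPOTHETICAL importance rule: map each node's weighted degree (self-loops
    excluded) linearly onto [min_steps, max_steps], so well-connected nodes walk
    further and peripheral nodes stay local."""
    n = len(W)
    deg = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]
    lo, hi = min(deg), max(deg)
    if hi == lo:
        return [max_steps] * n
    return [min_steps + round((d - lo) / (hi - lo) * (max_steps - min_steps))
            for d in deg]


def rwr(P, seed, steps, restart=0.3):
    """Random walk with restart from `seed`, truncated after `steps` iterations:
    p <- (1 - restart) * P p + restart * p0."""
    n = len(P)
    p0 = [1.0 if k == seed else 0.0 for k in range(n)]
    p = p0[:]
    for _ in range(steps):
        p = [(1 - restart) * sum(P[i][j] * p[j] for j in range(n)) + restart * p0[i]
             for i in range(n)]
    return p


def individual_rw_scores(A, restart=0.3, min_steps=1, max_steps=5):
    """Score every gene-phenotype pair: walk on the GIP gene network with a
    node-specific step count, then aggregate the visited genes' known labels."""
    K = gip_kernel(A)
    P = column_normalize(K)
    lengths = walk_lengths(K, min_steps, max_steps)
    n, m = len(A), len(A[0])
    scores = []
    for i in range(n):
        p = rwr(P, i, lengths[i], restart)
        scores.append([sum(p[k] * A[k][c] for k in range(n)) for c in range(m)])
    return scores, lengths


# Toy 4-gene x 3-phenotype association matrix (illustrative only, not paper data).
A = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0],
     [1, 0, 1]]
scores, lengths = individual_rw_scores(A)
for gene, (row, k) in enumerate(zip(scores, lengths)):
    print(f"gene {gene}: steps={k}, scores={[round(s, 3) for s in row]}")
```

Because every column of the transition matrix sums to one, each truncated walk yields a probability vector, so every score is a convex combination of known 0/1 labels and stays in [0, 1]. Giving low-degree nodes short walks is one simple way to keep them from spreading labels far through the network; the importance measure actually used in iMRW may differ.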
TAN Haojiang; WANG Jun; YU Guoxian; CHEN Jian; GUO Maozu
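School of Software, Shandong University, Jinan, Shandong 250101; Joint International Research Institute of Artificial Intelligence, Shandong University, Jinan, Shandong 250101; College of Agronomy, China Agricultural University, Beijing 100083; School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044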
Computer and Automation
gene-phenotype associations; random walk; heterogeneous network; multi-omics data fusion; network topology
Acta Electronica Sinica, 2024, No. 5
pp. 1619-1632 (14 pages)
National Natural Science Foundation of China (No. 62031003, No. 62072380); Fundamental Research Funds of Shandong University (No. 2020GN061)