Computational Literature-based Knowledge Discovery for Soybean Coupling Traits (基于计算文献的大豆耦合性状知识发现研究)

关陟昊 单治易 熊赫 赵瑞雪

Biotechnology Bulletin (生物技术通报), 2025, Vol. 41, Issue 9: 345-356 (12 pages). DOI: 10.13560/j.cnki.biotech.bull.1985.2025-0082

Computational Literature-based Knowledge Discovery for Soybean Coupling Traits

关陟昊 (1), 单治易 (2), 熊赫 (1), 赵瑞雪 (3)

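Author information

  • 1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081
  • 2. National Science Library, Chinese Academy of Sciences, Beijing 100190; School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
  • 3. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081; Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration, Beijing 100081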

Abstract

[Objective] This study aims to develop an automated soybean trait discovery model based on computational literature analysis. The goal is to accurately identify soybean coupling traits before field trials and to analyze their genetic networks. This approach addresses the limitations of traditional laboratory methods and offers a more efficient way to discover knowledge for soybean breeding research.

[Method] First, an annotation strategy for the soybean corpus was constructed according to an authoritative domain ontology, and soybean coupling trait knowledge was represented as Subject-Predicate-Object (SPO) semantic triples. Second, a semantic triple extraction model was built on a domain dictionary: the adapted positive-unlabeled learning (AdaPU) algorithm and the R-BERT algorithm (a pre-trained Transformer encoder for relation extraction) were used to automatically extract knowledge entities and their regulatory relationships from soybean breeding literature, yielding the genetic regulatory networks of soybean traits. Finally, the connected subnetworks of coupling traits and the reachable paths between trait nodes in the network were mined, and a literature review was used to verify the results and analyze the genetic mechanisms.

[Result] The knowledge extraction model achieved a precision of 79.41%, a recall of 88.52%, and an F1 score of 83.72%. A total of 776 unique soybean trait knowledge triples were identified, covering 33 gene concepts, 119 protein concepts, and 96 trait concepts. Of these, 478 triples represented "associated with" relationships, 264 represented "up-regulation" relationships, and 34 represented "down-regulation" relationships. Within the soybean trait knowledge network, six connected subgraphs of coupling traits were discovered, and 139 trait coupling paths were found within the largest connected subnetwork.

[Conclusion] This study demonstrated the feasibility of trait knowledge discovery based on large-scale literature. By deeply mining knowledge units, it uncovers the underlying coupling traits and their associated molecular mechanisms, providing plant breeding researchers with potential pleiotropic genes and coupling traits for experimental design and thereby making hypothesis generation more efficient. The structured trait knowledge generated by this study contributes to the development of knowledge graphs for soybean breeding and serves as a reliable knowledge foundation for domain-specific large language models, facilitating the development and application of AI agents.
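The network-mining step described in the Method (connected subnetworks of coupling traits and reachable paths between trait nodes) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the triples, the node names, the function names, and the choice to treat edges as undirected so that two traits sharing a regulator can reach each other. It is not the authors' implementation, and the data are not from the paper.

```python
from collections import defaultdict, deque

# Hypothetical SPO triples (illustrative only; not taken from the paper's data).
TRIPLES = [
    ("GeneA", "up-regulation", "ProteinX"),
    ("ProteinX", "associated with", "trait: seed weight"),
    ("ProteinX", "down-regulation", "trait: flowering time"),
    ("GeneB", "associated with", "trait: plant height"),
]
TRAITS = {"trait: seed weight", "trait: flowering time", "trait: plant height"}


def build_graph(triples):
    """Build an undirected adjacency map; each edge keeps its predicate.

    Assumption: edges are treated as undirected for reachability.
    """
    graph = defaultdict(dict)
    for subject, predicate, obj in triples:
        graph[subject][obj] = predicate
        graph[obj][subject] = predicate
    return graph


def connected_components(graph):
    """Return the connected components of the graph as a list of node sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def coupled_trait_paths(triples, traits):
    """For each component, list one shortest path per pair of trait nodes."""
    graph = build_graph(triples)
    paths = []
    for component in connected_components(graph):
        component_traits = sorted(component & traits)
        for i, first in enumerate(component_traits):
            for second in component_traits[i + 1:]:
                paths.append(shortest_path(graph, first, second))
    return paths


if __name__ == "__main__":
    for path in coupled_trait_paths(TRIPLES, TRAITS):
        print(" -> ".join(path))
```

In this toy network, two traits linked through the shared node `ProteinX` form a candidate coupling pair, while the trait attached only to `GeneB` sits in its own component and yields no path. A real pipeline would probably keep edge direction and predicate type when interpreting each path.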

Key words

knowledge extraction / knowledge discovery / SPO semantic triples / soybean breeding / coupling traits

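Cite this article

关陟昊, 单治易, 熊赫, 赵瑞雪. 基于计算文献的大豆耦合性状知识发现研究 [Computational Literature-based Knowledge Discovery for Soybean Coupling Traits][J]. 生物技术通报 (Biotechnology Bulletin), 2025, 41(9): 345-356.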

Funding

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0113700)

Biotechnology Bulletin (生物技术通报)

Open access; Peking University Core Journal

ISSN 1002-5464
