Biological macromolecular structure databases in the artificial intelligence era: Coevolution, transformation, and future architectures

Original Chinese title: 人工智能时代的生物大分子结构数据库:共生、重塑与未来架构

Yang Ruiyun (杨睿韵) 1, Huang Jianhua (黄建华) 2, Zhang Qiangfeng (张强锋) 1

Journal of Tsinghua University (Science and Technology), 2025, Vol. 65, Issue 12: 2449-2463 (15 pages). DOI: 10.16511/j.cnki.qhdxxb.2025.21.056

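Author information

  • 1. State Key Laboratory of Membrane Biology, Tsinghua University, Beijing 100084; Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing 100084; School of Life Sciences, Tsinghua University, Beijing 100084; Tsinghua-Peking Center for Life Sciences, Beijing 100084
  • 2. State Key Laboratory of Membrane Biology, Tsinghua University, Beijing 100084; Department of Automation, Tsinghua University, Beijing 100084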

Abstract

[Significance] Determining the three-dimensional structures of biological macromolecules is fundamental to understanding the molecular basis of life and to discovering novel therapeutics. As the biological sciences enter the era of artificial intelligence (AI), structural data have become increasingly essential, while AI technologies simultaneously impose higher demands on data organization and management. This review traces the five-decade evolution of biological macromolecular structure databases, with a particular focus on the pivotal role of the Protein Data Bank (PDB). The PDB was established as a small archive of experimentally determined atomic coordinates and has gradually developed into a global infrastructure that underpins structural biology.

[Progress] We first chart the progression of structural data resources from early structure archives, which largely functioned as static catalogs of experimentally determined structures, to the emergence of highly curated structural classification systems, such as SCOP and CATH. These resources enable researchers to analyze structural relationships, investigate evolutionary patterns, and derive mechanistic insights. In parallel, sequence-centric databases (such as Pfam, InterPro, and later comprehensive domain-family resources) expanded by annotating conserved elements across the protein universe. Together, these efforts created a rich, multi-layered ecosystem in which the sequence, structure, and function of proteins became increasingly integrated, thereby turning structure databases into indispensable platforms for comparative analysis and mechanistic discovery. A new phase of structural data expansion began with AI-driven structure prediction. The release of the AlphaFold Protein Structure Database (AFDB), followed by complementary resources, including the ESM Atlas, brought an unprecedented expansion in structural coverage, spanning entire proteomes and previously challenging protein families.

[Conclusions and Prospects] We propose that structural databases and AI models form a mutually reinforcing "double-helix" data model. High-quality experimental structures provide essential references for training and benchmarking predictive models, while large-scale AI-generated structures dramatically increase the amount of available data, thereby revealing new sequence-structure-function relationships and enriching the databases themselves. This synergy could catalyze a paradigm shift in structural biology, transitioning the field from an experiment-led discipline to an integrated ecosystem in which computation and experimentation coevolve. Despite rapid progress in this field, major challenges persist. Structural databases remain affected by experimental sampling biases, uneven representation across organisms and protein families, and persistent inconsistencies in annotation quality. Moreover, the scarcity of dynamic and condition-dependent structural information further limits biological interpretability, particularly for intrinsically disordered regions, conformational ensembles, and transient complexes. Furthermore, AI-driven predictions introduce new concerns regarding model interpretability, calibration of confidence metrics, and the governance of large-scale predictive datasets. We anticipate that biological macromolecular structure databases will evolve from merely "AI-enhanced" to "AI-integrated" and, ultimately, adopt "AI-native" architectures. Such systems will incorporate continuous feedback mechanisms, automated annotation pipelines, and multi-modal data fusion, thereby enabling them to function as reliable knowledge instruments capable of hosting biologically meaningful "digital twins." Collectively, these developments promise to enhance our understanding of structure-function relationships and accelerate rational design in protein engineering, drug discovery, and synthetic biology. As a result, structural databases will continue to underpin scientific innovation while defining a new research standard for the biological sciences.

Key words

biological macromolecular structure databases / artificial intelligence (AI) / protein structure prediction / database ecosystem / AI-native

Classification

Information technology and security science

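Cite this article

Yang Ruiyun, Huang Jianhua, Zhang Qiangfeng. Biological macromolecular structure databases in the artificial intelligence era: Coevolution, transformation, and future architectures [J]. Journal of Tsinghua University (Science and Technology), 2025, 65(12): 2449-2463, 15.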

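Funding

National Key Research and Development Program of China (2022YFF1203100)

Key Program of the National Natural Science Foundation of China (32230018)

National Science Fund for Distinguished Young Scholars, National Natural Science Foundation of China (32125007)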

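Journal of Tsinghua University (Science and Technology)

Open access (OA); listed in the Peking University Core Journals index

ISSN 1000-0054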
