OA; Peking University Core Journals (北大核心); CSTPCD; MEDLINE
Microbial metaproteomics—From sample processing to data acquisition and analysis
Microorganisms are closely associated with human health and disease, and understanding the composition and function of microbial communities remains a pressing research question. In recent years, metaproteomics has become an important technique for studying microbial composition and function. However, owing to the complexity and high heterogeneity of microbial community samples, sample processing, mass spectrometric data acquisition, and data analysis are the three major challenges that metaproteomics currently faces. Metaproteomic analysis often requires optimizing the pretreatment for each sample type and adopting different schemes for microbial isolation, enrichment, extraction, and lysis. As in single-species proteomics, the mass spectrometric data acquisition modes used in metaproteomics are data-dependent acquisition (DDA) and data-independent acquisition (DIA). DIA can collect comprehensive peptide information from a sample and holds great potential for future development. However, because metaproteome samples are so complex, the analysis of DIA data has become a major obstacle to deep metaproteome coverage. In data analysis, the most important step is the construction of a protein sequence database. The size and completeness of the database strongly influence not only the number of identifications but also analyses at the species and functional levels. The current gold standard for metaproteome database construction is a protein sequence database derived from metagenomic sequencing. A public-database filtering method based on iterative database searching has also been shown to have strong practical value. Among specific data analysis strategies, peptide-centric DIA analysis is the dominant approach. Advances in deep learning and artificial intelligence are expected to greatly improve the accuracy, coverage, and speed of metaproteomic data analysis. For downstream bioinformatics analysis, a series of annotation tools developed in recent years can perform taxonomic annotation at the protein, peptide, and gene levels to determine microbial community composition. Compared with other omics approaches, functional analysis of microbial communities is a distinctive feature of metaproteomics. Metaproteomics has become an important component of multi-omics analysis of microbial communities and still has great potential for development in depth of coverage, detection sensitivity, and completeness of data analysis.
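As an illustration of the peptide-level taxonomic annotation mentioned in the abstract, the sketch below assigns each identified peptide to the lowest common ancestor (LCA) of all taxa whose proteins contain it. The abstract does not name a specific algorithm, so the LCA rule is an assumption chosen for illustration. The lineages are simplified to five ranks, and the peptide sequences and the names `LINEAGES`, `PEPTIDE_TO_TAXA`, `lowest_common_ancestor`, and `annotate_peptides` are hypothetical.

```python
from typing import Dict, List, Optional

# Toy lineages, simplified to domain, phylum, family, genus, species.
# A real workflow would take full lineages from a taxonomy database.
LINEAGES: Dict[str, List[str]] = {
    "Escherichia coli": ["Bacteria", "Pseudomonadota", "Enterobacteriaceae",
                         "Escherichia", "Escherichia coli"],
    "Salmonella enterica": ["Bacteria", "Pseudomonadota", "Enterobacteriaceae",
                            "Salmonella", "Salmonella enterica"],
    "Bacteroides fragilis": ["Bacteria", "Bacteroidota", "Bacteroidaceae",
                             "Bacteroides", "Bacteroides fragilis"],
}

# Hypothetical peptide-to-taxa matches, e.g. from an in-silico digest
# of the protein sequence database (sequences are made up).
PEPTIDE_TO_TAXA: Dict[str, List[str]] = {
    "PEPTIDEK": ["Escherichia coli"],
    "SAMPLER": ["Escherichia coli", "Salmonella enterica"],
    "ELVISLIVESK": ["Escherichia coli", "Salmonella enterica",
                    "Bacteroides fragilis"],
}


def lowest_common_ancestor(lineages: List[List[str]]) -> Optional[str]:
    """Return the deepest taxon shared by every lineage, or None."""
    if not lineages:
        return None
    lca: Optional[str] = None
    # Walk the ranks from the root downward and stop at the first disagreement.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


def annotate_peptides(
    peptide_to_taxa: Dict[str, List[str]],
    lineages: Dict[str, List[str]],
) -> Dict[str, Optional[str]]:
    """Map each peptide to the LCA of the taxa it was matched to."""
    return {
        peptide: lowest_common_ancestor([lineages[t] for t in taxa])
        for peptide, taxa in peptide_to_taxa.items()
    }


annotations = annotate_peptides(PEPTIDE_TO_TAXA, LINEAGES)
for peptide, taxon in annotations.items():
    print(f"{peptide}\t{taxon}")
```

In this toy example a peptide unique to one species is assigned at species level, while peptides shared across species fall back to the family or domain they have in common. This is why widely conserved peptides contribute little taxonomic resolution.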
Wu Enhui (吴恩慧); Qiao Liang (乔亮)
Department of Chemistry, Fudan University, Shanghai 200433, China
Chemistry
metaproteomics; sample pretreatment; database; data analysis strategy
Chinese Journal of Chromatography (《色谱》), 2024, Issue 7
658-668 / 11
National Natural Science Foundation of China (No. 22374031).