生物信息学 (Chinese Journal of Bioinformatics), 2025, Vol. 23, Issue 2: 88-95, 8. DOI: 10.12113/202403010
A method for Chinese-specific reference genome construction based on large-scale population genomic variants
Abstract
Genomic variation is at the core of genetic diversity and has a significant impact on the analysis of evolution, the revelation of individual differences within a species, and the investigation of disease mechanisms. Accurate characterization of the reference genome sequence is crucial for genetic research because it directly affects the accurate identification of genetic variants. Currently, the human reference genome is based on samples from Western populations, which may not accurately represent the genomic variants in Chinese populations. Constructing a new reference genome with Chinese genetic characteristics is therefore urgently needed to facilitate in-depth research on the genetic and evolutionary mechanisms of Chinese populations.

This study proposes a method for modifying the GRCh38 version of the human reference genome based on population genomic variants. The method employs three types of East Asian population variants: single nucleotide variants (SNVs), short insertion-deletion variants (Indels), and structural variants (SVs). After screening and revision, a series of Chinese reference genomes with different allele frequencies and variant types was established. Sequencing data from various regions in China were used to benchmark the modified Chinese reference genomes. The reference genome built from East Asian high-frequency SVs, Indels, and SNVs with allele frequencies above 2/3, 1/2, and 1/2, respectively, achieved the best read mapping results. The optimal Chinese reference genome was obtained by incorporating all of these variants into GRCh38.

The Chinese reference genome established in this study is expected to enhance the ability to identify large-scale genome variants in the Chinese population and to provide an effective method for subsequent Chinese reference genome construction. Further details on the methodology can be found at: https://github.com/azheasir/Chinese-specific-reference-genome-construction.
Keywords
Large-scale population / Reference genome / Sequencing read alignment
Classification
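Computer science and automation
Cite this article
吕俊增, 曹舒淇, 姜涛. A method for Chinese-specific reference genome construction based on large-scale population genomic variants [J]. 生物信息学 (Chinese Journal of Bioinformatics), 2025, 23(2): 88-95, 8.
Funding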
Natural Science Foundation of Heilongjiang Province (No. LH2023F014)
National Natural Science Foundation of China (No. 62472120)
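
Illustrative sketch
The frequency-based screening summarized in the abstract (keep East Asian variants whose allele frequency exceeds 2/3 for SVs and 1/2 for Indels and SNVs, then write them into GRCh38) can be sketched roughly as follows. This is not the authors' pipeline, which is available at the GitHub link in the abstract. The function names, the tuple layout `(pos, ref, alt, af)`, the 0-based coordinates, the 50 bp cutoff separating Indels from SVs, and the rule of skipping overlapping variants are all assumptions made for illustration.

```python
# Allele-frequency thresholds reported in the abstract (variant kept if AF is above the value).
AF_THRESHOLDS = {"SV": 2 / 3, "INDEL": 1 / 2, "SNV": 1 / 2}

# Assumption: variants changing length by 50 bp or more are treated as SVs.
SV_MIN_LENGTH = 50


def classify(ref, alt):
    """Assign a variant type from its REF and ALT alleles (hypothetical helper)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if abs(len(ref) - len(alt)) >= SV_MIN_LENGTH:
        return "SV"
    return "INDEL"


def select_variants(variants):
    """Keep variants whose allele frequency exceeds the threshold for their type.

    Each variant is a tuple (pos, ref, alt, af) with a 0-based position.
    """
    return [v for v in variants if v[3] > AF_THRESHOLDS[classify(v[1], v[2])]]


def apply_variants(reference, variants):
    """Write the selected ALT alleles into a reference sequence.

    Variants are applied left to right; a variant overlapping one that was
    already applied is skipped (an assumption, not the paper's stated rule).
    """
    pieces = []
    cursor = 0
    for pos, ref, alt, _af in sorted(variants):
        if pos < cursor:
            continue  # overlaps a previously applied variant
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        pieces.append(reference[cursor:pos])
        pieces.append(alt)
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)
```

In practice the variants would be read from population VCF files and the sequence from a GRCh38 FASTA file, one chromosome at a time; the sketch works on plain strings to keep the screening and substitution logic visible.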