Parallel Processing Scheme for Sequencing Data in Copy Number Variation Based on MapReduce

He Heng, Cheng Kaili, Zhang Kui, Cheng Shujun

Computer Engineering, 2025, Vol. 51, Issue 5: 177-187 (11 pages). DOI: 10.19678/j.issn.1000-3428.0068749

He Heng¹, Cheng Kaili¹, Zhang Kui¹, Cheng Shujun²

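Author information

  • 1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, Hubei, China
  • 2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China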

Abstract

Copy Number Variation (CNV) is a type of genetic variation that is widespread in the human genome. Improving the efficiency of CNV detection can give patients faster and more accurate results, significantly reduce medical costs, and facilitate drug development and clinical applications. Read Depth (RD)-based methods are currently the most commonly used approach to CNV detection, but processing RD-related information is slow and accounts for a large share of total CNV detection time. Existing methods suffer from problems such as ineffective application to whole-genome analysis, low computational efficiency, and reduced detection accuracy. This paper proposes EPPCNV, an efficient parallel processing scheme for sequencing data used in CNV detection. In EPPCNV, two MapReduce jobs are designed to process whole-genome sequencing data efficiently in parallel and to extract RD-related information accurately. Moreover, EPPCNV takes into account the impact of GC-content bias on CNV detection results and corrects the RD of the sequencing data to ensure high sensitivity and accuracy in the final detection output. Further, EPPCNV adopts a highly adaptable data processing method that is independent of any specific CNV detection method. The final RD-related information generated by EPPCNV can be combined directly with various mainstream CNV detection methods, significantly improving their overall performance without changing how the original method identifies CNV regions. Experimental results show that EPPCNV achieves high comprehensive accuracy and can be combined directly with the CNV-LOF, HBOS-CNV, and CNVnator methods, significantly improving their computational efficiency while maintaining high sensitivity and accuracy. For sequencing data with higher coverage depth and larger data volume, combining a CNV detection method with EPPCNV yields even greater improvements in computational efficiency.
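The abstract does not describe the internals of the two MapReduce jobs, so the following is only a minimal single-process sketch of the general idea, not EPPCNV itself. It assumes that RD is counted per fixed-width bin through a map step and a reduce step, and that GC bias is corrected with a form commonly used by RD-based methods: each bin's RD is scaled by the global mean RD divided by the mean RD of bins with the same GC percentage. The names `BIN_SIZE`, `map_read`, `reduce_counts`, and `gc_correct`, as well as the toy reads and GC values, are hypothetical.

```python
from collections import defaultdict
from statistics import mean

BIN_SIZE = 1000  # hypothetical bin width in base pairs (an assumption)


def map_read(chrom, pos):
    """Map step: emit ((chromosome, bin index), 1) for one aligned read start."""
    return (chrom, pos // BIN_SIZE), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per key to get the raw read depth."""
    depth = defaultdict(int)
    for key, count in pairs:
        depth[key] += count
    return dict(depth)


def gc_correct(depth, gc_percent):
    """Scale each bin by (global mean RD) / (mean RD of bins with the same GC%)."""
    global_mean = mean(depth.values())
    by_gc = defaultdict(list)
    for key, rd in depth.items():
        by_gc[gc_percent[key]].append(rd)
    gc_mean = {gc: mean(rds) for gc, rds in by_gc.items()}
    return {key: rd * global_mean / gc_mean[gc_percent[key]]
            for key, rd in depth.items()}


# Toy input: (chromosome, 0-based start position) of aligned reads.
reads = [("chr1", 10), ("chr1", 900), ("chr1", 1500),
         ("chr1", 2100), ("chr1", 2200), ("chr1", 2999), ("chr1", 3500)]
raw_depth = reduce_counts(map_read(c, p) for c, p in reads)

# Hypothetical integer GC percentage per bin.
gc = {("chr1", 0): 40, ("chr1", 1): 40, ("chr1", 2): 60, ("chr1", 3): 60}
corrected = gc_correct(raw_depth, gc)
```

In a real MapReduce deployment the map and reduce functions would run on separate workers over partitions of the alignment file; the sketch keeps them in one process only to show how the data flows from reads to corrected per-bin RD.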

Key words

Copy Number Variation (CNV) detection / MapReduce job / sequencing data processing / Read Depth (RD) / whole-genome

Classification

Information Technology and Security Science

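Cite this article

He Heng, Cheng Kaili, Zhang Kui, Cheng Shujun. Parallel Processing Scheme for Sequencing Data in Copy Number Variation Based on MapReduce[J]. Computer Engineering, 2025, 51(5): 177-187, 11.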

Funding

National Natural Science Foundation of China (62372343, 61602351).

Computer Engineering

Open access (OA); Peking University Core Journal

ISSN 1000-3428
