Experimental Technology and Management (实验技术与管理), 2025, Vol. 42, Issue (4): 68-77, 10. DOI: 10.16791/j.cnki.sjg.2025.04.009
4mC site prediction approach based on dual-path multiscale feature fusion
Abstract
[Objective] DNA N4-methylcytosine (4mC) modification plays a crucial role in various cellular processes, including DNA replication, cell cycle regulation, and gene expression, making it an essential epigenetic marker. Accurately identifying 4mC sites is important for uncovering the mechanisms of epigenetic regulation in disease and other biological functions. However, traditional 4mC site identification techniques are often costly and time-consuming, which limits their scalability for large-scale applications. Although several intelligent computing-based 4mC predictors have been proposed over the past decade, their performance remains unsatisfactory. Developing effective methods that fully exploit the complex interactions within DNA sequences has therefore become a major challenge for improving prediction capabilities.

[Methods] A multilevel feature extraction module is introduced, with convolutional layers, bidirectional long short-term memory networks, and an attention mechanism as its core components. This module captures long-term dependencies within DNA sequences to support accurate 4mC site detection. In addition, a multiscale feature extraction module, centered on an improved SENet network, extracts multiscale representations of location features, improving the model's ability to represent complex sequence characteristics. To further improve feature capture, a parallel feature fusion-based optimization method is proposed. Finally, to address the strong imbalance in the number of candidates across different species, class weights are applied in the cross-entropy loss function to balance the training process.

[Results] A deep learning-based dual-path multiscale feature fusion approach is proposed in this work for 4mC site prediction. To validate the structural design of the model, ablation experiments were performed with variants including the SCGF-4mC, SMFI-4mC, and DCMF-4mC models. These experiments demonstrated the structural superiority of the proposed framework. In addition, the model was compared with several advanced 4mC site prediction methods currently available. The results indicate that the proposed 4mC site predictor achieved higher accuracy and stronger generalization ability. Model feature analysis experiments were also conducted using feature matrices generated by four encoding methods as inputs. Comparative evaluations using the MCC and ACC metrics on an independent test set confirmed the model's stability and reliability. Meanwhile, spatial distribution calculations of 4mC and non-4mC samples across different species provided evidence of the model's ability to effectively learn and recognize 4mC loci. In summary, the proposed deep learning-based method demonstrated greater accuracy and stronger generalization ability in predicting 4mC sites across six species.

[Conclusions] The proposed method is capable of identifying 4mC sites in a multispecies environment, enhancing predictive performance and offering valuable support for identifying 4mC sites in DNA sequences.

Keywords
4mC site prediction / multiscale feature fusion / bidirectional long short-term memory network / SENet network

Classification
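Information Technology and Security Science

Cite this article
Huang Zexia (黄泽霞), Li Wei (李煨), Shao Chunli (邵春莉), Geng Lin (耿林). 4mC site prediction approach based on dual-path multiscale feature fusion [J]. Experimental Technology and Management, 2025, 42(4): 68-77, 10.

Funding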
Ministry of Education Industry-University Cooperative Education Program (230700005272541, 231103177230726)
Anhui University Online and Offline Blended Course Project (2023xjzlgc124)
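
Illustrative note on the class-weighted loss: the abstract states that class weights in the cross-entropy loss are used to balance training, but it does not give the weighting formula. The sketch below is therefore only an assumption-laden illustration. It uses inverse-frequency weights, a common choice that is not confirmed as the authors' scheme, and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the total weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Toy data (made up for illustration): three negatives, one positive.
labels = [0, 0, 0, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]

weights = inverse_frequency_weights(labels)
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights, each class contributes the same total weight to the loss regardless of how many samples it has, so errors on the rare class are not drowned out by the frequent one.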