Journal of Army Medical University (陆军军医大学学报), 2026, Vol. 48, Issue 4: 479-492, 14. DOI: 10.16016/j.2097-0927.202510069
Ring-Feature Fusion Network (RFFNet): a deep learning model integrating CNN and Transformer with endoscopic distortion correction for high-accuracy real-time diagnosis of inflammatory bowel disease
Abstract
Objective To use deep learning to assist endoscopists in accurately diagnosing colonoscopy images of Crohn's disease (CD), ulcerative colitis (UC), and normal tissue.

Methods A total of 24 492 colonoscopy images from 1 309 subjects were collected from the Department of Gastroenterology, Daping Hospital, Army Medical University, and from Sir Run Run Shaw Hospital between January 2018 and November 2020. The dataset included 4 729 CD images from 424 cases, 7 074 UC images from 605 cases, and 12 689 Normal images from 280 cases. The images were randomly split by case in a 7:1:2 ratio into a training set (915 cases, 17 136 images), an internal test set (131 cases, 2 626 images), and an external validation set (263 cases, 4 730 images). To address feature fusion between CNN and Transformer architectures and to correct endoscopic image distortion, we constructed a Ring-Feature Fusion Network (RFFNet) based on ResNeSt50 and MViTv2. The network employs bidirectional channel-spatial attention for cross-stage feature fusion, combining the local feature extraction strengths of the CNN with the global modeling capabilities of the Transformer. A ring-feature mechanism was introduced to handle barrel distortion and depth-of-field effects in colonoscopy images, improving the model's adaptability to the geometric properties of the distal intestinal lumen. RFFNet was compared with five existing deep learning models. Recognition results on the external validation set were visualized using confusion matrices, and performance was evaluated using accuracy, sensitivity, specificity, F1-score, and area under the curve (AUC). Ablation studies were conducted on the external validation set, using metrics such as accuracy and F1-score, to validate the effectiveness and necessity of each improvement in RFFNet. Gradient-weighted class activation mapping (Grad-CAM) was used to generate heatmaps that show the model's focus areas and improve interpretability.

Results On the external validation set, RFFNet achieved an overall accuracy of 95.68% (95% CI: 94.14% to 98.58%) and an overall AUC of 0.987 (95% CI: 0.985 to 0.990). For the CD, UC, and Normal classes, the AUCs were 0.982 (95% CI: 0.979 to 0.986), 0.981 (95% CI: 0.977 to 0.984), and 0.997 (95% CI: 0.995 to 0.998), respectively; sensitivities were 93.35% (95% CI: 91.67% to 94.71%), 94.00% (95% CI: 92.60% to 95.16%), and 98.56% (95% CI: 97.99% to 98.97%), respectively; specificities were 94.58% (95% CI: 93.81% to 95.27%), 94.20% (95% CI: 93.36% to 94.94%), and 98.61% (95% CI: 98.05% to 99.01%), respectively; F1-scores were 87.81% (95% CI: 86.28% to 89.25%), 90.05% (95% CI: 89.00% to 91.25%), and 98.58% (95% CI: 98.22% to 98.92%), respectively. These results indicate that RFFNet achieved high diagnostic precision. Compared with CNN-only or Transformer-only models, the improvement in RFFNet's overall accuracy was statistically significant (P<0.05). Ablation studies confirmed that the model, through a dynamic spatial attention mechanism, deeply integrates the CNN's local fine-grained feature extraction with the Transformer's global contextual modeling. The CNN-Transformer architecture improved overall accuracy by 0.11% compared with using the CNN alone. The ring-feature mechanism corrected endoscopic optical distortion and depth-of-field attenuation, enhanced modeling of the distal lumen geometry, and improved RFFNet's overall accuracy by 0.57%. Grad-CAM heatmaps demonstrated the model's ability to adaptively capture structural features in colonoscopy images.

Conclusion RFFNet, through a dual-backbone architecture that deeply couples the CNN's local texture perception with the Transformer's global dependency modeling, and through explicit correction of endoscopic distortion via ring features, enables high-accuracy real-time classification of CD, UC, and normal mucosa.

Keywords
inflammatory bowel disease / deep learning / ulcerative colitis / Crohn's disease

Classification
Medicine and health

Cite this article
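李华龙, 魏艳玲, 阮广聪, 孟薇, 吴毅, 李颖, 唐嘉杰, 刘静静, 粘永健. Ring-Feature Fusion Network (RFFNet): a deep learning model integrating CNN and Transformer with endoscopic distortion correction for high-accuracy real-time diagnosis of inflammatory bowel disease [J]. Journal of Army Medical University, 2026, 48(4): 479-492, 14.

Funding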
Supported by the Chongqing Technology Innovation and Application Development Special Key Project (CSTB2022TIAD-KPX0161), the Chongqing Natural Science Foundation Innovation and Development Joint Fund (CSTB2024NSCQ-LZX0141), and the Chongqing Yingcai Innovation and Entrepreneurship Leading Talent Project (CQYC20220303576).
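
Note on the reported metrics: the abstract gives per-class sensitivity, specificity, and F1-score for CD, UC, and Normal, evaluated from a confusion matrix. The minimal sketch below shows how such values are commonly derived from a three-class confusion matrix in a one-vs-rest fashion. It is an illustration under stated assumptions, not the authors' evaluation code: the function names and the counts in `example_cm` are hypothetical and are not the study's data.

```python
# One-vs-rest metrics from a multi-class confusion matrix.
# Convention assumed here: rows = true class, columns = predicted class.
CLASSES = ["CD", "UC", "Normal"]


def one_vs_rest_metrics(cm, k):
    """Return (sensitivity, specificity, F1) for class index k.

    Assumes class k has at least one true and one predicted sample,
    otherwise a division by zero occurs.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                       # true k, predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp  # other classes, predicted as k
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Hypothetical counts for illustration only (NOT the study's results).
example_cm = [
    [90, 8, 2],
    [6, 92, 2],
    [1, 1, 198],
]

for k, name in enumerate(CLASSES):
    sens, spec, f1 = one_vs_rest_metrics(example_cm, k)
    print(f"{name}: sensitivity={sens:.4f} specificity={spec:.4f} F1={f1:.4f}")
print(f"overall accuracy={overall_accuracy(example_cm):.4f}")
```

Because F1 depends on precision, it is sensitive to class imbalance. This may explain why a class can show high sensitivity and specificity but a lower F1-score when it is much smaller than the other classes.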