Acta Electronica Sinica, 2025, Vol. 53, Issue 3: 951-961, 11. DOI: 10.12263/DZXB.20240642
面向肺部肿瘤分类的跨模态Light-3Dformer模型
Cross-Modal Light-3Dformer Model for Lung Tumor Classification
Abstract
Recognizing lung tumors in 3D multimodal positron emission tomography/computed tomography (PET/CT) images with deep learning is an important research area. In medical images of lung tumors, lesions have irregular spatial shapes and blurred boundaries with the surrounding tissue, which makes it difficult for a model to fully extract tumor features, and computational complexity is higher in three-dimensional tasks. To address these problems, this paper proposes a cross-modal Light-3Dformer model for 3D lung tumor recognition. The main contributions are as follows. First, the backbone network extracts PET/CT image features, and the auxiliary network extracts PET image features and CT image features; multimodal feature enhancement and interactive learning are realized through lightweight cross-modal collaborative attention. Second, a Light-3Dformer module is designed. In this module, the two matrix multiplication operations of the Transformer are replaced with the linear element-wise multiplication operation of Lightformer; a cascaded Lightformer structure is designed, and its output feature map is fused with the initial input feature map, so that parallel fusion and deep-shallow feature fusion yield a lightweight design with rich gradient information; and a parameter-free attention is designed to enhance lung tumor feature extraction along three dimensions: channel, space, and tomographic image. Third, a lightweight cross-modal collaborative attention module (LCCAM) is designed, which fully learns the advantageous cross-modal information in 3D multimodal images and performs interactive learning of deep and shallow features. Finally, ablation and comparative experiments are conducted. On a self-built 3D multimodal lung tumor data set, the model achieves an accuracy of 90.19% and an area under the curve (AUC) of 89.81% while attaining the best computation and running time. Compared with the 3D-SwinTransformer-S model, the computation quantity is reduced by a factor of 117 and the calculation quantity by a factor of 400. The experimental results show that the model extracts multimodal information about lung tumor lesions more effectively, which offers a new approach to lightweight design and multimodal interaction in 3D deep learning models.
Key words
lung tumor/multimodal images/Transformer/Light-3Dformer/lightweight cross-modal collaborative attention
Classification
Computer and Automation
Cite this article
ZHOU Tao, NIU Yuxia, YE Xinyu, LIU Long, LU Huiling. Cross-Modal Light-3Dformer Model for Lung Tumor Classification[J]. Acta Electronica Sinica, 2025, 53(3): 951-961, 11.
Funding
National Natural Science Foundation of China (No. 62062003)
Natural Science Foundation of Ningxia Province (No. 2023AAC03293)
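
The abstract states that Light-3Dformer replaces the Transformer's two matrix multiplications with a linear element-wise multiplication, but it does not give the exact formulation. The sketch below is therefore not the paper's Lightformer. It only contrasts standard self-attention with one hypothetical linear-cost alternative that pools the keys into a single context vector and applies it element-wise. The function names, the vector `w`, and the pooling scheme are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def matmul_attention(q, k, v):
    # Standard self-attention: two matrix products, cost grows as n^2 * d.
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, n), first matrix product
    return softmax(scores, axis=-1) @ v      # (n, d), second matrix product

def elementwise_attention(q, k, v, w):
    # Hypothetical linear-cost variant (an assumption, not the paper's method):
    # no (n, n) matrix is ever formed, so cost grows as n * d.
    alpha = softmax(q @ w / np.sqrt(q.shape[-1]), axis=0)  # (n,) token weights
    context = (alpha[:, None] * k).sum(axis=0)             # (d,) pooled keys
    return context[None, :] * v                            # (n, d) element-wise

# Toy inputs: n tokens (e.g. flattened 3D patches), d channels.
rng = np.random.default_rng(0)
n, d = 512, 32
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
w = rng.standard_normal(d)  # stands in for a learned projection vector

out_full = matmul_attention(q, k, v)
out_light = elementwise_attention(q, k, v, w)
print(out_full.shape, out_light.shape)
```

Both functions return an array of the same shape as `v`. The difference is that the first builds an n-by-n score matrix, while the second only ever holds arrays of size n or n-by-d, which is the kind of saving that matters when n is the number of voxels or patches in a 3D volume.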