Computer Engineering, 2025, Vol. 51, Issue (6): 49-56, 8. DOI: 10.19678/j.issn.1000-3428.0068910
基于交叉模态注意力特征增强的医学视觉问答
Medical Visual Question Answering Based on Cross-Modal Attention Feature Enhancement
Abstract
Medical Visual Question Answering (Med-VQA) requires understanding both the medical image and the text-based question. Designing effective modal representations and cross-modal fusion methods is therefore crucial for performing well on Med-VQA tasks. Current Med-VQA methods focus only on the global features of medical images and on the distribution of attention within a single modality; they ignore the medical information carried by local image features and by cross-modal interactions, which limits their understanding of image content. This study proposes the Cross-Modal Attention-Guided Medical VQA (CMAG-MVQA) model. First, a U-Net-based encoder enhances the local features of the image. Second, from the perspective of cross-modal collaboration, a selective guided attention method is proposed to introduce interactive information from the other modality. In addition, a self-attention mechanism further enhances the image representation produced by selective guided attention. Ablation and comparative experiments on the VQA-RAD medical question-answering dataset show that the proposed method performs well on Med-VQA tasks and improves feature representation compared with similar methods.
Keywords
cross-modal interaction / attention mechanism / Medical Visual Question Answering (Med-VQA) / feature fusion / feature enhancement
Classification
Information Technology and Security Science
Cite this article
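LIU Kai, REN Hongyi, LI Ying, JI Yi, LIU Chunping. Medical Visual Question Answering Based on Cross-Modal Attention Feature Enhancement [J]. Computer Engineering, 2025, 51(6): 49-56, 8.
Funding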
National Natural Science Foundation of China (62376041)
Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX21_1341)