
Medical Visual Question Answering Based on Cross-Modal Attention Feature Enhancement

Liu Kai¹, Ren Hongyi¹, Li Ying¹, Ji Yi¹, Liu Chunping¹

Computer Engineering, 2025, Vol. 51, Issue (6): 49-56, 8. DOI: 10.19678/j.issn.1000-3428.0068910



Author Information

1. School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu

Abstract

Medical Visual Question Answering (Med-VQA) requires understanding both the content of medical images and the accompanying text-based questions. Designing effective modal representations and cross-modal fusion methods is therefore crucial to performing well on Med-VQA tasks. Current Med-VQA methods focus only on the global features of medical images and on the distribution of attention within a single modality; they ignore the medical information carried by local image features and by cross-modal interactions, which limits their understanding of image content. This study proposes the Cross-Modal Attention-Guided Medical VQA (CMAG-MVQA) model. First, a U-Net-based encoder is used to enhance the local features of the image. Second, from the perspective of cross-modal collaboration, a selective guided attention method is proposed that introduces interactive information from the other modality. In addition, a self-attention mechanism further enhances the image representation produced by selective guided attention. Ablation and comparative experiments on the VQA-RAD medical question-answering dataset show that the proposed method performs well on Med-VQA tasks and yields better feature representations than comparable methods.
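The abstract does not give the model's equations, so the following is only a minimal sketch of the general pattern it describes: image-region features first attend to question features (cross-modal guided attention), and the result is then refined with self-attention. It assumes plain scaled dot-product attention with residual connections, no learned projection matrices, and image and question features of the same dimension. The function names (`scaled_dot_attention`, `guided_then_self_attention`) are hypothetical, and the "selective" component of the authors' selective guided attention is not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens or image regions, columns = feature dims


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V; no learned projections (simplifying assumption)."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])
    return matmul(weights, values)


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def guided_then_self_attention(image_feats: Matrix, question_feats: Matrix) -> Matrix:
    # Hypothetical step 1: each image region queries the question words,
    # pulling information from the text modality into the image features.
    guided = add(image_feats, scaled_dot_attention(image_feats, question_feats, question_feats))
    # Hypothetical step 2: self-attention among the question-guided image regions.
    return add(guided, scaled_dot_attention(guided, guided, guided))


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0]]     # two toy image-region features (e.g. from a U-Net encoder)
    question = [[2.0, 3.0], [0.5, 0.5]]  # two toy question-word embeddings
    enhanced = guided_then_self_attention(image, question)
    print(len(enhanced), len(enhanced[0]))  # prints: 2 2 (one enhanced vector per region)
```

The residual additions keep the original image information while mixing in question-aware context. A practical model would add learned query/key/value projections, multiple heads, and normalization, none of which are specified in the abstract.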

Key words

cross-modal interaction / attention mechanism / Medical Visual Question Answering (Med-VQA) / feature fusion / feature enhancement

Classification

Information Technology and Security Science

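Cite This Article

LIU Kai, REN Hongyi, LI Ying, JI Yi, LIU Chunping. Medical visual question answering based on cross-modal attention feature enhancement[J]. Computer Engineering, 2025, 51(6): 49-56, 8.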

Funding

National Natural Science Foundation of China (62376041)

Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX21_1341)

Computer Engineering

Open Access; Peking University Core Journal

ISSN: 1000-3428
