计算机与现代化, Issue (4): 95-103, 9. DOI: 10.3969/j.issn.1006-2475.2026.04.013
Hierarchical Visual Augmentation for Multi-modal Named Entity Recognition Method
文勇军 1, 曹国超 1, 夏平 1, 黄睿轩 1
Author information
- 1. School of Physics and Electronic Science, Changsha University of Science and Technology, Changsha, Hunan 410114
Abstract
To address entity misjudgment caused by visual modality noise and semantic deviation caused by missing multimodal features in multimodal named entity recognition, we propose a Hierarchical Visual-Augmented Multimodal named entity recognition method (HVAM). First, a residual neural network (ResNet) and a Mask Region-based Convolutional Neural Network (Mask R-CNN) are used to extract hierarchical features from multiple sources of visual information. Second, cross-modal fusion is performed between image features at different levels and textual semantic features, yielding text-enhanced visual features. Furthermore, a multi-level visual prefix generation module is designed and combined with a visual-semantic-enhanced text encoder to visually enhance the textual features. Finally, a multi-task label decoding module performs named entity recognition on the visually enhanced text semantic representations. Experimental results on two public datasets, Twitter-2015 and Twitter-2017, show that, compared with 10 existing methods including HvpNet and MAF, the proposed model achieves average F1-scores of 75.63% and 87.27%, respectively, outperforming current mainstream baseline models.
Keywords
multimodal named entity recognition/feature extraction/multimodal fusion/multi-task learning/multi-task label decoding
Classification
Information Technology and Security Science
Cite this article
文勇军, 曹国超, 夏平, 黄睿轩. Hierarchical Visual Augmentation for Multi-modal Named Entity Recognition Method[J]. 计算机与现代化, 2026, (4): 95-103, 9.