Hierarchical Visual Augmentation for Multi-modal Named Entity Recognition Method

Wen Yongjun¹ (文勇军), Cao Guochao¹ (曹国超), Xia Ping¹ (夏平), Huang Ruixuan¹ (黄睿轩)

计算机与现代化 (Computer and Modernization), Issue (4): 95-103, 9. DOI: 10.3969/j.issn.1006-2475.2026.04.013

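Author information

1. School of Physics and Electronic Science, Changsha University of Science and Technology, Changsha, Hunan 410114, China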
Abstract

To address entity misjudgment caused by visual modality noise and semantic deviation caused by insufficient multimodal features in multimodal named entity recognition, we propose a Hierarchical Visual-Augmented Multimodal named entity recognition method (HVAM). First, a Residual Neural Network (ResNet) and a Mask Region-based Convolutional Neural Network (Mask R-CNN) are used to extract hierarchical features from multiple visual information sources. Second, cross-modal fusion is performed between image features at different levels and textual semantic features, yielding text-enhanced visual features. Furthermore, a multi-level visual prefix generation module is designed and combined with a visual-semantic-enhanced text encoder to visually enhance the textual features. Finally, a multi-task label decoding module performs named entity recognition on the visually enhanced text semantic representations. Experimental results on two public datasets, Twitter-2015 and Twitter-2017, show that, compared with 10 existing methods including HvpNet and MAF, the proposed model achieves average F1-scores of 75.63% and 87.27%, respectively, outperforming current mainstream baseline models.
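The cross-modal fusion step in the abstract can be illustrated with a rough sketch. The abstract does not give the fusion formula, so the code below assumes plain scaled dot-product cross-attention in which visual region vectors act as queries over text token vectors. The function names (`cross_attention`, `text_enhanced_visual`), the feature dimension, and the random stand-in features are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of value rows, giving one output vector per query.
        fused.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return fused


def text_enhanced_visual(visual_levels, text_tokens):
    """Assumed fusion: every visual level queries the same text tokens."""
    return [cross_attention(level, text_tokens, text_tokens) for level in visual_levels]


random.seed(0)
dim = 4
# Hypothetical stand-ins for real encoder outputs: 5 text tokens and two
# visual levels (3 image-level regions and 2 object-level regions).
text_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
visual_levels = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)],
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)],
]
fused_levels = text_enhanced_visual(visual_levels, text_tokens)
```

Each fused vector here is a convex combination of the text token vectors, weighted by how strongly a visual region matches each token. A real system would presumably add learned projections and multiple heads, which this sketch omits.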

Key words

multimodal named entity recognition/feature extraction/multimodal fusion/multi-task learning/multi-task label decoding

Classification

Information Technology and Security Science

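Cite this article

Wen Yongjun, Cao Guochao, Xia Ping, Huang Ruixuan. Hierarchical Visual Augmentation for Multi-modal Named Entity Recognition Method[J]. 计算机与现代化 (Computer and Modernization), 2026, (4): 95-103, 9.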

计算机与现代化 (Computer and Modernization)

ISSN: 1006-2475
