Acta Automatica Sinica, 2024, Vol. 50, Issue (6): 1234-1245, 12. DOI: 10.16383/j.aas.c230573
Multi-scale Visual Semantic Enhancement for Multimodal Named Entity Recognition Method
Abstract
To address semantic loss in image features and weak semantic constraints in multimodal representations, two issues encountered in research on multimodal named entity recognition (MNER), a multi-scale visual semantic enhancement method for MNER (MSVSE) is proposed. The method first supplements image semantics by extracting multiple visual features. A multimodal feature fusion module then explores semantic interaction and feature fusion between the text features and each of the visual features, and outputs multimodal text representations enhanced with multi-scale visual semantics. A visual entity classifier decodes the multi-scale visual semantic features to learn the semantic consistency among the various visual features. A multi-task decoder mines the fine-grained semantic representations in the multimodal text representations and the text features, and decodes them jointly to resolve the semantic bias problem, thereby further improving the accuracy of named entity recognition. To verify the effectiveness of the method, experiments were carried out on the Twitter-2015 and Twitter-2017 datasets, comparing MSVSE against 10 other methods. The average F1 scores of MSVSE on the two datasets improved.
Key words
Multimodal named entity recognition (MNER) / multi-task learning / multimodal fusion / Transformer
Cite this article
WANG Hairong, XU Xi, WANG Tong, CHEN Fangping. Multi-scale visual semantic enhancement for multimodal named entity recognition method [J]. Acta Automatica Sinica, 2024, 50(6): 1234-1245, 12.
Funding
Supported by the Natural Science Foundation of Ningxia (2023AAC03316) and the Key Research Project of the Education Department of Ningxia Hui Autonomous Region (NYG2022051)