
Multi-scale Visual Semantic Enhancement for Multimodal Named Entity Recognition Method

Open access; indexed in PKU Core (北大核心) and CSTPCD

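Abstract (translated from the Chinese)

To address problems in research on multimodal named entity recognition (MNER), such as missing semantics in image features and weak semantic constraints on multimodal representations, a multi-scale visual semantic enhancement method for multimodal named entity recognition (MSVSE) is proposed. The method extracts multiple kinds of visual features to complete the image semantics, mines the semantic interactions between text features and these visual features, and generates and fuses multi-scale visual semantic features to obtain a multimodal text representation enhanced by multi-scale visual semantics. A visual entity classifier decodes the multi-scale visual semantic features, imposing a semantic consistency constraint on the visual features. A multi-task label decoder mines the fine-grained semantics of the multimodal text representation and the text features, and resolves semantic bias through joint decoding, further improving named entity recognition accuracy. To verify the effectiveness of the method, experiments were conducted on the Twitter-2015 and Twitter-2017 datasets with comparisons against 10 other methods; the average F1 value of the method improved.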

Abstract (original English)

To address semantic loss in image features and weak semantic constraints on multimodal representations in multimodal named entity recognition (MNER), a multi-scale visual semantic enhancement method for multimodal named entity recognition (MSVSE) is proposed. The method first supplements image semantics by extracting multiple visual features, then explores the semantic interaction and feature fusion between text features and the various visual features through a multimodal feature fusion module, which outputs multi-scale visual semantic-enhanced multimodal text representations. A visual entity classifier decodes the multi-scale visual semantic features to learn the semantic consistency among the various visual features. A multi-task decoder mines the fine-grained semantic representations in the multimodal text representation and the text features and decodes them jointly to resolve semantic bias, further improving the accuracy of named entity recognition. To verify the effectiveness of the method, experiments were carried out on the Twitter-2015 and Twitter-2017 datasets, with comparisons against 10 other methods. The average F1 values of MSVSE on the two datasets improved.
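The F1 value reported in the abstract can be illustrated with a minimal sketch of entity-level F1, the metric commonly used for named entity recognition. This is not the authors' evaluation script. It assumes BIO-style tags and exact-match scoring, where a predicted entity counts as correct only if both its boundaries and its type match the gold annotation. The function names `extract_spans` and `entity_f1` and the labels `PER`, `LOC`, and `ORG` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (assumed tag scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    # A sentinel "O" at the end closes any span still open at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match F1 over a list of tagged sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)       # spans with identical boundaries and type
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: one of the two gold entities is predicted correctly.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
score = entity_f1(gold, pred)
```

In this toy case one of two gold entities is matched and one of two predictions is correct, so precision, recall, and F1 are all 0.5. An "average F1" over two datasets, as the abstract mentions, would be the mean of the scores computed separately on each dataset.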

Authors: Wang Hairong (王海荣); Xu Xi (徐玺); Wang Tong (王彤); Chen Fangping (陈芳萍)

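Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021; Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021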

Keywords: multimodal named entity recognition (MNER); multi-task learning; multimodal fusion; Transformer

Journal: Acta Automatica Sinica (《自动化学报》), 2024, No. 6

Pages / page count: 1234-1245 / 12

Funding: Supported by Natural Science Foundation of Ningxia (2023AAC03316) and Key Research Project of Education Department of Ningxia Hui Autonomous Region (NYG2022051)

DOI: 10.16383/j.aas.c230573
