Application Research of Computers, 2024, Vol. 41, Issue (6): 1679-1685, 7. DOI: 10.19734/j.issn.1001-3695.2023.09.0439
Textual-visual semantics-enhanced multimodal named entity recognition method
Abstract
To address the issues of missing textual-visual semantics and unclear multimodal representation semantics in multimodal named entity recognition methods, this paper proposes a textual-visual semantic enhancement method for multimodal named entity recognition. The method uses various pre-trained models to extract text features, character features, regional visual features, image keywords, and visual labels, in order to comprehensively describe the semantic information of image-text data. It adopts the Transformer and a cross-modal attention mechanism to mine the complementary semantic relationships between image and text features and to guide feature fusion, thereby generating semantically complete text representations and semantically enhanced multimodal representations. By integrating boundary detection, entity type detection, and named entity recognition tasks, it constructs a multi-task label decoder that performs fine-grained semantic decoding of the input features to improve the semantic accuracy of the predicted features. This decoder jointly decodes the text representations and multimodal representations to obtain globally optimal predicted labels. Extensive experiments on the Twitter-2015 and Twitter-2017 benchmark datasets show that the proposed method increases the average F1 score by 1.00% and 1.41%, respectively, which indicates that the model has a strong capability for named entity recognition.
Keywords
multimodal named entity recognition/multimodal representation/multimodal fusion/multi-task learning/named entity recognition
Classification
Information technology and security science
Cite this article
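Xu Xi, Wang Hairong, Wang Tong, Ma He. Textual-visual semantics-enhanced multimodal named entity recognition method [J]. Application Research of Computers, 2024, 41(6): 1679-1685, 7.
Funding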
Natural Science Foundation of Ningxia (2023AAC03316)
Fundamental Research Funds for the Central Universities, North Minzu University (2022PT_S04)