Journal of Yanshan University (燕山大学学报), 2025, Vol. 49, Issue 6: 496-506, 11. DOI: 10.3969/j.issn.1007-791X.2025.06.004
双模态注意力机制的图像信息提取与翻译
Image information extraction and translation based on bimodal attention mechanism
Abstract
Accurate image information extraction and adaptive language conversion are of considerable value in ubiquitous computing, since images serve as a primary information carrier and account for a large share of big data. However, current methods struggle to comprehend complex images accurately, which leads to inadequate translation and conversion accuracy. To address this issue, this paper proposes EM2 NMT, a translation network with a bimodal attention mechanism covering the image modality and the semantic modality. The multi-target image supervision mechanism aligns words with their corresponding image regions to capture the relationship between source-language sentences and multiple image targets, achieving precise comprehension of the image targets. In semantic attention extraction, the meaning expressed in the text is understood accurately by analyzing the various semantic components of the source-language sentence. Image features are first extracted by convolution; the image features and the source sentence are then fed together into a Transformer with the bimodal attention mechanism, which ultimately produces an accurate translation from the source language to the target language. Extensive experiments are conducted on multiple datasets, and the results show that the BLEU and METEOR scores reach 45.41 and 60.35, respectively.
Keywords
semantic extraction/multimodal machine translation/attention mechanism/neural network
Classification
Information Technology and Security Science
Cite this article
白莹琦, 李铄, 杨清, 党明辉, 李煜. 双模态注意力机制的图像信息提取与翻译[J]. 燕山大学学报, 2025, 49(6): 496-506, 11.
Funding
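National Natural Science Foundation of China (62373300)
National Social Science Fund of China (22BXW039)
Key Research and Development Program of Shaanxi Province (2024GX-YBXM-149)
Northwest University Education Informatization Research Project (2019NWUXXH14)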