现代电子技术 (Modern Electronics Technique), 2025, Vol. 48, Issue 7: 65-71, 7. DOI: 10.16652/j.issn.1004-373x.2025.07.010
基于特征对齐融合的双波段图像描述生成方法
Dual-band image captioning generation method based on feature alignment fusion
Abstract
Synchronous infrared and visible-light imaging has become the norm for probing complex scenes and obtaining more accurate and comprehensive on-site information. However, existing image captioning research still focuses on visible-light images and therefore cannot describe the detected on-site information comprehensively and accurately. To this end, a visible-infrared dual-band image captioning method based on feature alignment fusion is proposed. First, Faster-RCNN is used to extract region features from the visible image and grid features from the infrared image. Second, building on the Transformer, position information is introduced into the visible-infrared image alignment fusion (VIIAF) encoder as a bridge for aligning and fusing the features of the visible and infrared images. Then, the fused visual information is fed into a conventional Transformer decoder to obtain the hidden state of the coarse-grained text. Finally, the visual information output by the encoder, the hidden state produced by the decoder, and the linguistic information output by the trained BERT are fed into the designed adaptive module, so that both visual and linguistic information take part in text prediction and the captions are refined from coarse-grained to fine-grained. Multiple sets of experiments on the visible-infrared image captioning dataset show that the proposed method accurately captures the complementary information between visible-light and infrared images. Compared with the best-performing Transformer-based model, it improves BLEU-1, BLEU-2, BLEU-3, BLEU-4, METEOR, ROUGE and CIDEr by 1.9%, 2.1%, 2.0%, 1.8%, 1.3%, 1.4% and 4.4%, respectively, which demonstrates the effectiveness of the proposed method.
Keywords (Chinese)
图像描述/双波段/特征对齐融合/注意力机制/Transformer/语言模型/BERT/自适应
Keywords
image captioning/dual-band/feature alignment fusion/attention mechanism/Transformer/language model/BERT/adaptive
Classification
Electronic Information Engineering
Cite this article
顾梦瑶, 蔺素珍, 晋赞霞, 李烽源. 基于特征对齐融合的双波段图像描述生成方法[J]. 现代电子技术, 2025, 48(7): 65-71, 7.
Funding
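Natural Science Foundation of Shanxi Province (202303021211147)
Shanxi Provincial Intellectual Property Office Special Program for Patent Transformation (202302001)
National Natural Science Foundation of China (62406296)
Shanxi Province Merit-Based Funding Program for Science and Technology Activities of Returned Overseas Scholars (20230017)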