
Dual-band image captioning generation method based on feature alignment fusion

Gu Mengyao¹, Lin Suzhen¹, Jin Zanxia¹, Li Fengyuan¹

Modern Electronics Technique (现代电子技术), 2025, Vol. 48, Issue 7: 65-71, 7. DOI: 10.16652/j.issn.1004-373x.2025.07.010



Author information

1. School of Computer Science and Technology, North University of China, Taiyuan, Shanxi 030051, China


Abstract

Synchronous infrared and visible-light imaging has become a routine means of detecting complex scenes and obtaining more accurate and comprehensive on-site information. However, existing image captioning research still focuses on visible-light images and therefore cannot describe the detected on-site information comprehensively and accurately. To address this, a visible-infrared dual-band image captioning method based on feature alignment fusion is proposed. First, Faster R-CNN is used to extract region features from the visible image and grid features from the infrared image. Second, building on the Transformer, position information is introduced as a bridge in the visible-infrared image alignment fusion (VIIAF) encoder to align and fuse the visible and infrared features. Then, the fused visual information is fed into a conventional Transformer decoder to obtain the hidden states of a coarse-grained caption. Finally, the visual information output by the encoder, the hidden states from the decoder, and the linguistic information output by a trained BERT model are fed into a purpose-designed adaptive module, so that both visual and linguistic information take part in word prediction and the coarse-grained caption is refined into a fine-grained one. Multiple sets of experiments on a visible-infrared image captioning dataset show that the proposed method accurately captures the complementary information between visible and infrared images. Compared with the best-performing Transformer-based model, it improves BLEU-1, BLEU-2, BLEU-3, BLEU-4, METEOR, ROUGE and CIDEr by 1.9%, 2.1%, 2.0%, 1.8%, 1.3%, 1.4% and 4.4%, respectively. These results demonstrate the effectiveness of the proposed method.
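The abstract does not give the equations of the VIIAF encoder or the adaptive module, so the pure-Python sketch below only illustrates one plausible reading of the two ideas, under stated assumptions. Position encodings are added to the queries and keys of a single-head scaled dot-product cross-attention (a DETR-style choice assumed here), so that each visible region attends to spatially matching infrared grid cells. A scalar sigmoid gate driven by the decoder hidden state (also an assumption) then mixes the fused visual context with a BERT-derived language context. All function names (`box_position`, `align_fuse`, `adaptive_mix`) and the sinusoidal position scheme are hypothetical. Learned projections, multiple heads and stacked layers are omitted.

```python
"""Hypothetical sketch of position-bridged visible-infrared fusion plus an
adaptive visual/linguistic gate; an illustration, not the paper's code."""
import math
from typing import List, Tuple

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def box_position(cx: float, cy: float, d: int) -> Vector:
    """Sinusoidal encoding of a normalized centre (cx, cy) into d dims (assumed scheme)."""
    assert d % 4 == 0, "d must be a multiple of 4"
    half = d // 2
    out: Vector = []
    for coord in (cx, cy):
        for k in range(half // 2):
            freq = 1.0 / (100.0 ** (2 * k / half))
            out += [math.sin(coord * freq), math.cos(coord * freq)]
    return out


def align_fuse(vis: List[Vector], vis_pos: List[Vector],
               ir: List[Vector], ir_pos: List[Vector]) -> List[Vector]:
    """Each visible region attends over the infrared grid cells.

    Position encodings are added to queries and keys only, so spatial
    agreement drives the alignment while the values stay pure IR content.
    """
    d = len(vis[0])
    scale = 1.0 / math.sqrt(d)
    keys = [add(f, p) for f, p in zip(ir, ir_pos)]
    fused = []
    for v, p in zip(vis, vis_pos):
        q = add(v, p)
        weights = softmax([dot(q, k) * scale for k in keys])
        # Attention-weighted sum of infrared content features.
        ctx = [sum(w * cell[j] for w, cell in zip(weights, ir)) for j in range(d)]
        fused.append(add(v, ctx))  # residual: visible feature + aligned IR context
    return fused


def adaptive_mix(hidden: Vector, visual_ctx: Vector, lang_ctx: Vector,
                 w_gate: Vector, b_gate: float) -> Tuple[float, Vector]:
    """Scalar gate from the decoder hidden state (assumed) mixes visual and BERT context."""
    beta = 1.0 / (1.0 + math.exp(-(dot(w_gate, hidden) + b_gate)))
    mixed = [beta * vis_x + (1.0 - beta) * lang_x
             for vis_x, lang_x in zip(visual_ctx, lang_ctx)]
    return beta, mixed


if __name__ == "__main__":
    d = 4
    vis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # 2 region features
    vis_pos = [box_position(0.2, 0.3, d), box_position(0.7, 0.6, d)]
    ir = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # 3 grid cells
    ir_pos = [box_position(x, 0.5, d) for x in (0.17, 0.5, 0.83)]
    fused = align_fuse(vis, vis_pos, ir, ir_pos)
    beta, word_ctx = adaptive_mix([0.1] * d, fused[0], [0.0] * d, [1.0] * d, 0.0)
    print(len(fused), round(beta, 3), [round(x, 3) for x in word_ctx])
```

In this sketch `beta` is the share of the word-level context drawn from the fused visual features and `1 - beta` the share drawn from the language context. In a trained model, `w_gate`, `b_gate` and the attention projections would be learned parameters rather than fixed values.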


Key words

image captioning/dual-band/feature alignment fusion/attention mechanism/Transformer/language model/BERT/adaptive

Classification

Electronic information engineering

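Cite this article

Gu Mengyao, Lin Suzhen, Jin Zanxia, Li Fengyuan. Dual-band image captioning generation method based on feature alignment fusion[J]. Modern Electronics Technique, 2025, 48(7): 65-71, 7.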

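Funding

Natural Science Foundation of Shanxi Province (202303021211147)

Special Program for Patent Transformation of the Shanxi Intellectual Property Office (202302001)

National Natural Science Foundation of China (62406296)

Shanxi Province Merit-Based Funding Project for Scientific and Technological Activities of Returned Overseas Scholars (20230017)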

Modern Electronics Technique

OA | Peking University Core Journal

ISSN: 1004-373X
