Diverse Image Captioning Based on Hybrid Global and Sequential Variational Transformer

Liu Bing (刘兵) 1, Li Sui (李穗) 1, Liu Mingming (刘明明) 2, Liu Hao (刘浩) 1

Acta Electronica Sinica (电子学报), 2024, Vol. 52, Issue 4: 1305-1314 (10 pages). DOI: 10.12263/DZXB.20231155

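Author information

  • 1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; Engineering Research Center of Mine Digitalization, Ministry of Education, Xuzhou, Jiangsu 221116
  • 2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116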

Abstract

Diverse image captioning has become a research hotspot in the field of image description. Existing methods generally ignore the dependency between global and sequential latent vectors, which seriously limits further performance gains. To address this problem, this paper proposes a diverse image captioning framework based on a hybrid variational Transformer. First, we construct a hybrid conditional variational autoencoder to model the dependency between global and sequential latent vectors. Second, we derive the evidence lower bound by maximizing the conditional likelihood of the hybrid autoencoder; this bound serves as the objective function for diverse image captioning. Finally, we combine the Transformer model with the hybrid conditional variational autoencoder so that the two can be jointly optimized, improving the generalization performance of diverse image captioning. Experimental results on the MSCOCO dataset show that, compared with state-of-the-art methods, when randomly generating 20 and 100 captions, the diversity metric m-BLEU (Mutual overlap Bilingual Evaluation Understudy) improves by 4.2% and 4.7%, respectively, while the accuracy metric CIDEr (Consensus-based Image Description Evaluation) improves by 4.4% and 15.2%, respectively.
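
For orientation, the evidence lower bound mentioned in the abstract can be sketched in its standard single-latent form for a conditional variational autoencoder. The symbols here are assumptions chosen for illustration and do not come from the paper: x is the image, y the caption, z a latent vector, θ the parameters of the prior and decoder, and φ the parameters of the inference network. The paper's hybrid bound, which couples a global latent vector with sequential latent vectors, is not given in the abstract and is not reproduced here.

```latex
% Standard conditional-VAE evidence lower bound (illustrative notation,
% not the paper's hybrid global/sequential bound).
\log p_\theta(y \mid x)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[ \log p_\theta(y \mid x, z) \right]
  \;-\;
  D_{\mathrm{KL}}\!\left( q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x) \right)
```

In this generic setting, drawing different samples of z from the prior p_θ(z | x) at test time yields different captions for the same image, which is where the diversity comes from.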

Key words

image understanding/image captioning/variational autoencoding/latent embedding/multi-modal learning/generative model

Classification

Information technology and security science

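Cite this article

Liu Bing, Li Sui, Liu Mingming, Liu Hao. Diverse Image Captioning Based on Hybrid Global and Sequential Variational Transformer [J]. Acta Electronica Sinica, 2024, 52(4): 1305-1314.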

Funding

National Natural Science Foundation of China (No.62276266, No.61801198)

Acta Electronica Sinica (电子学报)

Open access; listed in the Peking University Core Journals index and CSTPCD

ISSN 0372-2112
