Video captioning model with reverse-focus fine-grained multimodal semantic alignment

Application Research of Computers, 2025, Vol. 42, Issue 7: 1986-1993, 8. DOI: 10.19734/j.issn.1001-3695.2024.11.0492

Reverse-focus fine-grained multimodal semantic alignment for video captioning

Cai Xia¹, Luo Huilan¹, Wan Siqi¹

Author information

1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341100

Abstract

Existing video captioning methods often introduce multimodal information to help models extract critical, fine-grained details from complex and dynamic visual content. However, these methods tend to overlook the semantic gaps caused by representational differences among modalities. To bridge these gaps, enable effective cross-modal alignment and efficient fusion, and improve the extraction of fine-grained semantic information, this paper proposes a reverse-focus fine-grained multimodal semantic alignment model for video captioning (RM4Cap). The model draws on an image-text pair corpus and performs semantic alignment between video and images, thereby indirectly aligning video representations with the text in the image-text pairs. It also introduces a reverse attention focusing algorithm that suppresses redundant scene information while highlighting inconspicuous objects and their interactions. Experiments on the MSVD and MSRVTT datasets show that the model significantly outperforms existing methods on metrics such as CIDEr and BLEU-4. It effectively addresses the alignment challenges and redundancy issues in multimodal fusion, further demonstrating its ability to narrow the cross-modal semantic gap.

Key words

video captioning/multimodal/reverse attention/semantic alignment/semantic gap

Classification

Information technology and security science

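Cite this article

Cai Xia, Luo Huilan, Wan Siqi. Reverse-focus fine-grained multimodal semantic alignment for video captioning [J]. Application Research of Computers, 2025, 42(7): 1986-1993, 8.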

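Funding

National Natural Science Foundation of China (62361032)

Jiangxi Province Leading Talent Program for Academic and Technical Leaders in Major Disciplines (20213BCJ22004)

Key Project of the Natural Science Foundation of Jiangxi Province (20232ACB202011)

Jiangxi Provincial Key Laboratory of Multidimensional Intelligent Perception and Control (2024SSY03161)

Jiangxi Province Graduate Innovation Special Fund (YC2023-S657)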

Application Research of Computers

Open access; Peking University Core Journal

ISSN: 1001-3695
