Improved Transformer Decoding Algorithm for Dense Video Description

YANG Dawei (杨大伟), PAN Xiaofang (盘晓芳), MAO Lin (毛琳), ZHANG Rubo (张汝波)

Computer Engineering and Applications (计算机工程与应用), 2024, Vol. 60, Issue 17: 89-97 (9 pages). DOI: 10.3778/j.issn.1002-8331.2306-0024

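YANG Dawei¹, PAN Xiaofang¹, MAO Lin¹, ZHANG Rubo¹

Author information

  • 1. College of Electromechanical Engineering, Dalian Minzu University, Dalian, Liaoning 116650, China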

Abstract

When a Transformer is applied to dense video description, historical text features can interfere with subsequent text generation, making it difficult to capture dynamic video information and degrading the coherence and accuracy of the descriptions. To maintain contextual consistency while mitigating noise from historical text, this paper proposes an improved Transformer decoding algorithm for dense video description, called D-Uformer. The algorithm uses a feedforward neural network (FNN) to enhance the representation of historical text features. Through skip connections, it constructs pruning branches that remove redundant information and compensatory branches that enhance contextual information. It uses subtraction to reduce the inaccurate descriptions caused by over-focusing on historical text features, which improves the model's attention to the input video features. It uses addition to compensate for contextual information lost during feature transfer, so that the model generates accurate and coherent descriptions of the current video content. Experimental results on the ActivityNet and Charades datasets show a significant performance improvement for D-Uformer. Compared with the temporally descriptive probabilistic captioning (TDPC) network, it achieves a maximum accuracy improvement of 4.816% and a maximum diversity improvement of 4.167%. The generated descriptions not only align better with the video content but also conform more closely to human language conventions.
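
The subtraction and addition steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify which tensors enter each branch, so the function names (`ffn`, `prune_and_compensate`), the choice to subtract the FNN-enhanced historical text feature from a decoder feature, and the choice to add the raw historical feature back through a skip connection are all assumptions made for illustration.

```python
# Hypothetical sketch of a "prune by subtraction, compensate by addition" step.
# All names and the exact wiring are assumptions, not the D-Uformer code.

def relu(vec):
    return [max(0.0, x) for x in vec]

def linear(weight, bias, vec):
    # weight is a list of rows; computes weight @ vec + bias
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def ffn(vec, w1, b1, w2, b2):
    # Two-layer feedforward network with a ReLU in between
    return linear(w2, b2, relu(linear(w1, b1, vec)))

def prune_and_compensate(decoder_feat, hist_feat, ffn_params):
    # Enhance the historical text feature with the FNN
    enhanced = ffn(hist_feat, *ffn_params)
    # Pruning branch (assumed): subtract the enhanced historical feature
    pruned = [d - e for d, e in zip(decoder_feat, enhanced)]
    # Compensatory branch (assumed): add the raw historical feature back
    # through a skip connection to restore context
    compensated = [p + h for p, h in zip(pruned, hist_feat)]
    return pruned, compensated

# Toy usage with identity weights so the arithmetic is easy to follow
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
params = (identity, zeros, identity, zeros)
pruned, compensated = prune_and_compensate([3.0, 4.0], [1.0, -2.0], params)
print(pruned)       # [2.0, 4.0]
print(compensated)  # [3.0, 2.0]
```

In a real decoder these would be batched tensor operations inside each layer, with learned FNN weights rather than the identity matrices used in this toy example.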

Keywords

dense video description; Transformer network; decoding; feedforward neural network; skip connection

Classification

Information Technology and Safety Science

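Cite this article

YANG Dawei, PAN Xiaofang, MAO Lin, ZHANG Rubo. Improved Transformer Decoding Algorithm for Dense Video Description[J]. Computer Engineering and Applications, 2024, 60(17): 89-97, 9.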

Funding

National Natural Science Foundation of China (61673084)

Natural Science Foundation of Liaoning Province (20180550866, 2020-MZLH-24)

Computer Engineering and Applications (计算机工程与应用)

Open access; Peking University Core Journal (北大核心); CSTPCD

ISSN 1002-8331
