Application Research of Computers, 2024, Vol. 41, Issue 4: 1252-1257, 6. DOI: 10.19734/j.issn.1001-3695.2023.07.0344
Video frame interpolation method based on improved visual Transformer (改进视觉Transformer的视频插帧方法)
Abstract
To address the problem that existing video frame interpolation methods cannot effectively handle large-motion and complex-motion scenes, this paper proposes a video frame interpolation method based on an improved vision Transformer. The method fuses cross-scale window-based attention with separable spatio-temporal local attention, which enlarges the receptive field of attention and aggregates multi-scale information. It jointly models spatio-temporal dependencies and long-range pixel dependencies, thereby enhancing the model's ability to handle large-motion scenes. Experimental results show that the model achieves PSNR values of 37.13 dB and 28.28 dB on the Vimeo90K test set and the DAVIS dataset, respectively, with SSIM values of 0.978 and 0.891, respectively. Visualization results further show that the proposed method produces clear and plausible interpolated frames for videos with large motion, complex motion, and occlusion.
Keywords
video frame interpolation / Transformer / cross-scale window-based attention / large motion / complex motion
Classification
Information Technology and Security Science
Cite this article
石昌通, 单鸿涛, 郑光远, 张玉金, 刘怀远, 宗智浩. Video frame interpolation method based on improved visual Transformer [J]. Application Research of Computers, 2024, 41(4): 1252-1257, 6.
Funding
National Natural Science Foundation of China (62173222)