Abstract
Existing video super-resolution methods struggle in complex motion scenes: frame-to-frame alignment is inaccurate, temporal information is underused, and traditional attention mechanisms are computationally expensive. To address these limitations, this paper proposes an optical flow-guided cross-attention video super-resolution network (OFCA-Transformer). First, a lightweight multi-scale optical flow estimation module is designed to generate multi-granularity motion information. Second, a flow-guided cross-attention mechanism is introduced: local attention windows are centered on flow-predicted positions, which explicitly fuses geometric priors with implicit content awareness. This improves alignment accuracy while reducing computational complexity. Additionally, a hierarchical feature aggregation module is constructed to enable more efficient spatio-temporal feature fusion within the Transformer architecture. The method was evaluated against other approaches on three public datasets at magnification factors of ×2, ×3, and ×4. The results show that OFCA-Transformer achieves PSNR values only 0.16 dB lower than state-of-the-art methods across multiple datasets while reducing model parameters by 82.8%, effectively improving computational efficiency. Furthermore, the proposed method exhibits more precise detail recovery and better temporal consistency in complex motion scenes, and achieves superior quantitative metrics across all magnification factors.
Keywords
video super-resolution/Transformer/optical flow estimation/cross-attention/motion alignment
Key words
video super-resolution/Transformer/optical flow estimation/cross-attention/feature fusion
Classification
Information technology and security science
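
The flow-guided cross-attention idea summarized in the abstract (local attention windows centered on flow-predicted positions) can be illustrated with a minimal sketch. This is not the OFCA-Transformer implementation: the function name `flow_guided_cross_attention`, the nested-list feature layout, the `(dy, dx)` flow convention, the rounding of the window centre to the nearest pixel, and the border clamping are all assumptions made for illustration, and the paper's multi-scale flow and hierarchical aggregation modules are not modeled.

```python
import math


def flow_guided_cross_attention(query, key, value, flow, window=3):
    """Attend from each query pixel to a small window in the neighbor frame.

    query, key, value: H x W grids of feature vectors (nested lists).
    flow: H x W grid of (dy, dx) displacements into the neighbor frame
          (hypothetical convention chosen for this sketch).
    """
    height, width = len(query), len(query[0])
    radius = window // 2
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            q = query[y][x]
            dy, dx = flow[y][x]
            # Window centre: the flow-predicted position, rounded to a pixel.
            cy, cx = round(y + dy), round(x + dx)
            scores, values = [], []
            for oy in range(-radius, radius + 1):
                for ox in range(-radius, radius + 1):
                    # Clamp to the border so windows near the edge stay valid.
                    ny = min(max(cy + oy, 0), height - 1)
                    nx = min(max(cx + ox, 0), width - 1)
                    k = key[ny][nx]
                    dot = sum(a * b for a, b in zip(q, k))
                    scores.append(dot / math.sqrt(len(q)))
                    values.append(value[ny][nx])
            # Numerically stable softmax over the window only.
            peak = max(scores)
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            # Weighted sum of the value vectors inside the window.
            row.append([
                sum(w * v[c] for w, v in zip(weights, values)) / total
                for c in range(len(values[0]))
            ])
        output.append(row)
    return output
```

In this sketch each query compares against `window * window` keys rather than all `H * W` positions of the neighbor frame. For a 64×64 feature map and a 3×3 window, that is 9 keys per query instead of 4096, which is the kind of saving over global attention that the abstract refers to.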