Computer Engineering and Applications, 2025, Vol. 61, Issue 11: 176-184, 9. DOI: 10.3778/j.issn.1002-8331.2403-0361
Scene Text Spotting Based on Cross-Modal and Circular Factorized Self-Attention
Abstract
Current end-to-end scene text spotting methods usually integrate the two subtasks of text detection and recognition into a unified framework without sufficiently considering the interaction and synergy between them. To address this issue, an end-to-end scene text spotting method based on cross-modal and circular factorized self-attention is proposed. First, based on the scaled dot-product attention mechanism, a cross-modal module is designed to enhance the fusion of visual and semantic information, thereby strengthening the interaction between text detection and recognition. Then, a circular factorized self-attention with circular convolution replaces the self-attention in the decoder to better capture the contour features of text instances and thus improve text detection performance. Finally, extensive experiments on the Total-Text, CTW1500, and ICDAR 2015 datasets show that the proposed method achieves notable improvements over current mainstream methods in the precision, recall, and F-measure of text detection and in the accuracy of end-to-end text spotting. Ablation experiments further demonstrate the effectiveness of the proposed method.
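The abstract names two building blocks, scaled dot-product attention and circular convolution, without giving their formulation. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names, the choice of visual features as queries and semantic features as keys and values, and the omission of learned projection matrices are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(visual, semantic):
    """Scaled dot-product attention across two modalities.

    visual:   list of n vectors of length d (assumed visual features, used as queries).
    semantic: list of m vectors of length d (assumed semantic features, used as keys and values).
    Returns n fused vectors. Learned projections are omitted (an assumption for brevity).
    """
    d = len(visual[0])
    fused = []
    for q in visual:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in semantic]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        fused.append([sum(w * v[j] for w, v in zip(weights, semantic)) for j in range(d)])
    return fused


def circular_conv1d(points, kernel):
    """1D convolution over a closed sequence, with indices wrapping around the ends.

    points: list of scalars sampled along a closed contour.
    kernel: odd-length list of weights, applied as cross-correlation
            (the usual deep-learning convention).
    """
    n, r = len(points), len(kernel) // 2
    return [
        sum(kernel[j] * points[(i + j - r) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


if __name__ == "__main__":
    print(circular_conv1d([1, 2, 3, 4], [1, 1, 1]))  # [7, 6, 9, 8]
```

The wrap-around indexing is what makes the convolution suit closed text contours: the first and last sampled points are treated as neighbours, so no padding artefact appears at the sequence ends. How the paper factorizes self-attention around this operation is not stated in the abstract and is not modelled here.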
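Key words
scene text spotting / cross-modal / circular convolution / factorized self-attention / feature fusion
Classification
Information Technology and Security Science
Cite this article
Xu Shikang, Liu Junfeng, Zeng Jun, Liao Dingding. Scene Text Spotting Based on Cross-Modal and Circular Factorized Self-Attention[J]. Computer Engineering and Applications, 2025, 61(11): 176-184, 9.
Funding
National Natural Science Foundation of China (62173148, 52377186)
Special Project in Key Areas for Regular Universities of Guangdong Province (New Generation Information Technology) (2021ZDZX1136)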