Scene Text Spotting Based on Cross-Modal and Circular Factorized Self-Attention (基于跨模态和循环分解自注意力的场景文本识别)

Xu Shikang (徐诗康), Liu Junfeng (刘俊峰), Zeng Jun (曾君), Liao Dingding (廖丁丁)

Computer Engineering and Applications, 2025, Vol. 61, Issue 11: 176-184 (9 pages). DOI: 10.3778/j.issn.1002-8331.2403-0361

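Xu Shikang 1, Liu Junfeng 1, Zeng Jun 2, Liao Dingding 1

Author information

  • 1. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641
  • 2. School of Electric Power Engineering, South China University of Technology, Guangzhou 510641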

Abstract

Current end-to-end scene text spotting methods usually integrate the two subtasks of text detection and recognition into a unified framework without sufficiently considering the interaction and synergy between them. To address this, an end-to-end scene text spotting method based on cross-modal and circular factorized self-attention is proposed. First, a cross-modal module built on the scaled dot-product attention mechanism is designed to strengthen the fusion of visual and semantic information, and thereby the interaction between text detection and recognition. Second, the self-attention in the decoder is replaced with a circular factorized self-attention that uses circular convolution, which better captures the contour features of text instances and improves text detection performance. Finally, extensive experiments on the Total-Text, CTW1500, and ICDAR 2015 datasets show that the proposed method significantly improves on current mainstream methods in the accuracy, recall, and F-measure of text detection and in the accuracy of end-to-end text spotting. Ablation experiments further demonstrate the effectiveness of the proposed components.
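The two building blocks named in the abstract, scaled dot-product attention and circular convolution, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `cross_attention` and `circular_conv1d`, the toy feature vectors, and the choice of visual features as queries with semantic features as keys and values are all assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Assumption for illustration: queries come from one modality
    (e.g. visual features) and keys/values from another (e.g. semantic
    features), so each output row is a semantic summary per visual token.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def circular_conv1d(points, kernel):
    """1-D circular convolution (cross-correlation form, as in deep
    learning layers) over a closed sequence: indices wrap around, so the
    first and last contour points are treated as neighbours."""
    n, k = len(points), len(kernel)
    half = k // 2
    return [sum(kernel[j] * points[(i + j - half) % n] for j in range(k))
            for i in range(n)]


# Toy usage with hypothetical 2-D features.
visual = [[1.0, 0.0], [0.0, 1.0]]
semantic = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(visual, semantic, semantic)

# A closed "contour" of 4 scalar values with a box kernel of width 3.
smoothed = circular_conv1d([1, 2, 3, 4], [1, 1, 1])
print(smoothed)  # [7, 6, 9, 8]
```

The wrap-around indexing is what makes the convolution suit closed text contours: every point has a full neighbourhood, whereas zero padding would treat the sequence ends as boundaries.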

Key words

scene text spotting/cross-modal/circular convolution/factorized self-attention/feature fusion

Classification

Information Technology and Security Science

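Cite this article

Xu Shikang, Liu Junfeng, Zeng Jun, Liao Dingding. Scene text spotting based on cross-modal and circular factorized self-attention[J]. Computer Engineering and Applications, 2025, 61(11): 176-184.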

Funding

National Natural Science Foundation of China (62173148, 52377186)

Special Project in Key Fields of Guangdong Provincial Universities (New Generation Information Technology) (2021ZDZX1136)

Computer Engineering and Applications (OA; Peking University Core Journal), ISSN 1002-8331
