
Scene Text Image Super-Resolution Network Based on Multi-head Attention

Open access (OA); Peking University Core Journal

Abstract

Scene text image super-resolution (STISR) aims to improve the resolution and readability of low-resolution text images and is a foundational step for downstream text recognition. Existing STISR methods based on deep convolutional neural networks give little consideration to the global information in text images, which leads to unstable restoration results, especially for visually similar low-resolution text images. To address this problem, a novel STISR network (MASRN) is proposed, comprising a text prior (TP) module and a hybrid backbone network. The TP module first generates text prior information by extracting semantic features from the text image; the hybrid backbone network, composed of convolutional modules and multi-head attention fusion modules, then fuses this text prior information with multi-scale image features. Experimental results on the TextZoom dataset show that the proposed MASRN restores higher-quality text images and effectively improves accuracy on the downstream text recognition task.
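The abstract does not give the internals of the multi-head attention fusion module, so the following is only a minimal NumPy sketch of one plausible reading: image feature tokens act as queries and text prior tokens act as keys and values in a multi-head cross-attention, followed by a residual connection. The function name `multi_head_fusion`, the weight names `Wq`, `Wk`, `Wv`, `Wo`, the token counts, the feature width, and the random weights are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_fusion(image_tokens, prior_tokens, weights, num_heads):
    """Fuse text prior tokens into image tokens via multi-head cross-attention.

    image_tokens: (N, d) flattened image features, used as queries (assumption).
    prior_tokens: (L, d) text prior features, used as keys/values (assumption).
    weights: dict with hypothetical (d, d) projections "Wq", "Wk", "Wv", "Wo".
    Returns the fused (N, d) features and the (num_heads, N, L) attention maps.
    """
    n, d = image_tokens.shape
    l = prior_tokens.shape[0]
    assert d % num_heads == 0, "feature width must be divisible by num_heads"
    head_dim = d // num_heads

    # Linear projections, then split the channel axis into heads:
    # (tokens, d) -> (num_heads, tokens, head_dim)
    def split(x, tokens):
        return x.reshape(tokens, num_heads, head_dim).transpose(1, 0, 2)

    q = split(image_tokens @ weights["Wq"], n)
    k = split(prior_tokens @ weights["Wk"], l)
    v = split(prior_tokens @ weights["Wv"], l)

    # Scaled dot-product attention per head: every image token attends
    # to every text prior token, which gives it a global view of the text.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, N, L)
    attn = softmax(scores, axis=-1)
    context = attn @ v                                      # (heads, N, head_dim)

    # Merge heads back to (N, d), project, and add a residual connection.
    context = context.transpose(1, 0, 2).reshape(n, d)
    fused = image_tokens + context @ weights["Wo"]
    return fused, attn


# Illustrative sizes only: a 16x64 feature map flattened to 1024 tokens,
# 26 text prior tokens, 32 channels, 4 heads.
rng = np.random.default_rng(0)
d, heads = 32, 4
image_tokens = rng.standard_normal((16 * 64, d))
prior_tokens = rng.standard_normal((26, d))
weights = {name: rng.standard_normal((d, d)) / np.sqrt(d)
           for name in ("Wq", "Wk", "Wv", "Wo")}

fused, attn = multi_head_fusion(image_tokens, prior_tokens, weights, heads)
print(fused.shape, attn.shape)
```

In a trained network the projection weights would be learned and the fused tokens would be reshaped back into a feature map for the convolutional modules; the sketch only shows how attention lets each spatial position draw on the whole text prior rather than a local receptive field.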

Authors: ZHU Jianan (朱佳楠); XING Shuli (邢树礼)

Affiliation (both authors): Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, Fujian 350118, China

Subject category: Computer Science and Automation

Keywords: scene text; image super-resolution; text recognition; text prior; convolutional neural networks; multi-head attention

Journal of Fujian Normal University (Natural Science Edition), 2025, Issue 1

Pages 21-29 (9 pages)

Funding: National Key Research and Development Program of China (2019YFD0900905); National Natural Science Foundation of China (61773415)

DOI: 10.12046/j.issn.1000-5277.2023110029
