Layout Analysis Method of Multi-scale Feature Fusion

Current document layout analysis suffers from misclassification between lists and text, difficulty recognizing small-scale text inside tables, and poor preservation of spatial features. To address these problems, this paper takes a bottom-up approach and proposes a multi-feature fusion layout analysis method based on the SegNet network. The method introduces an MSCAN-SE module into SegNet. To raise the low recognition rate of small-scale elements in tables, the strip features in the MSCAN-SE attention mechanism are used to strengthen the model's multi-scale feature extraction, so that the network retains feature information at more scales. Because list elements and text elements have overly similar features, the dilated convolution and the channel attention branch in MSCAN-SE are used to enlarge the network's receptive field during feature extraction. The proposed method is compared experimentally with classical semantic segmentation networks. On the layout analysis test set it reaches a pixel accuracy of 97.9% and a mean intersection over union (mIoU) of 91.7%. Its mIoU is higher than that of the U-Net, FCN, DeepLabV3+, and SegNet semantic segmentation models by 7.6%, 2.4%, 2.6%, and 1.5%, respectively.
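The two reported metrics, pixel accuracy and mean intersection over union, can be illustrated with a minimal sketch. It uses the standard confusion-matrix definitions and assumes nothing about the paper's actual evaluation protocol (for example, whether a background class is counted), which the abstract does not state. The function names and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class equals the true class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:                               # ignore classes that never occur
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy flattened label maps with 3 classes (hypothetical data, not from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
pa = pixel_accuracy(cm)
miou = mean_iou(cm)
print(f"pixel accuracy = {pa:.3f}, mIoU = {miou:.3f}")
```

Because mIoU averages over classes rather than pixels, a rare class such as small table text weighs as much as a dominant one, which is why it is usually lower than pixel accuracy on the same predictions.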

Qiao Jia (乔佳); Xu Kun (徐琨); Hu Peirong (胡佩蓉)

School of Information Engineering, Chang'an University, Xi'an, Shaanxi 710018

Computer Science and Automation

document layout analysis; multi-scale attention; semantic segmentation; channel attention

Computer and Modernization (《计算机与现代化》), 2024 (005)

16-21 / 6

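National Natural Science Foundation of China (52172302); National Key Research and Development Program of China (2019YFB1600103); Key Research and Development Program of Shaanxi Province (2018ZDXM-GY-044)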

10.3969/j.issn.1006-2475.2024.05.004
