Template-based Handwritten Form Information Extraction Method Using Deep Learning (基于深度学习的模板化手写表单信息提取方法)

董前前 陈亮 王鑫鑫

Computer Technology and Development (计算机技术与发展), 2024, Vol. 34, Issue 10: 204-212, 9. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0218

董前前¹, 陈亮¹, 王鑫鑫¹

Author information

  • 1. School of Computer Science, Xi'an Polytechnic University, Xi'an 710600, Shaanxi, China

Abstract

Handwritten paper forms serve as crucial data carriers for information exchange among departments in manufacturing enterprises, and extracting their key information is of great significance for production, management, and decision-making. However, current solutions for handwritten form information extraction struggle to extract key information accurately and rapidly from complex text layouts. To address this issue, a two-stage template-based handwritten form information extraction method is proposed. It requires only one image to construct a template, focuses on user-relevant information, and avoids the logical errors that traditional relationship extraction can introduce in complex tables. First, for a specific type of table image, the desired recognition areas are annotated directly on the image, and a key value is assigned to each area. Next, a high-resolution network is employed to improve the detection precision of small text, and a strategy of uniform segmentation with shuffling at multiple resolutions is proposed so that the detection model achieves a good balance between performance and parameter count. In addition, temporal convolutional networks and self-attention mechanisms are introduced so that the recognition model better handles the blurred or unclear characters and stroke omissions caused by handwriting speed and writing tools. After recognition, the system binds the recognition results to the preset key values to form structured output. Experimental results demonstrate that, compared with a typical ResNet50 model with an almost equal number of parameters, the precision of small text detection improves by 15.8 percentage points. In text recognition, the model achieves a character precision of 99.30% on the CASIA-HWDB2.0-2.2 dataset. Even when the text box does not cover the entire text line, the character precision drops by only 0.55 percentage points, indicating that the text recognition model is robust.
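The template step described in the abstract (annotate regions on one image, assign a key to each, then bind recognized text to those keys) can be illustrated with a minimal sketch. It is not the authors' implementation. The names `TemplateField`, `overlap_ratio`, `extract_form`, the `min_overlap` threshold, and the rule of matching detected text boxes to template regions by area overlap are all assumptions made for illustration; the abstract does not state how detections are matched to regions. The sketch also assumes the input image is already aligned with the template, and it stands in for the detection and recognition models with a plain list of `(box, text)` pairs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical box convention: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class TemplateField:
    """One annotated region of the template and the key assigned to it."""
    key: str
    box: Box


def overlap_ratio(region: Box, text_box: Box) -> float:
    """Return the fraction of text_box's area that lies inside region."""
    left = max(region[0], text_box[0])
    top = max(region[1], text_box[1])
    right = min(region[2], text_box[2])
    bottom = min(region[3], text_box[3])
    inter = max(0, right - left) * max(0, bottom - top)
    area = (text_box[2] - text_box[0]) * (text_box[3] - text_box[1])
    return inter / area if area > 0 else 0.0


def extract_form(
    template: List[TemplateField],
    detections: List[Tuple[Box, str]],
    min_overlap: float = 0.5,  # assumed threshold, not taken from the paper
) -> Dict[str, str]:
    """Bind recognized text lines to template keys to form structured output."""
    result: Dict[str, str] = {}
    for field in template:
        hits = [
            (box, text)
            for box, text in detections
            if overlap_ratio(field.box, box) >= min_overlap
        ]
        # Read multiple lines in a region top-to-bottom, then left-to-right.
        hits.sort(key=lambda hit: (hit[0][1], hit[0][0]))
        result[field.key] = " ".join(text for _, text in hits)
    return result


# Usage with made-up regions and recognition results.
template = [
    TemplateField("part_no", (100, 50, 400, 100)),
    TemplateField("operator", (100, 120, 400, 170)),
]
detections = [
    ((110, 55, 300, 95), "A-1024"),
    ((105, 125, 250, 165), "Li Lei"),
    ((500, 500, 600, 540), "ignored"),  # outside every annotated region
]
print(extract_form(template, detections))
```

Text that falls outside every annotated region is dropped, which mirrors the abstract's point that the template focuses only on user-relevant information.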

Key words

information extraction / handwritten form / template-based / handwritten text recognition / text line detection

Classification

Information Technology and Security Science

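Cite this article

董前前, 陈亮, 王鑫鑫. Template-based Handwritten Form Information Extraction Method Using Deep Learning [J]. Computer Technology and Development, 2024, 34(10): 204-212, 9.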

Funding

Key Scientific Research Program of the Shaanxi Provincial Department of Education (22JS021)

National Natural Science Foundation of China (51675108)

Computer Technology and Development

OA / CSTPCD

ISSN 1673-629X
