
Information Extraction of Financial Instrument Based on Document Order and Multimodal Model

覃俊 1, 林宇亭 1, 刘晶 2, 叶正 3, 刘洲 1

Computer & Digital Engineering (计算机与数字工程), 2024, Vol. 52, Issue 1: 23-27, 80, 6. DOI: 10.3969/j.issn.1672-9722.2024.01.004

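Author information

  • 1. School of Computer Science, South-Central Minzu University, Wuhan 430074; Hubei Provincial Engineering Research Center for Intelligent Management of Manufacturing Enterprises, Wuhan 430074
  • 2. School of Computer Science, South-Central Minzu University, Wuhan 430074; Hubei Provincial Engineering Research Center for Intelligent Management of Manufacturing Enterprises, Wuhan 430074; Hubei Provincial Engineering Research Center of Agricultural Blockchain and Intelligent Management, Wuhan 430074
  • 3. School of Computer Science, South-Central Minzu University, Wuhan 430074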

Abstract

Current document information extraction methods mostly work well on simple documents but perform poorly on complex financial documents that contain background noise and structural complexity. To address the problem of matching entity relationships in complex financial documents, a sequential reconstruction method and the LayoutLMv3-GRU information extraction model are proposed. A complex financial document dataset that incorporates text, layout, and image modalities is built for information extraction. Using the Layout-Parser tool, a sorting module is designed to arrange text according to contextual relationships, rearranging words that are far apart spatially but closely related logically. Combining the improved LayoutLMv3 model with a GRU network further improves accuracy. Experiments are conducted on the public FUNSD dataset and the self-built complex financial dataset. The results show that the method achieves a 2.37% improvement in F1 score over the LayoutLMv3 model. On the self-built complex financial dataset in particular, the model achieves an F1 score of 88.36%, demonstrating the superiority of the method in extracting information from complex documents and its general applicability to various document types.
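The abstract does not spell out how the sorting module works. As a rough illustration of what ordering text by layout can mean, the sketch below applies a generic row-grouping heuristic to OCR text boxes: boxes are grouped into rows by vertical centre and then read left to right. The `TextBox` type, the `reading_order` function, and the `row_tol` parameter are hypothetical names introduced here; this is not the paper's module, and it does not attempt the logical re-linking of spatially distant words that the abstract mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """Hypothetical OCR text box: text plus pixel coordinates."""
    text: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def reading_order(boxes: List[TextBox], row_tol: float = 0.5) -> List[TextBox]:
    """Sort boxes top-to-bottom, then left-to-right within each row.

    A box joins the current row when its vertical centre lies within
    row_tol * (height of the row's first box) of that box's centre.
    """
    rows: List[List[TextBox]] = []
    for box in sorted(boxes, key=lambda b: (b.y0 + b.y1) / 2):
        cy = (box.y0 + box.y1) / 2
        if rows:
            anchor = rows[-1][0]
            anchor_cy = (anchor.y0 + anchor.y1) / 2
            if abs(cy - anchor_cy) <= row_tol * (anchor.y1 - anchor.y0):
                rows[-1].append(box)
                continue
        rows.append([box])
    # Flatten the rows, ordering each row by its left edge.
    return [b for row in rows for b in sorted(row, key=lambda b: b.x0)]


# Made-up boxes in scrambled order, as an OCR engine might return them.
sample = [
    TextBox("Amount:", 200, 10, 280, 28),
    TextBox("1,200.00", 300, 12, 380, 30),
    TextBox("Payee:", 10, 50, 70, 68),
    TextBox("ACME Ltd", 80, 52, 170, 70),
    TextBox("Date:", 10, 11, 60, 29),
]
ordered_texts = [b.text for b in reading_order(sample)]
```

A heuristic like this only recovers a plausible token order for simple grid-like layouts; multi-column or noisy documents would need a layout detection step first, which is presumably where a tool such as Layout-Parser comes in.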

Key words

financial instruments / information extraction / multimodal / LayoutLMv3 / GRU

Classification

Mathematical and physical sciences

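Cite this article

覃俊, 林宇亭, 刘晶, 叶正, 刘洲. Information Extraction of Financial Instrument Based on Document Order and Multimodal Model [J]. Computer & Digital Engineering, 2024, 52(1): 23-27, 80, 6.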

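Funding

Young and Middle-aged Talents Training Program of the National Ethnic Affairs Commission (No. MZR20007)

Xinjiang Uygur Autonomous Region Regional Collaborative Innovation Special Project (Science and Technology Aid to Xinjiang Program) (No. 2022E02035)

Traditional Chinese Medicine Research Project of the Hubei Provincial Administration of Traditional Chinese Medicine (No. ZY2023M064)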

Computer & Digital Engineering

OA, CSTPCD

ISSN 1672-9722
