土壤 (Soils), 2025, Vol. 57, Issue 2: 445-451 (7 pages). DOI: 10.13758/j.cnki.tr.2025.02.023
基于深度学习的土壤学文献图形数值抽取技术框架初步构建
Preliminary Construction of Technical Framework for Numerical Value Extraction from Figures in Soil Literatures Based on Deep Learning
Abstract
To address the low efficiency of extracting numerical values from figures, a deep-learning-based technical framework for extracting numerical values from figures in soil science literature is proposed. First, the common figure elements and their symbols were catalogued, and a set of figures was collected and manually labelled to form a training dataset. Second, starting from the YOLO v8 base model, which detects multiple targets in a single pass over the whole image, a model optimized for detecting figure elements in soil science literature was obtained through several rounds of training. Third, to convert the detected figure elements into real values, an algorithm was designed to automatically calculate the numerical values in 2D scatter plots and histograms. Tests on figures not used in training showed that the technique effectively extracted the figure elements and that the extracted values agreed closely with manually extracted values (coefficient of determination of the linear regression, R² > 0.99). The proposed framework is therefore highly feasible and provides a new approach for the efficient use of figure data in soil science literature.
Keywords (Chinese)
深度学习 / 土壤学文献 / 图形要素识别 / 数值提取 / YOLO v8
Key words
Deep learning / Soil science literature / Identification of figure elements / Numerical extraction / YOLO v8
Classification
Agricultural science and technology
Cite this article
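刘杰, 马海艺, 郭志英, 郏梦思, 王昌昆, 潘贤章. 基于深度学习的土壤学文献图形数值抽取技术框架初步构建 [Preliminary Construction of Technical Framework for Numerical Value Extraction from Figures in Soil Literatures Based on Deep Learning] [J]. 土壤 (Soils), 2025, 57(2): 445-451.
Funding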
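Supported by the National Key Research and Development Program of China (2020YFC1807401), the National Science and Technology Basic Resources Survey Program of China (2021FY100703), and the Network Security and Informatization Special Application Demonstration Project of the Chinese Academy of Sciences (CAS-WX2022SF-0201).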
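
Illustrative note on the third step of the abstract (converting detected figure elements into real values). The abstract does not give the algorithm itself, so the following is only a minimal sketch of one common way to do such a conversion, under stated assumptions: both axes are linear, the detector has already returned pixel coordinates for tick marks and data markers, and the tick label values have been read separately. The names `fit_axis`, `pixels_to_values` and `bar_values` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# (pixel position along the axis, numeric value printed at that tick)
Tick = Tuple[float, float]


def fit_axis(ticks: Sequence[Tick]) -> Tuple[float, float]:
    """Least-squares fit of value = slope * pixel + intercept for one linear axis."""
    if len(ticks) < 2:
        raise ValueError("need at least two ticks to calibrate an axis")
    n = len(ticks)
    mean_p = sum(p for p, _ in ticks) / n
    mean_v = sum(v for _, v in ticks) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in ticks)
    if var_p == 0:
        raise ValueError("ticks must have distinct pixel positions")
    slope = sum((p - mean_p) * (v - mean_v) for p, v in ticks) / var_p
    return slope, mean_v - slope * mean_p


def pixels_to_values(
    points_px: Sequence[Tuple[float, float]],
    x_ticks: Sequence[Tick],
    y_ticks: Sequence[Tick],
) -> List[Tuple[float, float]]:
    """Map scatter-marker centres from pixel coordinates to data coordinates.

    Image y grows downward; the fitted y slope is simply negative in that
    case, so no explicit flip is needed.
    """
    ax, bx = fit_axis(x_ticks)
    ay, by = fit_axis(y_ticks)
    return [(ax * px + bx, ay * py + by) for px, py in points_px]


def bar_values(bar_tops_px: Sequence[float], y_ticks: Sequence[Tick]) -> List[float]:
    """Map the pixel row of each bar's top edge to the value it represents."""
    ay, by = fit_axis(y_ticks)
    return [ay * top + by for top in bar_tops_px]
```

For example, with x ticks at pixels 100 and 500 labelled 0 and 10, and y ticks at pixel rows 400 and 100 labelled 0 and 30, a marker centred at pixel (300, 250) maps to approximately (5, 15). Fitting over all detected ticks instead of only two spreads out small localisation errors in individual detections; logarithmic axes would need the tick values transformed before fitting.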