
Research on a Data Structuralization Method of Bamboo Species Based on Regular Extraction Model

Li Xin¹, Li Shaowen¹, Xu Gaojian¹, Lin Jianbin¹

Computer Technology and Development, 2018, Vol. 28, Issue 6: 147-150, 155 (5 pages). DOI: 10.3969/j.issn.1673-629X.2018.06.033

Author information

  • 1. School of Information and Computer, Anhui Agricultural University, Hefei, Anhui 230036, China

Abstract

This study aims to provide an effective and feasible method for efficiently constructing a bamboo species database by automatically extracting and structurally storing the morphological data of bamboo germplasm resources (bamboo species) using information extraction technology. To build the bamboo regular extraction model, the bamboo species data structure is used as the extraction template and the database attributes as rule triggers; the extraction rules are then written as regular expressions. The online edition of the Flora of China serves as the experimental target, and the bamboo species data are extracted into structured form in two steps: web crawling and text extraction. Information on more than five hundred bamboo species is extracted, and the accuracy of the effective field information for the extracted species exceeds 89%. The method is implemented in a bamboo species data extraction system developed in Java. Built on regular expressions, it is a feasible and effective data structuring method.
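
The trigger-plus-regular-expression idea described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' system: the class name `BambooFieldExtractor`, the attribute names (`Culm`, `Leaf`, `Flowering`), the `Trigger: value;` text layout, and the sample description are all assumptions made for illustration, and the crawling step is omitted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BambooFieldExtractor {

    // Hypothetical template: database attribute names double as trigger words.
    static final List<String> TRIGGERS = List.of("Culm", "Leaf", "Flowering");

    // One rule per trigger: the trigger word, a colon, then everything
    // up to the next semicolon or period (an assumed text layout).
    static Pattern ruleFor(String trigger) {
        return Pattern.compile(Pattern.quote(trigger) + "\\s*:\\s*([^;.]+)");
    }

    // Apply every rule to one description and collect a field -> value record.
    static Map<String, String> extract(String description) {
        Map<String, String> record = new LinkedHashMap<>();
        for (String trigger : TRIGGERS) {
            Matcher m = ruleFor(trigger).matcher(description);
            // A field with no match is stored as empty rather than guessed.
            record.put(trigger, m.find() ? m.group(1).trim() : "");
        }
        return record;
    }

    public static void main(String[] args) {
        // Made-up sample text, not taken from the paper's data set.
        String text = "Culm: 5-10 m tall, 3-6 cm in diameter; Leaf: lanceolate, 8-15 cm long.";
        extract(text).forEach((field, value) -> System.out.println(field + " = " + value));
    }
}
```

Each key of the returned map would correspond to one database column, so a record can be written to a table row directly. Because the value pattern stops at the first period, a real rule set would need extra handling for decimal measurements such as `1.5 cm`.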

Key words

information extraction / regular expression / bamboo species data / data structuring

Classification

Information Technology and Security Science

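Cite this article

Li Xin, Li Shaowen, Xu Gaojian, Lin Jianbin. Research on a Data Structuralization Method of Bamboo Species Based on Regular Extraction Model [J]. Computer Technology and Development, 2018, 28(6): 147-150, 155.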

Funding

National Science and Technology Program project for rural areas under the 12th Five-Year Plan (2015BAD04B03)

Computer Technology and Development

OA, CSTPCD

ISSN 1673-629X
