Computer Technology and Development (计算机技术与发展), 2018, Vol. 28, Issue 6: 147-150, 155, 5. DOI: 10.3969/j.issn.1673-629X.2018.06.033
基于正则抽取的竹种数据结构化方法研究
Research on a Data Structuralization Method of Bamboo Species Based on Regular Extraction Model
Abstract
This study aims to provide an effective and feasible method for efficiently constructing a bamboo species database by automatically extracting and structurally storing the morphological data of bamboo germplasm resources (bamboo species) using information extraction technology. To build the bamboo regular extraction model, the bamboo species data structure is used as the extraction template and the database properties as rule triggers, and the extraction rules are then written as regular expressions. The experimental target is the online edition of the flora of China, from which bamboo species data are extracted in structured form in two steps: web crawling and text extraction. Information on more than five hundred bamboo species was extracted, and the accuracy of the effective field information for the extracted species exceeds 89%. The method is implemented as a bamboo species data extraction system developed in Java. Built on regular expressions, it is a feasible and effective data structuring method.
Keywords
information extraction/regular expression/bamboo species data/data structuring
Classification
Information Technology and Security Science
Cite this article
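Li Xin (李欣), Li Shaowen (李绍稳), Xu Gaojian (许高建), Lin Jianbin (林建彬). Research on a Data Structuralization Method of Bamboo Species Based on Regular Extraction Model [J]. Computer Technology and Development (计算机技术与发展), 2018, 28(6): 147-150, 155, 5.
Funding
National Science and Technology Program project for the rural sector under the "Twelfth Five-Year Plan" (2015BAD04B03)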