
An Approach of Extracting Information for Maritime Unstructured Text Based on Rules

Yu Chen¹, Mao Zhe², Gao Song¹

Journal of Transport Information and Safety, 2017, Vol. 35, Issue (2): 40-47, 8. DOI: 10.3963/j.issn.1674-4861.2017.02.007

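Author information

  • 1. Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063
  • 2. National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan 430063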

Abstract

Structured processing of maritime data plays an important role in maritime safety. A large amount of maritime-related information is available on the Internet, but most of it is unstructured and comes in many different formats. This paper proposes an approach that extracts maritime information and converts unstructured text into structured data. Web crawlers collect text from maritime-related Web pages. Based on how the texts are defined, the information to be extracted is divided into four items: time, location, vessel name, and accident type. A maritime lexicon for Chinese word segmentation and part-of-speech tagging is built from the extraction process and its common trigger words. Extraction rules are then summarized from an analysis of a large corpus of accident reports, and the extracted items are assembled into structured maritime data. To verify the feasibility of rule-based extraction, data from the website of the Yangtze River Maritime Bureau is used as a case study. The results are:

  • Time: precision 100%, recall 91%
  • Location: precision 94.52%, recall 69%
  • Vessel name: precision 97.75%, recall 86%
  • Accident type: precision 96.6%, recall 87%
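The abstract does not list the actual lexicon or rules, so the following is only a minimal Python sketch of the general idea: trigger-word lookup plus regular-expression rules that fill the four items (time, location, vessel name, accident type), and a precision/recall calculation of the kind reported above. Everything concrete in it is hypothetical: the `ACCIDENT_TRIGGERS` entries, the regex patterns, the `MaritimeRecord` type, the `extract` and `precision_recall` helpers, and the made-up sample sentence. It also skips the word-segmentation and part-of-speech-tagging step the paper relies on, and it assumes the standard definitions precision = correct / extracted and recall = correct / present in the gold annotations, which may differ from how the authors counted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical trigger-word lexicon mapping accident keywords to labels.
# The paper builds its own maritime lexicon; these entries are illustrative.
ACCIDENT_TRIGGERS = {
    "碰撞": "collision",
    "搁浅": "grounding",
    "触碰": "contact",
    "沉没": "sinking",
    "火灾": "fire",
}

# Hypothetical extraction rules written as regular expressions.
# Time: optional year, then month and day, then optional hour and minute.
TIME_RULE = re.compile(r"(?:\d{4}年)?\d{1,2}月\d{1,2}日(?:\d{1,2}时(?:\d{1,2}分)?)?")
# Vessel name: a quoted name followed by 轮 or 号, as in “XX”轮.
VESSEL_RULE = re.compile(r'[“"]([^”"]+)[”"](?:轮|号)')
# Location: text after the trigger 在, up to a place suffix such as 水域.
LOCATION_RULE = re.compile(r"在([\u4e00-\u9fa5A-Za-z0-9]+?(?:水域|航道|锚地|码头))")


@dataclass
class MaritimeRecord:
    """One structured record holding the four items named in the abstract."""
    time: Optional[str]
    location: Optional[str]
    vessels: List[str]
    accident_type: Optional[str]


def extract(text: str) -> MaritimeRecord:
    """Apply each hypothetical rule to a raw report and build a record."""
    time_match = TIME_RULE.search(text)
    location_match = LOCATION_RULE.search(text)
    # The first lexicon entry that occurs in the text decides the accident type.
    accident = next(
        (label for word, label in ACCIDENT_TRIGGERS.items() if word in text), None
    )
    return MaritimeRecord(
        time=time_match.group(0) if time_match else None,
        location=location_match.group(1) if location_match else None,
        vessels=VESSEL_RULE.findall(text),
        accident_type=accident,
    )


def precision_recall(
    predicted: List[Optional[str]], gold: List[Optional[str]]
) -> Tuple[float, float]:
    """Precision = correct / extracted; recall = correct / present in gold."""
    extracted = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in extracted if p == g)
    relevant = sum(1 for g in gold if g is not None)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up sample sentence; the vessel names are placeholders.
    report = "2016年5月3日14时，“示例一”轮在长江武汉段白沙洲水域与“示例二”轮发生碰撞。"
    print(extract(report))
    print(precision_recall(["collision", "grounding", None, "fire"],
                           ["collision", "contact", "sinking", "fire"]))
```

Plain regular expressions keep the sketch dependency-free. A closer match to the described approach would first segment and tag the text with a custom maritime dictionary and then apply rules to the tagged tokens, which makes rules such as "a place noun after the trigger 在" easier to express than with raw character patterns.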

Key words

extracting information/maritime text information/user-defined words library/rules for extraction

Classification

Transportation engineering

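Cite this article

Yu Chen, Mao Zhe, Gao Song. An approach of extracting information for maritime unstructured text based on rules[J]. Journal of Transport Information and Safety, 2017, 35(2): 40-47, 8.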

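Funding

Supported by the Ministry of Transport Construction Science and Technology Project (Grant No. 2015328811180) and the Ministry of Industry and Information Technology High-Tech Ship Project (Research on Formal Safety Assessment of Ships and Application of the Safety Level Approach) (Grant No. 2015328811180)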

Journal of Transport Information and Safety

Indexed in: OA, Peking University Core Journals, CSCD, CSTPCD

ISSN 1674-4861
