
A DeepWeb Information Extraction Method Based on Knowledge Engineering

Wuerkexi (乌尔柯西)¹, Yang Shu (杨抒)¹, Wang Ye (王业)¹, You Xiangru (游香薷)¹

Computer Technology and Development, 2016, Vol. 26, Issue (9): 183-186, 191, 5. DOI: 10.3969/j.issn.1673-629X.2016.09.041

Author information

1. College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi, Xinjiang 830052, China

Abstract

DeepWeb holds an ever-growing volume of valuable information. However, DeepWeb information is highly heterogeneous, autonomous, dynamic and incomplete; the design style, page structure and displayed content of themed DeepWeb sites differ from one another; and JavaScript is used widely. As a result, traditional extraction techniques cannot effectively automate the integration of the high-quality information held in DeepWeb resources. This paper presents a DeepWeb extraction method based on knowledge engineering. The page mode, HTML structure and visual features of DeepWeb pages are analyzed and integrated. An HTML DOM tree parsing algorithm is applied to match, automatically or semi-automatically, a template that fits the page mode, HTML structure and target information source, and thereby to locate the information in DeepWeb and obtain free text, structured data and semi-structured data. The effectiveness of the extraction method is verified using data from a large number of sites with nested structure as the data source.
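
The abstract does not give the algorithm itself, so the following is only a minimal sketch of what template matching over a DOM tree might look like, not the authors' implementation. It assumes well-formed, static HTML (no JavaScript rendering and no visual features), and every name in it (`Node`, `DomBuilder`, `extract`, the `TEMPLATE` layout, the sample page) is hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser

# Elements that never have children, so the builder must not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a small DOM tree; assumes reasonably well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then close it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def matches(node, step):
    """A template step is (tag, css_class); css_class may be None."""
    tag, cls = step
    if node.tag != tag:
        return False
    return cls is None or cls in (node.attrs.get("class") or "").split()


def find_all(node, step):
    """Yield every descendant of node that matches one template step."""
    for child in node.children:
        if matches(child, step):
            yield child
        yield from find_all(child, step)


def extract(html, template):
    """Locate record nodes, then pull each templated field out of them."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    records = []
    for rec in find_all(builder.root, template["record"]):
        row = {}
        for field, step in template["fields"].items():
            hit = next(find_all(rec, step), None)
            row[field] = hit.text if hit is not None else None
        records.append(row)
    return records


# Hypothetical result page and hand-written template (the semi-automatic case).
PAGE = """
<div id="results">
  <div class="item"><span class="title">Paper A</span><span class="year">2015</span></div>
  <div class="item"><span class="title">Paper B</span><span class="year">2016</span></div>
</div>
"""
TEMPLATE = {
    "record": ("div", "item"),
    "fields": {"title": ("span", "title"), "year": ("span", "year")},
}
records = extract(PAGE, TEMPLATE)
print(records)
```

In this sketch the template is written by hand, which corresponds loosely to the semi-automatic case mentioned in the abstract; inferring the template automatically from repeated subtrees is a separate step that the sketch does not attempt.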

Key words

DeepWeb/JavaScript technology/nested structure/DOM tree/extraction model

Classification

Information Technology and Security Science

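Cite this article

Wuerkexi, Yang Shu, Wang Ye, You Xiangru. A DeepWeb Information Extraction Method Based on Knowledge Engineering [J]. Computer Technology and Development, 2016, 26(9): 183-186, 191, 5.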

Funding

Natural Science Foundation of Xinjiang Uygur Autonomous Region (2014211B023)

Computer Technology and Development

OA, CSTPCD

ISSN: 1673-629X
