Web Extraction Method Based on DOM Tree and Domain Ontology

Computer Engineering, 2012, Vol. 38, Issue 5: 56-58, 3.

Guo Jianbing¹, Cui Zhiming², Chen Ming¹, Zhao Pengpeng¹

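Author information

1. Institute of Intelligent Information Processing and Application, Soochow University, Suzhou, Jiangsu 215006, China
2. Suzhou Puda New Information Technology Co., Ltd., Suzhou, Jiangsu 215021, China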

Abstract

To automate extraction from Deep Web result pages whose structures differ from site to site, this paper proposes a method that combines the structure and the content of Web pages. The method locates the data area using features of the data content together with DOM tree nodes annotated by a domain ontology library, and then identifies data records with an improved simple tree matching algorithm. Experimental results show that its F-measure is 2.93% to 6.67% higher than that of traditional methods.
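The abstract names simple tree matching as the basis for identifying data records but does not describe the authors' improvements. As a minimal sketch of the classic (unimproved) algorithm only, assuming a tag-only node type, the recurrence below computes the size of the maximum order-preserving top-down matching between two DOM subtrees. The `Node` class and all function names are hypothetical, and the normalization by average tree size is an assumed, commonly used choice rather than something stated in this paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM-like node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(c) for c in node.children)


def simple_tree_match(a: Node, b: Node) -> int:
    """Classic simple tree matching: size of the maximum top-down,
    order-preserving matching between trees a and b."""
    if a.tag != b.tag:
        return 0  # roots differ, so no node below them can be matched
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching between the first i children of a and the first j children of b
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = simple_tree_match(a.children[i - 1], b.children[j - 1])
            M[i][j] = max(M[i][j - 1], M[i - 1][j], M[i - 1][j - 1] + w)
    return M[m][n] + 1  # +1 for the matched roots


def normalized_similarity(a: Node, b: Node) -> float:
    """Matching size divided by the average tree size (assumed normalization)."""
    return simple_tree_match(a, b) / ((size(a) + size(b)) / 2)


if __name__ == "__main__":
    # Two hypothetical result-page rows that differ by one trailing cell.
    rec1 = Node("tr", [Node("td", [Node("a")]), Node("td")])
    rec2 = Node("tr", [Node("td", [Node("a")]), Node("td"), Node("td")])
    print(simple_tree_match(rec1, rec2))  # → 4
    print(normalized_similarity(rec1, rec2))
```

In a record-identification step, sibling subtrees inside the located data area whose normalized similarity exceeds some threshold could be grouped as records of the same type; the threshold value would be a tuning assumption, not a figure from this paper.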

Keywords

automatic extraction; DOM tree; domain ontology; data area positioning; simple tree matching

Classification

Information Technology and Security Science

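Cite this article

Guo Jianbing, Cui Zhiming, Chen Ming, Zhao Pengpeng. Web Extraction Method Based on DOM Tree and Domain Ontology[J]. Computer Engineering, 2012, 38(5): 56-58, 3.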

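Funding

National Natural Science Foundation of China (60970015, 61003054)

Jiangsu Province Enterprise Doctoral Innovation Fund (BK2009563)

Natural Science Research Fund of Jiangsu Higher Education Institutions (10KJB520018)

Suzhou Special Fund for Technological Innovation of Science and Technology Enterprises (SG201043)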

Computer Engineering

OA, CSCD, CSTPCD

ISSN: 1000-3428
