Design of a Web Information Extraction System

Liu Bin, Zhang Xiaojing

Microcomputer Applications (微型电脑应用), 2013, Vol. 29, Issue 3: 8-10, 3.

Liu Bin¹, Zhang Xiaojing¹

Author information

1. College of Electrical and Information Engineering, Shaanxi University of Science and Technology, Xi'an 710021

Abstract

To obtain the scattered information hidden in Web pages, a Web information extraction system is designed. The system first uses a modified HITS algorithm to collect topic-relevant pages. It then pre-processes the HTML document structure of each collected page. Finally, a DOM-tree-based XPath generation algorithm produces the absolute XPath expression of each marked node, and extraction rules written in the XPath language together with XSLT output a structured database or XML file, which locates and extracts the Web information. An extraction experiment on a shopping site shows that the system works well and can extract similar Web pages in batches.
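The step of deriving an absolute XPath for a marked DOM node, then reusing it on pages that share the same template, can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the function names `absolute_xpath` and `select`, the two sample pages, and the assumption that pre-processing has already produced well-formed XHTML are all hypothetical. The XSLT stage is left out because the Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET


def absolute_xpath(root, target):
    """Return an absolute, position-indexed path to `target`, or None if absent."""
    def walk(node, path):
        if node is target:
            return path
        seen = {}  # per-tag counter so each step gets a 1-based sibling index
        for child in node:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")
            if found is not None:
                return found
        return None

    return walk(root, f"/{root.tag}[1]")


def select(root, xpath):
    """Resolve a path built by absolute_xpath against another tree's root."""
    steps = xpath.strip("/").split("/")
    if steps[0] != f"{root.tag}[1]":
        return None  # the first step must name the root element
    if len(steps) == 1:
        return root
    # ElementTree only accepts relative paths, so drop the root step.
    return root.find("./" + "/".join(steps[1:]))


# Two hypothetical product pages generated from the same template.
PAGE_A = ("<html><body><div><h1>Shop</h1></div>"
          "<div><span>Kettle</span><span>19.90</span></div></body></html>")
PAGE_B = ("<html><body><div><h1>Shop</h1></div>"
          "<div><span>Toaster</span><span>24.50</span></div></body></html>")

root_a = ET.fromstring(PAGE_A)
marked = root_a.find("./body/div[2]/span[2]")  # the node a user marked on page A
rule = absolute_xpath(root_a, marked)
price_b = select(ET.fromstring(PAGE_B), rule).text

print(rule)
print(price_b)
```

Because the rule encodes only tag names and sibling positions, it carries over to any page with the same layout, which is what makes batch extraction of similar pages possible. It also breaks as soon as the template changes.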

Keywords

Web Information Extraction / Topic Selection / DOM Tree / XPath / XSLT

Classification

Information Technology and Security Science

Cite this article

Liu Bin, Zhang Xiaojing. Design of a Web Information Extraction System [J]. Microcomputer Applications, 2013, 29(3): 8-10, 3.

Funding

2012 Xianyang Science and Technology Research and Development Program (2012k03-05)

Microcomputer Applications (微型电脑应用)

OA, CSTPCD

ISSN: 1007-757X
