Microcomputer Applications, 2013, Vol. 29, Issue (3): 8-10, 3.
Design of Web Information Extraction System
Abstract
To obtain the information scattered across and hidden in Web pages, a Web information extraction system is designed. The system first uses a modified HITS algorithm to collect information through topic selection. It then pre-processes the data according to the HTML document structure of each Web page. Finally, a DOM-tree-based XPath generation algorithm obtains the absolute XPath expression of each marked node, and extraction rules written in the XPath language with XSLT produce a structured database or XML file, thereby locating and extracting the Web information. Experiments on a shopping site show that the system works well and can extract similar Web pages in batches.
Keywords
Web Information Extraction / Topic Selection / DOM Tree / XPath / XSLT
Classification
Information Technology and Security Science
Cite this article
Liu Bin, Zhang Xiaojing. Design of Web Information Extraction System [J]. Microcomputer Applications, 2013, 29(3): 8-10, 3.
Funding
2012 Xianyang Science and Technology Research and Development Program Project (2012k03-05)
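
The abstract mentions generating the absolute XPath of a marked node from the DOM tree but does not give the algorithm. The following is only a minimal sketch of one common way to do this, not the authors' method. The helper name `absolute_xpaths`, the sample `PAGE`, and its product data are all hypothetical, and the sample is assumed to be well-formed markup (real pages would first need the pre-processing step the abstract describes).

```python
import xml.etree.ElementTree as ET


def absolute_xpaths(root):
    """Yield (absolute XPath, element) for every element in the tree.

    Hypothetical helper: each step is written as tag[n], where n is the
    1-based position of the element among siblings with the same tag.
    """
    def walk(elem, path):
        yield path, elem
        seen = {}  # tag name -> number of same-tag siblings visited so far
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            yield from walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    yield from walk(root, f"/{root.tag}[1]")


# Made-up, well-formed sample page; real pages would need cleaning first.
PAGE = """<html><body>
<div><span>Kettle</span><span>19.90</span></div>
<div><span>Toaster</span><span>24.50</span></div>
</body></html>"""

root = ET.fromstring(PAGE)
paths = dict(absolute_xpaths(root))

# Print the absolute path and text of every leaf field in the sample.
for path, elem in paths.items():
    if elem.tag == "span":
        print(path, elem.text)
```

Pages built from the same template tend to share these paths, so a path marked once on one page could plausibly be reused as an extraction rule for the others, which is consistent with the batch extraction the abstract reports.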