Research on Web Information Extraction Based on Combining Block Importance Model and Xpath

Pang Qiuben (庞秋奔)¹, Gu Ping (顾平)¹, Yang Xiaomei (杨小梅)¹

Computer and Modernization (计算机与现代化), 2009, Issue 8: 73-75, 79 (4 pages). DOI: 10.3969/j.issn.1006-2475.2009.08.020

Author information

1. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China

Abstract

Page segmentation reduces the unit of Web information extraction from the whole page to the block. This paper reviews the main page segmentation approaches and the learning-based block importance model, and analyzes XPath-based Web information extraction. Combining the advantages of the two, it proposes a new Web information extraction method based on the block importance model and XPath, discusses its design, and gives a formal description and experimental results. The results show that the method is well suited to extracting information from Web pages that contain many records.

Key words

page segmentation / block importance weight / Xpath / Web information extraction

Classification

Information Technology and Security Science

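Cite this article

Pang Qiuben, Gu Ping, Yang Xiaomei. Research on Web Information Extraction Based on Combining Block Importance Model and Xpath [J]. Computer and Modernization, 2009, (8): 73-75, 79.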

Computer and Modernization

OA, CSTPCD

ISSN: 1006-2475
