The Web Forum Crawling Technology and System Implementation
(Chinese title: 面向Web论坛的网络信息获取技术及系统实现)

Peng Dong (彭冬)¹, Cai Wandong (蔡皖东)¹

Computer Engineering & Science (计算机工程与科学), 2011, Vol. 33, Issue (1): 157-160, 4. DOI: 10.3969/j.issn.1007-130X.2011.01.030

Author information

1. School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072

Abstract

Web spiders play an important role in gathering information, but they face new challenges when used to crawl Web forums. This paper studies the basic technologies for crawling Web forums, and designs and implements a system whose main purpose is to gather forum information. A traversal strategy is proposed based on the information structure of forums, and a DOM and block algorithm is proposed based on the distribution of the main text within a page. Experimental results show that the traversal strategy retrieves highly subject-relevant Web pages more efficiently than traditional traversal methods, and that using the proposed approach for main-text extraction of Web pages effectively improves the accuracy of information collection.

Key words

Web spider / Web forum / main-text extraction / subject relevance

Classification

Information Technology and Security Science

Cite this article

Peng Dong, Cai Wandong. The Web Forum Crawling Technology and System Implementation [J]. Computer Engineering & Science, 2011, 33(1): 157-160, 4.

Funding

National 863 Program of China (2009AA01Z424)

Key Support Project for Undergraduate Graduation Design, Northwestern Polytechnical University, Class of 2009

Computer Engineering & Science

Open access; indexed in Peking University Core Journals, CSCD, CSTPCD

ISSN: 1007-130X
