
Study and design on Web spider in Internet forums

Teng Zhaosheng (滕召生), Hu Demin (胡德敏)

Application Research of Computers (计算机应用研究), 2011, Vol. 28, Issue 2: 492-494, 520 (4 pages). DOI: 10.3969/j.issn.1001-3695.2011.02.023


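Teng Zhaosheng 1, Hu Demin 1

Author information

  • 1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China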

Abstract

To improve the efficiency of Web spiders when crawling forums, this paper analyzes the layout and structure of forums, identifies the features common to all of them, and designs a targeted crawling strategy. An analysis of many forums shows that most information is presented to users through a pre-designed layout and structure, which is reflected in the DOM tree. By operating on this tree, URLs can be collected and duplicate URLs filtered out. Experimental results show that the proposed crawling strategy increases the crawling efficiency of Web spiders and saves substantial network bandwidth and local storage space.
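
The URL collection and duplicate filtering mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the details of their DOM tree operations or their repeated-region detection, so the sketch only collects `<a href>` links while parsing a page and skips URLs that have already been seen. The names `LinkCollector` and `collect_new_urls`, and the choice to treat URLs that differ only in their `#fragment` as duplicates, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Hypothetical helper: gather absolute URLs from <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Assumption: same-page anchors do not count as new URLs,
                # so the #fragment is dropped before comparison.
                absolute, _fragment = urldefrag(absolute)
                self.links.append(absolute)


def collect_new_urls(html, base_url, seen):
    """Return URLs found in `html` that are not yet in `seen`, in page order.

    `seen` is a set shared across pages; it is updated in place.
    """
    parser = LinkCollector(base_url)
    parser.feed(html)
    new_urls = []
    for url in parser.links:
        if url not in seen:
            seen.add(url)
            new_urls.append(url)
    return new_urls
```

A crawler would keep one `seen` set for the whole crawl and pass it to `collect_new_urls` for every fetched page, so that a thread linked from many board pages is queued only once.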

Keywords

Web spider / document object model (DOM) tree / page repeated region / crawling strategy / repeated template

Classification

Information technology and security science

Cite this article

Teng Zhaosheng, Hu Demin. Study and design on Web spider in Internet forums [J]. Application Research of Computers, 2011, 28(2): 492-494, 520.

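Journal: Application Research of Computers (计算机应用研究)

Indexing: OA / 北大核心 (Peking University Core Journals) / CSCD / CSTPCD

ISSN: 1001-3695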
