Application Research of Computers (计算机应用研究), 2011, Vol. 28, Issue 2: 492-494, 520, 4. DOI: 10.3969/j.issn.1001-3695.2011.02.023
网络蜘蛛在网络论坛领域的研究与设计
Study and design on Web spider in Internet forums
Teng Zhaosheng (滕召生) 1, Hu Demin (胡德敏) 1
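Author information
- 1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China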
Abstract
To improve the efficiency of Web spiders when crawling forums, this paper analyzes the layout and structure of forums, identifies the features common to them, and designs a targeted crawling strategy. An analysis of many forums shows that most information is presented to users through a pre-designed layout and structure, which is reflected in the DOM tree. By operating on this tree, URLs can be collected and duplicate URLs filtered out. Experimental results show that the proposed crawling strategy increases the crawling efficiency of Web spiders and saves substantial network bandwidth and local storage space.
Keywords
Web spider / document object model (DOM) tree / repeated page regions / crawling strategy / repeated-template classification
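The abstract's step of collecting URLs from the page structure and then filtering repeated URLs can be illustrated with a minimal sketch. This is not the authors' implementation: it does not model repeated page regions or template classification, and it uses Python's event-based `html.parser` instead of building a full DOM tree. The names `LinkCollector` and `collect_unique_urls` and the `forum.example` host are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect href values of <a> elements while the parser walks the page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the #fragment part, so that
                # two links to the same thread compare equal.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def collect_unique_urls(html, base_url, seen=None):
    """Return URLs found in `html` that are not already in `seen`.

    `seen` is updated in place, so it can be shared across pages
    to filter duplicates over a whole crawl (an assumption of this sketch).
    """
    seen = set() if seen is None else seen
    parser = LinkCollector(base_url)
    parser.feed(html)
    fresh = []
    for url in parser.links:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# A hypothetical forum thread list with one repeated target.
page = """
<ul>
  <li><a href="/thread/1">First thread</a></li>
  <li><a href="/thread/2">Second thread</a></li>
  <li><a href="/thread/1#reply">First thread, last reply</a></li>
</ul>
"""
print(collect_unique_urls(page, "http://forum.example/board/"))
```

In this sketch the third link resolves to the same URL as the first once its fragment is removed, so only two URLs are returned. Passing one shared `seen` set across calls would extend the filter from a single page to a whole crawl.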
Information Technology and Security Science
Cite this article
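Teng Zhaosheng, Hu Demin. Study and design on Web spider in Internet forums (网络蜘蛛在网络论坛领域的研究与设计) [J]. Application Research of Computers, 2011, 28(2): 492-494, 520, 4.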