Computer Technology and Development (计算机技术与发展), 2017, Vol. 27, Issue 12: 103-107, 114, 6. DOI: 10.3969/j.issn.1673-629X.2017.12.023
Anonymous Crawler Detection Based on Web Access
Abstract
Malicious web crawlers that disguise themselves as browsers are difficult to distinguish from ordinary visitors, and web log detection tools do not support anonymous crawler detection. To address these problems, this paper analyzes how web crawlers access web pages and summarizes existing detection algorithms for malicious crawlers, which are based on the robot exclusion protocol and on crawling behavior. Building on these algorithms, a new crawling-behavior-based algorithm is proposed to identify camouflaged web crawlers. It detects crawlers mainly from the length of access time and the access cycle, comparing website access by humans with access by crawlers. The algorithm is verified by an experiment on data from a server web log, which is processed with Python. Compared with mainstream detection algorithms for anonymous web crawlers, the proposed algorithm can also detect anonymous crawlers that run with low concurrency.
Keywords
web crawler / robot exclusion protocol / website access behavior / anonymous (camouflaged) crawler detection
Classification
Information Technology and Security Science
Cite this article
Zou Jianxin (邹建鑫), Li Hongling (李红灵). Anonymous Crawler Detection Based on Web Access [J]. Computer Technology and Development, 2017, 27(12): 103-107, 114, 6.
Funding
National Natural Science Foundation of China (61562090)
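
The abstract says that crawlers are detected from the length of access time and the access cycle in a server web log, processed with Python. It does not give the features or thresholds actually used in the paper, so the following is only a minimal sketch of how such a check might look. The function names, the record layout `(ip, user_agent, timestamp)`, the use of the coefficient of variation as a measure of how regular the access cycle is, and all three thresholds are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev

# Hypothetical thresholds, chosen for illustration only (not taken from the paper).
MIN_REQUESTS = 5         # need enough requests to estimate an access cycle
MIN_DURATION_S = 60.0    # what counts as a "long" access time, in seconds
MAX_INTERVAL_CV = 0.2    # low variation between requests = near-periodic access


def group_by_client(records):
    """Group (ip, user_agent, timestamp) records into sorted timestamp lists per client."""
    clients = defaultdict(list)
    for ip, user_agent, ts in records:
        clients[(ip, user_agent)].append(ts)
    return {key: sorted(times) for key, times in clients.items()}


def access_features(times):
    """Return (duration_seconds, interval_cv) for one client's sorted timestamps."""
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    duration = (times[-1] - times[0]).total_seconds()
    avg = mean(gaps) if gaps else 0.0
    # Coefficient of variation of the gaps; infinite when it cannot be estimated.
    cv = pstdev(gaps) / avg if avg > 0 else float("inf")
    return duration, cv


def suspected_crawlers(records):
    """Flag clients whose access is both long-running and nearly periodic."""
    flagged = []
    for key, times in group_by_client(records).items():
        if len(times) < MIN_REQUESTS:
            continue
        duration, cv = access_features(times)
        if duration >= MIN_DURATION_S and cv <= MAX_INTERVAL_CV:
            flagged.append(key)
    return flagged


if __name__ == "__main__":
    start = datetime(2017, 1, 1, 12, 0, 0)
    # A client with a browser-like User-Agent that requests a page every 10 seconds.
    records = [("10.0.0.1", "Mozilla/5.0", start + timedelta(seconds=10 * i))
               for i in range(10)]
    print(suspected_crawlers(records))
```

The check deliberately ignores the User-Agent string as evidence, since a camouflaged crawler can copy a browser's value. Only the timing of requests is used, which is why a slow crawler with few concurrent requests could still be flagged under these assumptions.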