Computer Technology and Development, 2014, Issue 7: 52-55, 59 (5 pages). DOI: 10.3969/j.issn.1673-629X.2014.07.013
Study of Topic-driven Web Crawler for Geographic Information Updating Based on Link Backtracking
Abstract
The rise of the Internet offers a new source of information for geographic information updating, one that is low-cost and timely. To address the shortcomings of current topic-driven web crawlers, a new crawler based on a link backtracking algorithm is proposed with practical use in mind. First, support vector machine classification is used to find the link paths within a website that most probably lead to topic information. The crawler then backtracks to these links and restarts crawling, confirming the topic of each link against a knowledge base of geographic information change factors; this optimizes the crawling path and reduces inefficient crawling. Experimental results indicate that the method finds paths leading to the desired information and improves crawling efficiency, and that it can likely be extended to other topic areas.

Keywords
topic-driven web crawler / geographic information updating / support vector machine / backtracking algorithm

Classification
Information Technology and Security Science

Cite this article
Wu Jiagao (吴家皋), Yu Hao (余浩), Zhang Xueying (张雪英). Study of Topic-driven Web Crawler for Geographic Information Updating Based on Link Backtracking [J]. Computer Technology and Development, 2014, (7): 52-55, 59, 5.

Funding
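National Surveying and Mapping Science and Technology Project
Natural Science Foundation of Jiangsu Province (BK2012833)
Natural Science Foundation of Jiangsu Higher Education Institutions (12KJB520011)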
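
Illustrative sketch

The two-phase procedure summarized in the abstract (find promising link paths, then backtrack to them and confirm each link against a knowledge base) might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy `SITE` graph, the `CHANGE_TERMS` set standing in for the change-factor knowledge base, and all function names are hypothetical, and a simple hit-ratio score is swapped in where the paper uses support vector machine classification.

```python
from collections import deque

# Hypothetical toy website: page -> outgoing links (not from the paper).
SITE = {
    "/": ["/news", "/about"],
    "/news": ["/news/roads", "/news/sports"],
    "/news/roads": ["/news/roads/new-bridge-opens", "/news/roads/highway-renamed"],
    "/news/sports": ["/news/sports/match-report"],
    "/about": ["/about/contact"],
}

# Hypothetical stand-in for the geographic change-factor knowledge base.
CHANGE_TERMS = {"bridge", "highway", "opens", "renamed", "roads"}


def is_on_topic(url):
    """Confirm a link's topic by matching its tokens against the term set."""
    tokens = set(url.replace("/", " ").replace("-", " ").split())
    return bool(tokens & CHANGE_TERMS)


def explore(site, root):
    """Phase 1: breadth-first crawl that records each page's parent link."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for link in site.get(page, []):
            if link not in parent:
                parent[link] = page
                queue.append(link)
    return parent


def path_to(page, parent):
    """Rebuild the link path from the root to the given page."""
    path = []
    while page is not None:
        path.append(page)
        page = parent[page]
    return path[::-1]


def rank_hubs(site):
    """Rank pages by the share of on-topic outgoing links.

    Assumption: this ratio replaces the SVM classifier used in the paper.
    """
    scores = {p: sum(map(is_on_topic, links)) / len(links)
              for p, links in site.items() if links}
    return sorted(scores, key=lambda p: (-scores[p], p))


def backtrack_crawl(site, root, top_k=1):
    """Phase 2: backtrack to the best hubs and keep only confirmed links."""
    parent = explore(site, root)
    found = []
    for hub in rank_hubs(site)[:top_k]:
        for link in site[hub]:
            if is_on_topic(link):
                found.append(path_to(link, parent))
    return found


if __name__ == "__main__":
    for path in backtrack_crawl(SITE, "/"):
        print(" -> ".join(path))
```

Restricting the second pass to the top-ranked hub pages is what avoids re-crawling low-yield branches such as `/about` in this toy graph; a real crawler would fetch pages over the network and train the classifier on labeled link features.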