
Study of Topic-driven Web Crawler for Geographic Information Updating Based on Link Backtracking

Computer Technology and Development, 2014, Issue (7): 52-55, 59, 5. DOI: 10.3969/j.issn.1673-629X.2014.07.013

Wu Jiagao¹, Yu Hao², Zhang Xueying¹

Author information

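  • 1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China
  • 2. Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing, Jiangsu 210003, China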

Abstract

The rise of the Internet offers a new way to search for the information needed for geographic information updating, with the advantages of low cost and good timeliness. To address the shortcomings of current topic-driven web crawlers, a practical web crawler based on a link backtracking algorithm is proposed. First, the crawler uses support vector machine (SVM) classification to find the link paths within a website that are most likely to lead to topic information. It then backtracks to these links and restarts crawling, confirming the topic of each link against a knowledge base of geographic information change factors. This optimizes the crawling path and reduces inefficient crawling. Experimental results show that the crawler can find the paths leading to the desired information and improves crawling efficiency, and that it shows good potential for extension to other topic areas.
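The abstract describes a two-stage procedure: find the link paths that lead to topic pages, then backtrack to those links and restart a filtered crawl. The following is a minimal Python sketch of that control flow over an in-memory site graph, not the authors' implementation. Everything in it is a labeled assumption: the term-overlap rule in the hypothetical `is_topic_page` stands in for the paper's SVM classifier, `CHANGE_FACTORS` is an illustrative handful of terms rather than the paper's change-factor knowledge base, and `SITE`, `first_pass`, `backtrack` and `focused_recrawl` are invented names for illustration.

```python
from collections import deque

# HYPOTHETICAL change-factor terms; the paper's knowledge base is not reproduced here.
CHANGE_FACTORS = {"road", "bridge", "construction", "demolition", "rezoning", "opened"}


def is_topic_page(text):
    """Stand-in for the paper's SVM page classifier (NOT an SVM): flags a page
    as on-topic when it mentions at least two change-factor terms."""
    return len(set(text.lower().split()) & CHANGE_FACTORS) >= 2


def first_pass(site, root):
    """Breadth-first crawl that records each page's parent so the link path
    to every on-topic page can be recovered later."""
    parent = {root: None}
    queue = deque([root])
    hits = []
    while queue:
        url = queue.popleft()
        text, links = site[url]
        if is_topic_page(text):
            hits.append(url)
        for _anchor, child in links:
            if child not in parent:
                parent[child] = url
                queue.append(child)
    return parent, hits


def backtrack(parent, url):
    """Follow parent pointers from an on-topic page back to the root."""
    path = []
    while url is not None:
        path.append(url)
        url = parent[url]
    return path[::-1]


def focused_recrawl(site, entry_points):
    """Restart crawling from pages on productive paths, following only links
    whose anchor text mentions a change factor. Returns the on-topic pages
    found and the number of pages fetched."""
    seen = set(entry_points)
    queue = deque(entry_points)
    found = []
    while queue:
        url = queue.popleft()
        text, links = site[url]
        if is_topic_page(text):
            found.append(url)
        for anchor, child in links:
            if child not in seen and set(anchor.lower().split()) & CHANGE_FACTORS:
                seen.add(child)
                queue.append(child)
    return found, len(seen)


# HYPOTHETICAL toy site: url -> (page text, [(anchor text, target url), ...])
SITE = {
    "/": ("city portal home", [("news", "/news"), ("sports", "/sports")]),
    "/news": ("latest news", [("new bridge opened", "/news/1"),
                              ("road construction notice", "/news/2"),
                              ("weather", "/news/3")]),
    "/news/1": ("the new bridge opened to traffic after construction", []),
    "/news/2": ("road construction will close the east road", []),
    "/news/3": ("sunny weather expected", []),
    "/sports": ("sports results", [("match report", "/sports/1")]),
    "/sports/1": ("the home team won", []),
}

parent, hits = first_pass(SITE, "/")
paths = [backtrack(parent, h) for h in hits]
# Entry points: the page that linked directly to each hit (deduplicated, order kept).
entry = list(dict.fromkeys(p[-2] if len(p) > 1 else p[-1] for p in paths))
found, fetched = focused_recrawl(SITE, entry)
print(paths)                  # [['/', '/news', '/news/1'], ['/', '/news', '/news/2']]
print(entry, found, fetched)  # ['/news'] ['/news/1', '/news/2'] 3
```

In this toy graph the first pass visits all 7 pages, while the restart from `/news` fetches only 3 and skips the sports branch entirely. On a live site the restart would be repeated periodically from the same entry points to pick up newly published pages. Replacing `is_topic_page` with a trained classifier (for example a linear SVM over page features) would bring the sketch closer to what the abstract describes.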

Keywords

topic-driven web crawler/geographic information updating/support vector machine/backtracking algorithm

Classification

Information Technology and Security Science

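Cite this article

Wu Jiagao, Yu Hao, Zhang Xueying. Study of Topic-driven Web Crawler for Geographic Information Updating Based on Link Backtracking[J]. Computer Technology and Development, 2014, (7): 52-55, 59, 5.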

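Funding

National Surveying and Mapping Science and Technology Project

Natural Science Foundation of Jiangsu Province (BK2012833)

Natural Science Foundation of the Jiangsu Higher Education Institutions (12KJB520011)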

Computer Technology and Development

OA, CSTPCD

ISSN 1673-629X
