A Distributed Crawler Framework Based on Coroutine Model

Yang Jiyun, Liu Jianxun, Jiang Lei, Peng Tao, Wen Yiping, Lu Ting

Computing Technology and Automation, Issue (3): 126-133, 8.

Yang Jiyun¹, Liu Jianxun¹, Jiang Lei¹, Peng Tao¹, Wen Yiping¹, Lu Ting¹

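Author information

  • 1. Key Laboratory of Knowledge Processing and Networked Manufacturing (Key Laboratory of Hunan Provincial Universities), School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201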

Abstract

Web crawlers are limited mainly by network latency and local resources. Traditional crawler frameworks, which are based on multithreading, focus on hiding network latency but fail to take local resource limitations into account. Under high concurrency, a multithreaded architecture runs inefficiently because of the growing cost of context switching. It is therefore necessary to study how to make maximum use of network resources while also respecting local resource limits. To address these problems, this paper proposes a distributed crawler framework based on coroutines. We first analyze the overhead, resource utilization, and network utilization of coroutines compared with threads, and then implement a coroutine-based web crawler. Experiments show that the coroutine-based distributed crawler architecture outperforms a thread-based web crawler.
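The coroutine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python's standard `asyncio` library, and `FAKE_WEB`, `fetch`, `crawl`, and `run_crawl` are hypothetical names introduced only for this example. A sleep stands in for network latency, so many page visits overlap on a single thread without OS-level context switches. The distributed part of the framework is not modeled here.

```python
import asyncio

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": [],
    "d": ["a"],
}


async def fetch(url, delay=0.01):
    """Stand-in for a network request; the sleep models network latency."""
    await asyncio.sleep(delay)
    return FAKE_WEB.get(url, [])


async def crawl(seed, max_concurrency=100):
    """Crawl from `seed`, one coroutine per page, all on a single thread."""
    seen = {seed}
    limit = asyncio.Semaphore(max_concurrency)  # caps in-flight requests

    async def visit(url):
        async with limit:
            links = await fetch(url)
        children = []
        for link in links:
            if link not in seen:
                seen.add(link)  # mark before scheduling to avoid duplicate visits
                children.append(asyncio.create_task(visit(link)))
        if children:
            await asyncio.gather(*children)

    await visit(seed)
    return seen


def run_crawl(seed):
    """Synchronous entry point that runs the crawl on a fresh event loop."""
    return asyncio.run(crawl(seed))
```

Calling `run_crawl("a")` visits every page reachable from `"a"`. The semaphore is one simple way to bound local resource use; a real crawler would replace `fetch` with an asynchronous HTTP client.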

Key words

coroutine/distribution/high-performance/web crawler

Classification

Information Technology and Security Science

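Cite this article

Yang Jiyun, Liu Jianxun, Jiang Lei, Peng Tao, Wen Yiping, Lu Ting. A Distributed Crawler Framework Based on Coroutine Model [J]. Computing Technology and Automation, 2014, (3): 126-133, 8.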

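Funding

National Natural Science Foundation of China (61272063, 61100054)

Program for New Century Excellent Talents in University, Ministry of Education (NCET-10-0140)

Humanities and Social Sciences Foundation of the Ministry of Education (12YJCZH084)

Project funded by the Education Department of Hunan Province (12C0119)

Science and Technology Planning Project of Hunan Province (2013FJ3002)

Project funded by Hunan University of Science and Technology (E51368)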

Computing Technology and Automation

OA, CSTPCD

ISSN 1003-6199
