Modern Electronics Technique, 2024, Vol. 47, Issue 9: 86-90, 5. DOI: 10.16652/j.issn.1004-373x.2024.09.016
Method of distributed crawler task scheduling based on resource awareness
Abstract
This paper develops a resource-aware task scheduling method for distributed crawlers, with the aim of optimizing the system resource utilization of each node in a distributed environment and improving the execution efficiency of crawler tasks. A resource-aware scheduling algorithm and node priority management are introduced to monitor the CPU, memory, and network resources of each node and to balance the scheduling of crawler tasks; that is, crawler tasks are dispatched to nodes with low resource utilization, which effectively relieves excessive resource occupation and load imbalance among nodes. In addition, the introduction of Flask improves the scalability of the method and provides a visual crawler monitoring platform. Experimental results show that the proposed method markedly improves the efficiency and adaptability of crawler task execution, and offers useful guidance for the further optimization of distributed crawler systems.
Key words
distributed crawler/task scheduling/resource awareness/Flask/data collection/resource utilization rate
Classification
Information technology and security science
Cite this article
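ZHANG Jun (张军), WEI Jizhen (魏继桢), LI Yubin (李钰彬). Method of distributed crawler task scheduling based on resource awareness [J]. Modern Electronics Technique, 2024, 47(9): 86-90, 5.
Funding
National Natural Science Foundation of China (62162002)
National Natural Science Foundation of China (61662002)
Natural Science Foundation of Jiangxi Province (20212BAB202002)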
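
Illustrative sketch
The abstract describes dispatching crawler tasks to nodes with low CPU, memory, and network utilization. As a rough illustration of that idea only, the Python sketch below scores each node by a weighted utilization and picks the lowest. The `NodeStatus` structure, the `load_score` and `pick_node` functions, the weights, and the sample values are all hypothetical; the abstract does not specify how the paper combines the three resources or how node priority management works.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one crawler node's resource usage (0.0 to 1.0)."""
    name: str
    cpu: float
    memory: float
    network: float


# Assumed weights; the abstract does not state how the resources are combined.
WEIGHTS = {"cpu": 0.4, "memory": 0.3, "network": 0.3}


def load_score(node: NodeStatus) -> float:
    """Weighted utilization; a lower score means more spare capacity."""
    return (WEIGHTS["cpu"] * node.cpu
            + WEIGHTS["memory"] * node.memory
            + WEIGHTS["network"] * node.network)


def pick_node(nodes: list[NodeStatus]) -> NodeStatus:
    """Return the least-loaded node; ties go to the first one listed."""
    if not nodes:
        raise ValueError("no nodes available")
    return min(nodes, key=load_score)


# Made-up sample data, for illustration only.
nodes = [
    NodeStatus("node-a", cpu=0.90, memory=0.70, network=0.50),
    NodeStatus("node-b", cpu=0.20, memory=0.30, network=0.10),
    NodeStatus("node-c", cpu=0.50, memory=0.50, network=0.50),
]
chosen = pick_node(nodes)
print(chosen.name)
```

In a real deployment the utilization figures would come from each node's own monitoring agent and be refreshed before every scheduling decision; a single static snapshot is used here only to keep the sketch self-contained.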