Computer Technology and Development, 2017, Vol. 27, Issue (9): 191-196, 6. DOI: 10.3969/j.issn.1673-629X.2017.09.042
Research and Realization of Weibo Crawler with Java
Abstract
To obtain more microblog data efficiently, a Java-based data acquisition system for Sina Weibo is designed and developed in view of the Weibo API, traditional crawlers, and the Web version (com version) of the site. A crawler system for the Weibo.cn Web site is built that collects page source code through a combined breadth traversal strategy; the page source obtained this way is more concise and cleaner, which reduces network transmission pressure and HTML source code analysis time. The system mainly implements simulated Weibo login, Weibo page crawling, Weibo page data extraction, and task scheduling control, and analyzes the crawled data. Topic-based Weibo selection is also added to the crawler. To verify its effectiveness and feasibility, the system is analyzed and compared with other traditional methods. The experimental results show that it achieves higher efficiency with simpler code.
Keywords
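Sina Weibo / Web crawler / Java / data mining
Classification
Information Technology and Security Science
Cite this article
Chen Ke, Lan Dingdong, Ke Wende, Li Shujun, Deng Wentian. Research and Realization of Weibo Crawler with Java[J]. Computer Technology and Development, 2017, 27(9): 191-196, 6.
Funding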
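National Undergraduate Innovation and Entrepreneurship Training Program (201411656017, 201611656002, 201611656029, 2016pyA033)
Natural Science Foundation of Guangdong Province (2016A030307049)
Guangdong Province Higher Education Discipline and Specialty Construction Special Fund, Research Project (2013KJCX0132)
Open Fund of the Guangdong Cloud Robot (Petrochemical) Engineering Technology Research Center (650007)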