Design and Implementation of Crowdsourcing-based Social Network Data Collection Model

高梦超, 胡庆宝, 程耀东, 周旭, 李海波, 杜然

Computer Engineering (计算机工程), Issue (4): 36-40, 5. DOI: 10.3969/j.issn.1000-3428.2015.04.007

高梦超 ¹, 胡庆宝 ², 程耀东 ³, 周旭 ³, 李海波 ⁴, 杜然 ³

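Author information

1. College of Computer Science, Sichuan University, Chengdu 610065
2. Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049
3. Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049
4. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190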

Abstract

Social network data is information-rich and highly topical, which gives it significant value for data mining, and it is an important part of Internet big data. However, traditional search engines cannot use keyword retrieval to index the content of social network platforms directly. To address this, the paper designs and implements a data collection model based on the crowdsourcing mode and a client/server (C/S) architecture. The model consists of four modules: a server, a client, a storage subsystem, and a topic Deep Web crawler system. The client nodes run the topic Deep Web crawler system to request new tasks automatically and upload the acquired data, while the system uses the Hadoop Distributed File System (HDFS) to process data rapidly and store the results. The topic Deep Web crawler system is easy to configure, scales flexibly, and collects data directly. The paper also shows that the data collection model completes its tasks with a high success rate and collects data efficiently.
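
The request/upload cycle described in the abstract can be pictured with a minimal in-memory sketch. Every name here (`TaskServer`, `request_task`, `upload`, `crawl`, `run_client`) is hypothetical and not taken from the paper. A plain dictionary stands in for the HDFS-backed storage subsystem, and `crawl` returns placeholder records instead of fetching real pages.

```python
from collections import deque


class TaskServer:
    """Hypothetical server: hands out crawl tasks and keeps uploaded results."""

    def __init__(self, topics):
        self.pending = deque(topics)  # tasks not yet assigned to any client
        self.storage = {}             # in-memory stand-in for HDFS-backed storage

    def request_task(self):
        # Return the next pending task, or None when the queue is empty.
        return self.pending.popleft() if self.pending else None

    def upload(self, task, records):
        # Store the records a client collected for this task.
        self.storage[task] = records


def crawl(task):
    """Placeholder for the topic Deep Web crawler; returns fake records."""
    return [f"{task}-post-{i}" for i in range(2)]


def run_client(server):
    """Client node loop: request a task, crawl it, upload the data, repeat."""
    completed = 0
    while (task := server.request_task()) is not None:
        server.upload(task, crawl(task))
        completed += 1
    return completed


server = TaskServer(["topic-a", "topic-b", "topic-c"])
done = run_client(server)
```

In a real deployment the client and server would talk over the network and many volunteer nodes would pull from the same queue. The sketch only illustrates the pull-based division of work, where each node asks for its next task rather than being assigned one.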

Key words

social network / crowdsourcing mode / distributed computing / information collection / Web crawler / Hadoop Distributed File System (HDFS)

Classification

Information Technology and Security Science

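Cite this article

高梦超, 胡庆宝, 程耀东, 周旭, 李海波, 杜然. Design and Implementation of Crowdsourcing-based Social Network Data Collection Model [J]. Computer Engineering (计算机工程), 2015, (4): 36-40, 5.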

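Funding

Supported by the National High Technology Research and Development Program of China (863 Program), project "Public Information Consumption Service Platform Based on Media Big Data and Its Application Demonstration" (SS2014AA012305).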

Computer Engineering (计算机工程)

OA | PKU Core (北大核心) | CSCD | CSTPCD

ISSN: 1000-3428
