Computer Engineering (计算机工程), 2014, Issue (5): 12-16, 20, 6. DOI: 10.3969/j.issn.1000-3428.2014.05.003

Universal Crawling Algorithm for Microblogging Data

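Lu Tiguang (卢体广)¹, Liu Xin (刘新)¹, Liu Renren (刘任任)¹

Author information

1. Key Laboratory of Intelligent Computing and Information Processing of the Ministry of Education, College of Information Engineering, Xiangtan University, Xiangtan, Hunan 411105, China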

Abstract

Web crawlers and the microblog APIs currently used to collect microblog data struggle to meet the data demands of public opinion systems. To address this problem, this paper presents a feasible solution that logs in to the microblog service the way a browser does and captures data from the resulting Web pages, which makes it easy to obtain all the data of any microblog user. On this basis, it constructs a microblogging network from the interconnections among users and discovers new users through that network. To obtain high-quality data, it builds a mathematical model that calculates each user's influence index from the number of posts, the posting frequency, the number of fans, the number of forwards, and the number of comments. It then builds a priority queue from the calculated influence index, so that users with a larger influence index are crawled more frequently. Finally, it calculates a time interval between crawls to balance the lower frequency of inactive microblog users. Experimental results show that the method is simple to run and fast, obtains high-quality information, and is highly versatile.
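The scheduling idea in the abstract can be illustrated with a minimal Python sketch. The abstract names the inputs to the influence index but not the formula, so everything here is an assumption made for illustration: the `User` fields, the log-scaled weighted sum in `influence_index`, the weights themselves, the `crawl_interval` rule, and the use of a next-due-time heap as the priority queue. It is not the authors' model.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class User:
    name: str
    posts: int            # total number of posts
    posts_per_day: float  # posting frequency
    fans: int
    forwards: int
    comments: int


def influence_index(u: User) -> float:
    # Hypothetical formula: a weighted sum of log-scaled counts.
    # The weights are illustrative, not taken from the paper.
    return (0.1 * math.log1p(u.posts)
            + 0.3 * math.log1p(u.posts_per_day)
            + 0.2 * math.log1p(u.fans)
            + 0.2 * math.log1p(u.forwards)
            + 0.2 * math.log1p(u.comments))


def crawl_interval(u: User, base_seconds: float = 3600.0) -> float:
    # Hypothetical rule: a higher influence index gives a shorter interval.
    return base_seconds / (1.0 + influence_index(u))


def schedule(users, rounds):
    """Return the names of the users crawled in each of `rounds` crawls."""
    by_name = {u.name: u for u in users}
    # Min-heap keyed on the time at which each user is next due.
    heap = [(crawl_interval(u), u.name) for u in users]
    heapq.heapify(heap)
    order = []
    for _ in range(rounds):
        due, name = heapq.heappop(heap)
        order.append(name)
        # Re-queue the user one interval after the crawl that just ran.
        heapq.heappush(heap, (due + crawl_interval(by_name[name]), name))
    return order


active = User("active", 1000, 10.0, 50000, 2000, 3000)
quiet = User("quiet", 10, 0.1, 20, 0, 1)
order = schedule([active, quiet], rounds=5)
```

In this sketch the high-influence user is crawled more often than the inactive one, yet the inactive user still gets a turn, because every user is re-queued with a finite interval.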

Key words

microblogging data / simulated login / user network / user influence / Internet public opinion / priority queue

Classification

Information Technology and Security Science

Cite this article

Lu Tiguang, Liu Xin, Liu Renren. Universal Crawling Algorithm for Microblogging Data [J]. Computer Engineering, 2014, (5): 12-16, 20, 6.

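Funding

Natural Science Foundation of Hunan Province (12JJ3066); Hunan Provincial Fund for Cultivating the Industrialization of University Scientific and Technological Achievements (11CY018); Hunan Provincial Key Discipline Fund.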

Computer Engineering (计算机工程)

Open access (OA) / Peking University Core Journal (北大核心) / CSCD / CSTPCD

ISSN: 1000-3428
