Detection of hype groups based on mining maximum frequent itemsets in Microblogs

刘琰 张进 陈静 尹美娟 张伟丽

计算机工程与应用 (Computer Engineering and Applications), 2017, Vol. 53, Issue 4: 90-97 (8 pages). DOI: 10.3778/j.issn.1002-8331.1507-0176

Detection of hype groups based on mining maximum frequent itemsets in Microblogs

刘琰 [1], 张进 [1], 陈静 [1], 尹美娟 [1], 张伟丽 [1]

Author information

  • 1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002

Abstract

In recent years, hype accounts have emerged as a new force in microblogs, using illegal means to carry out online public relations campaigns and seriously disturbing the normal order of the Internet. Traditional detection of hype accounts relies mainly on feature analysis, which ignores the fact that hype accounts are highly organized and act according to plan, and therefore has difficulty finding well-concealed accounts. To address this problem, the paper exploits the group behavior of hype accounts, which often participate in the same hype microblogs together, and transforms hype group detection into the problem of mining maximum frequent itemsets. The proposed method finds groups of accounts that have repeatedly participated in hype microblogs together. Given the research background and the characteristics of the transaction database, a new algorithm based on iterative intersection (IIA) is proposed to improve the efficiency of mining maximum frequent itemsets: it uses a selection strategy based on binary search to reduce the transaction database, and applies several techniques to reduce the number of intersections between transactions. Finally, the performance of IIA is evaluated experimentally, and experiments on a real dataset from Sina Weibo show that the method can find highly concealed hype accounts that traditional feature-analysis methods cannot identify, with an accuracy of up to 90%.
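The mapping described in the abstract can be illustrated with a small sketch: each hype microblog is treated as a transaction, each participating account as an item, and a hype group as an itemset that is frequent and has no frequent superset. The code below is not the paper's IIA; the abstract does not specify the binary-search selection strategy or the intersection-pruning techniques, so this is a naive intersection-closure approach under stated assumptions. The function name `maximal_frequent_itemsets`, the parameters `min_support` and `min_size`, and the account labels are all hypothetical.

```python
def maximal_frequent_itemsets(transactions, min_support, min_size=2):
    """Naive intersection-based mining of maximal frequent itemsets.

    transactions: iterable of sets of account IDs, one per hype microblog.
    min_support:  minimum number of microblogs a group must share.
    min_size:     minimum number of accounts in a reported group.
    Illustrative only: worst-case cost is exponential, and this is not
    the IIA algorithm from the paper.
    """
    txs = [frozenset(t) for t in transactions]

    # Close the transaction set under intersection. Every maximal frequent
    # itemset equals the intersection of the transactions that contain it,
    # so it is guaranteed to appear among these candidates.
    candidates = set(txs)
    frontier = set(txs)
    while frontier:
        new = set()
        for a in frontier:
            for t in txs:
                c = a & t
                # Intersections only shrink, so small sets can be dropped early.
                if len(c) >= min_size and c not in candidates:
                    new.add(c)
        candidates |= new
        frontier = new

    # Keep candidates that are large enough and contained in enough transactions.
    frequent = [
        c for c in candidates
        if len(c) >= min_size and sum(1 for t in txs if c <= t) >= min_support
    ]

    # A frequent candidate is maximal if no other frequent candidate strictly contains it.
    maximal = [c for c in frequent if not any(c < d for d in frequent)]
    return sorted(maximal, key=lambda s: (-len(s), sorted(s)))


# Hypothetical participation data: accounts seen on six hype microblogs.
microblogs = [
    {"u1", "u2", "u3", "u7"},
    {"u1", "u2", "u3", "u8"},
    {"u1", "u2", "u3"},
    {"u4", "u5", "u7"},
    {"u4", "u5", "u8"},
    {"u1", "u9"},
]
groups = maximal_frequent_itemsets(microblogs, min_support=2)
for g in groups:
    print(sorted(g))
```

With `min_support=2`, this toy input yields two candidate groups, `u1, u2, u3` and `u4, u5`; accounts such as `u7` and `u8` each appear on two microblogs but never together with a stable set of partners, so they are not reported.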

Key words

data mining/microblog/hype groups/maximum frequent itemsets

Classification

Information Technology and Security Science

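Cite this article

刘琰, 张进, 陈静, 尹美娟, 张伟丽. Detection of hype groups based on mining maximum frequent itemsets in Microblogs [J]. 计算机工程与应用 (Computer Engineering and Applications), 2017, 53(4): 90-97.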

Funding

National Natural Science Foundation of China (No. 61309007)

National High Technology Research and Development Program of China (863 Program) (No. 2012AA012902)

计算机工程与应用 (Computer Engineering and Applications)

OA; 北大核心 (Peking University Core Journals); CSCD; CSTPCD

ISSN 1002-8331