Acta Automatica Sinica, 2017, Vol. 43, Issue (3): 448-461, 14. DOI: 10.16383/j.aas.2017.c160308
Two-layer Sampling Active Learning Algorithm for Social Spammer Detection
Abstract
With the rapid development of social networks, more and more people join them to make friends and share their views. However, because of their openness, social networks suffer from fake accounts. Fake accounts, also called spammers, spread spam to achieve their own purposes, which undermines the security and reliability of social networks. Existing detection methods extract behavioural, textual, and relationship features of users and then use machine learning algorithms to identify social spammers. These algorithms, however, often suffer from a shortage of labeled training data. To address this problem, we propose an efficient algorithm, called two-layer sampling active learning, that constructs an accurate classifier with a minimum of labeled samples. We present three criteria (uncertainty, representativeness, and diversity) to quantify the value of unlabeled samples, and combine sorting and clustering to actively select the samples with maximum uncertainty, representativeness, and diversity. Experimental results on Twitter, Apontador, and YouTube datasets demonstrate the efficiency of our approach and show that it achieves better precision and recall than other active learning methods.
Key words
Social network / spammer / active learning / diversity of samples
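The abstract describes the selection strategy only in outline: sorting and clustering are combined to pick unlabeled samples that are uncertain, representative, and diverse. The following is a minimal sketch of one way such a two-layer selection could look, not the authors' implementation. It assumes that the first layer sorts by the binary entropy of a predicted spam probability, that the second layer runs a small k-means over the most uncertain candidates, and that the sample nearest each centroid is queried. All names (`uncertainty`, `kmeans`, `select_batch`, `pool_size`, `batch_size`) are hypothetical.

```python
import math


def uncertainty(p_spam):
    """Binary entropy of a predicted spam probability (assumed uncertainty measure)."""
    if p_spam <= 0.0 or p_spam >= 1.0:
        return 0.0
    return -(p_spam * math.log2(p_spam) + (1 - p_spam) * math.log2(1 - p_spam))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Tiny deterministic k-means; returns (cluster index per point, centers)."""
    # Farthest-first seeding avoids any random state.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments are now consistent with the centers
        centers = new_centers
    return assign, centers


def select_batch(features, p_spam, pool_size, batch_size):
    """Return sorted indices of the unlabeled samples to send for labeling."""
    # Layer 1 (sorting): keep the pool_size most uncertain samples.
    ranked = sorted(range(len(features)),
                    key=lambda i: uncertainty(p_spam[i]), reverse=True)
    pool = ranked[:pool_size]
    if not pool or batch_size <= 0:
        return []
    # Layer 2 (clustering): one cluster per query slot gives diversity;
    # the member nearest each centroid stands in as the representative.
    k = min(batch_size, len(pool))
    assign, centers = kmeans([features[i] for i in pool], k)
    chosen = []
    for j in range(k):
        members = [i for i, a in zip(pool, assign) if a == j]
        if members:
            chosen.append(min(members, key=lambda i: sq_dist(features[i], centers[j])))
    return sorted(chosen)
```

In an active learning loop, the returned indices would be labeled by an annotator, added to the training set, and the classifier retrained before the next round. The choice of entropy and k-means here is an illustrative assumption; the paper's actual criteria may be defined differently.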
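Tan Kan, Gao Min, Li Wentao, Tian Renli, Wen Junhao, Xiong Qingyu. Two-layer Sampling Active Learning Algorithm for Social Spammer Detection [J]. Acta Automatica Sinica, 2017, 43(3): 448-461, 14.
Funding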
Supported by National Key Basic Research Program of China (973 Program) (2013CB328903), Basic and Advanced Research Projects in Chongqing (cstc2015jcyjA40049), National Natural Science Foundation of China (71102065), National Science and Technology Ministry (2015BAF05B03), and Fundamental Research Funds for the Central Universities (106112014CDJZR095502)