Computer Engineering & Science, 2018, Vol. 40, Issue 2: 313-319, 7. DOI: 10.3969/j.issn.1007-130X.2018.02.017
A Twitter hotspot mining method based on semantic clustering of word vectors
Abstract
With the rapid development of social media, information overload has become a challenge, and automatically mining hotspots from large volumes of short, noisy messages is an important problem. Social data are real-time, carry geographic information, and usually contain plenty of meta-information. Exploiting these characteristics, this paper proposes a hotspot mining method that combines users' behavior patterns with text content analysis. In the content analysis stage, text is clustered at the word level rather than the message level. In addition, semantic clustering of word vectors is used to improve keyword extraction. Experimental results on real datasets show that the method outperforms traditional hotspot mining methods on the main metrics: the extracted keywords have strong semantic relevance and separate topics well.
Keywords
hotspot mining/social media/word vectors/semantic clustering
Key words
hotspot mining/Twitter/word embedding/semantic clustering
Classification
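Information technology and security science
Cite this article
Liu Peilei (刘培磊), Tang Jintao (唐晋韬), Wang Ting (王挺), Xie Songxian (谢松县), Yue Dapeng (岳大鹏), Liu Haichi (刘海池). A Twitter hotspot mining method based on semantic clustering of word vectors [J]. Computer Engineering & Science, 2018, 40(2): 313-319, 7.
Funding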
National Natural Science Foundation of China (61532001, 61472436)