Application Research of Computers, 2018, Vol. 35, Issue 3: 671-674, 679, 5. DOI: 10.3969/j.issn.1001-3695.2018.03.007
Microblog topic detection based on SOM clustering
Abstract
As the number of microblog users grows, information on microblog platforms is updated ever more frequently. To address the data sparseness, new words, and non-standard words characteristic of microblog text, this paper proposes a microblog topic detection method based on SOM clustering. First, the short texts in the raw corpus are preprocessed and their features are extracted with a word vector model, which reduces the heavy computation caused by high-dimensional vectors. Then, topics are clustered with an improved SOM clustering algorithm, which addresses shortcomings of traditional text clustering and detects topics effectively. Experimental results show that the algorithm's comprehensive index, the F value, is clearly higher than that of traditional methods.
Keywords
topic detection/word vector model/text similarity/short text/SOM clustering
Classification
Information technology and security science
Cite this article
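Song Lina, Feng Xupeng, Liu Lijun, Huang Qingsong. Microblog topic detection based on SOM clustering [J]. Application Research of Computers, 2018, 35(3): 671-674, 679, 5.
Funding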
National Natural Science Foundation of China (81360230, 81560296)
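
The pipeline outlined in the abstract (represent each short text with word vectors, then group the texts with a self-organizing map) can be illustrated with a minimal sketch. This is a plain textbook SOM, not the improved variant the paper proposes, whose details are not given in this record. The toy word vectors, the averaging of word vectors into a text vector, the evenly spaced initialisation, and the names `text_vector`, `train_som`, and `assign` are all assumptions made for illustration.

```python
import numpy as np

def text_vector(tokens, word_vectors):
    """Average the vectors of the known tokens (zero vector if none are known)."""
    dim = len(next(iter(word_vectors.values())))
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)

def train_som(data, grid=(1, 2), epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a standard SOM; returns one weight vector per map unit."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Position of every unit on the 2-D map, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Assumption: initialise units from evenly spaced samples for reproducibility.
    idx = np.linspace(0, len(data) - 1, rows * cols).round().astype(int)
    weights = data[idx].astype(float)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
            x = data[i]
            # Best-matching unit: the unit whose weights are closest to the sample.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighbourhood around the BMU, measured on the map grid.
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign(data, weights):
    """Label each sample with the index of its best-matching unit."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Hypothetical 3-D word vectors for two toy topics (sports and technology).
word_vectors = {
    "match": [1.0, 0.1, 0.0], "goal": [0.9, 0.0, 0.1], "team": [1.0, 0.0, 0.0],
    "phone": [0.0, 1.0, 0.1], "chip": [0.1, 0.9, 0.0], "app": [0.0, 1.0, 0.0],
}
texts = [["match", "goal"], ["team", "goal"], ["team", "match"],
         ["phone", "chip"], ["app", "chip"], ["app", "phone"]]

data = np.array([text_vector(t, word_vectors) for t in texts])
weights = train_som(data)
labels = assign(data, weights)
print(labels)  # texts sharing a label are treated as one candidate topic
```

In this sketch each map unit stands for one candidate topic. A real corpus would need trained word embeddings, a larger map, and some way to merge or discard sparsely populated units.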