Journal of Nanjing Normal University (Natural Science Edition), Issue (1): 57-65, 9.
Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method
Abstract
Micro-blogs are a recently emerged internet platform for information exchange. Micro-blog posts are short, scattered across topics, and free in style, yet they can have a large impact on society. Information supervision departments and commercial enterprises therefore urgently need public opinion analysis based on micro-blog content. This paper presents a novel collocation-based method for text clustering. The method first preprocesses the micro-blog texts, then uses a word sense clustering model to extract effective collocations automatically, and finally clusters the texts based on those collocations. Experiments show that text clustering using word sense clusters outperforms traditional text clustering by 6.3%, and that the proposed method outperforms text clustering using word sense clusters alone by 16.8%. These results demonstrate the validity of the method.
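The abstract names three stages (preprocessing, collocation extraction with a word sense clustering model, and collocation-based clustering) but does not give their details. The Python sketch below illustrates the general idea under explicit assumptions and is not the authors' model. Assumed: texts are already segmented into tokens; a hypothetical hand-written dictionary `SENSE_CLUSTER` stands in for the word sense clustering model; a "collocation" is any pair of distinct sense clusters co-occurring within a small token window; similarity is the Jaccard overlap of collocation sets; and texts are grouped by single-pass threshold clustering. The names `collocations`, `jaccard`, and `single_pass_cluster`, the window size, and the threshold are all illustrative.

```python
# Hypothetical word -> sense-cluster map; a stand-in for the paper's
# word sense clustering model, which is not reproduced here.
SENSE_CLUSTER = {
    "price": "COST", "cost": "COST",
    "rise": "UP", "increase": "UP",
    "oil": "FUEL", "petrol": "FUEL",
    "match": "GAME", "game": "GAME",
    "win": "VICTORY", "victory": "VICTORY",
    "team": "TEAM",
}


def collocations(tokens, window=3):
    """Return sorted (cluster, cluster) pairs for tokens at most window-1 positions apart."""
    classes = [SENSE_CLUSTER.get(t, t) for t in tokens]  # unknown words keep their surface form
    pairs = set()
    for i, a in enumerate(classes):
        for b in classes[i + 1:i + window]:
            if a != b:
                pairs.add(tuple(sorted((a, b))))
    return pairs


def jaccard(x, y):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def single_pass_cluster(docs, threshold=0.2):
    """Assign each tokenized text to the most similar existing cluster, or start a new one."""
    clusters = []  # each cluster: {"features": union of collocations, "members": doc indices}
    for idx, tokens in enumerate(docs):
        feats = collocations(tokens)
        best, best_sim = None, threshold
        for c in clusters:
            sim = jaccard(feats, c["features"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"features": set(feats), "members": [idx]})
        else:
            best["features"] |= feats
            best["members"].append(idx)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    docs = [
        ["oil", "price", "rise"],
        ["petrol", "cost", "increase"],
        ["team", "win", "match"],
        ["team", "victory", "game"],
    ]
    print(single_pass_cluster(docs))  # [[0, 1], [2, 3]]
```

Mapping words to sense clusters first is what lets "oil price rise" and "petrol cost increase" share collocations in this toy example; with raw words, their collocation sets would not overlap at all.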
Keywords
micro-blog public opinion analysis / word sense cluster / collocation / similarity / text clustering
Classification
Information Technology and Security Science
Cite this article
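WANG Hengjing, CAO Cungen, GAO Shang. Research on text clustering of micro-blog public opinion: word sense cluster and collocation-based method[J]. Journal of Nanjing Normal University (Natural Science Edition), 2015, (1): 57-65, 9.
Funding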
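Open Fund of the Artificial Intelligence Key Laboratory of Sichuan Province (2012RYJ04); Open Project of the Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences (IIP2013-1)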