
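Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method (基于词类和搭配的微博舆情文本聚类方法研究)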

Wang Hengjing (王恒静), Cao Cungen (曹存根), Gao Shang (高尚)

Journal of Nanjing Normal University (Natural Science Edition) (南京师大学报(自然科学版)), Issue (1): 57-65, 9.

Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method

Wang Hengjing¹, Cao Cungen², Gao Shang¹

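Author information

  • 1. School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212003
  • 2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190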

Abstract

Micro-blogs are a recently emerged internet platform for information exchange. Their content is thematically dispersed, short, and stylistically free, and it can have a huge impact on society. Information supervision departments and commercial enterprises therefore have an urgent demand for public opinion analysis based on micro-blog information. This paper presents a novel collocation-based method for text clustering. The method first preprocesses the micro-blog text, then uses a word sense clustering model to extract effective collocations automatically, and finally clusters the texts based on those effective collocations. Experiments show that the efficiency of text clustering using word sense clusters is 6.3% higher than that of the traditional text clustering method, and that the method proposed in this paper is a further 16.8% higher than text clustering using word sense clusters. The results demonstrate the validity of the method.
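The three stages named in the abstract (preprocessing, collocation extraction over word sense clusters, and collocation-based clustering) can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written `SENSE_CLUSTER` map stands in for the learned word sense clustering model, the "appears in at least `min_posts` posts" rule is an assumed definition of an effective collocation, and Jaccard similarity with greedy single-pass clustering is an assumed choice of similarity and clustering step. All function names are hypothetical, and whitespace tokenization of English posts replaces the word segmentation that Chinese micro-blog text would need.

```python
from collections import Counter
from itertools import combinations

# Hypothetical word -> sense-cluster map (assumption); the paper learns
# such clusters with a model rather than listing them by hand.
SENSE_CLUSTER = {
    "price": "COST", "cost": "COST", "fee": "COST",
    "rise": "UP", "increase": "UP", "soar": "UP",
    "train": "RAIL", "railway": "RAIL",
    "delay": "LATE", "late": "LATE",
}

def preprocess(post):
    """Lowercase, split on whitespace, and map known words to sense clusters."""
    return [SENSE_CLUSTER[w] for w in post.lower().split() if w in SENSE_CLUSTER]

def collocations(post):
    """Unordered pairs of distinct sense clusters that co-occur in one post."""
    return set(combinations(sorted(set(preprocess(post))), 2))

def effective_collocations(posts, min_posts=2):
    """Keep pairs seen in at least min_posts posts (assumed notion of 'effective')."""
    counts = Counter(pair for post in posts for pair in collocations(post))
    return {pair for pair, n in counts.items() if n >= min_posts}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_posts(posts, threshold=0.5, min_posts=2):
    """Greedy single-pass clustering on the effective collocations of each post."""
    keep = effective_collocations(posts, min_posts)
    features = [collocations(p) & keep for p in posts]
    clusters = []  # each cluster is a list of post indices
    for i, feats in enumerate(features):
        for members in clusters:
            # Compare against the first post of the cluster as its representative.
            if jaccard(feats, features[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

posts = [
    "train delay again",
    "railway late today",
    "price rise hurts",
    "cost increase everywhere",
]
print(cluster_posts(posts))  # [[0, 1], [2, 3]]
```

Posts 0 and 1 share no surface words, yet they fall into the same cluster because their words map to the same pair of sense clusters. That is the intuition behind clustering on sense-level collocations rather than on raw terms.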

Key words

micro-blog public opinion analysis/word sense cluster/collocation/similarity/text clustering

Classification

Information Technology and Security Science

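Cite this article

Wang Hengjing, Cao Cungen, Gao Shang. Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method [J]. Journal of Nanjing Normal University (Natural Science Edition), 2015, (1): 57-65, 9.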

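Funding

Open Fund of the Artificial Intelligence Key Laboratory of Sichuan Province (2012RYJ04); Open Project of the Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences (IIP2013-1)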

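Journal of Nanjing Normal University (Natural Science Edition)

Indexing: OA, Peking University Core Journals (北大核心), CSCD, CSTPCD

ISSN: 1001-4616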
