Computer Technology and Development, 2017, Vol. 27, Issue 11: 83-87, 5. DOI: 10.3969/j.issn.1673-629X.2017.11.018
Research on Scalable Clustering of User-oriented Short Text
Abstract
With the advent of the big data era, the volume of user short text has grown explosively, and extracting useful information from short text through clustering analysis is becoming increasingly important. Clustering analysis, a crucial means of knowledge discovery, is the process of grouping objects according to the similarity of their characteristics. A scalable clustering method for user-oriented short text is therefore proposed, consisting of two phases: offline clustering and online clustering. In offline clustering, the short text is pre-processed by recognizing and removing irrelevant words with an irrelevant-words dictionary and by normalizing semantics with a part-of-speech dictionary. A similarity calculation method based on the fusion of multiple features is then proposed to perform correlation clustering on the text. In online clustering, incoming texts are clustered using the results of offline clustering as features. The final clustering results are produced by integrating the results of offline clustering with those of online clustering. To verify the effectiveness and feasibility of the method, comparative experiments are conducted. Experimental results show that it achieves a clustering recall of 73%, a clustering accuracy of 87.7%, and an F-measure of 79.6%, outperforming the feature vector method.
Key words
short text/semantic normalization/offline clustering/online clustering
Classification
Information Technology and Security Science
Cite this article
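Zhang Yi, Chen Guo, Zhang Zaiyue. Research on Scalable Clustering of User-oriented Short Text [J]. Computer Technology and Development, 2017, 27(11): 83-87, 5.
Funding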
National Natural Science Foundation of China (61371114, 61170156)
Self-Cultivation Project of the Marine Equipment Research Institute, Jiangsu University of Science and Technology (HZ2016004)