Computer Engineering and Applications, 2016, Vol. 52, Issue (19): 78-83, 6. DOI: 10.3778/j.issn.1002-8331.1412-0418
Improved K-means clustering algorithm combined semantic similarity of short text
Abstract
Short text clustering currently faces three major challenges: the sparsity of feature keywords, the complexity of processing in high-dimensional space, and the comprehensibility of clusters. To address these challenges, an improved K-means clustering algorithm that incorporates semantic information is proposed. The algorithm represents each short text as a collection of words, which alleviates the sparsity of short-text feature keywords. The cluster centers are obtained by mining the maximum frequent word sets of the short text collection; this overcomes the sensitivity of K-means to the initial cluster centers, makes the resulting clusters easier to interpret, and avoids operations in high-dimensional space. The experimental results show that the short text clustering algorithm combined with semantics outperforms traditional algorithms.
Keywords
text mining/clustering of short text/K-means algorithm/maximum frequent word set/HowNet/semantic similarity
Classification
Information Technology and Security Science
Cite this article
Qiu Yunfei, Zhao Bin, Lin Mingming, Wang Wei. Improved K-means clustering algorithm combined semantic similarity of short text [J]. Computer Engineering and Applications, 2016, 52(19): 78-83, 6.
Funding
National Natural Science Foundation of China (No. 71371091); Liaoning Province Outstanding Young Scholars Growth Program for Higher Education Institutions (No. LJQ2012027); General Project of the Education Department of Liaoning Province (No. L2013131).
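The center-selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`frequent_word_sets`, `maximal_sets`, `assign`), the brute-force mining with a `max_size` cap, the `min_support` threshold, and the use of Jaccard overlap in place of the HowNet-based semantic similarity are all assumptions made for illustration. Only the selection of initial centers and a single assignment step are shown.

```python
from itertools import combinations


def frequent_word_sets(docs, min_support, max_size=3):
    """Map each word set contained in at least min_support docs to its support.

    Brute-force enumeration, suitable only for toy inputs (assumption:
    the paper's actual mining procedure is not reproduced here).
    """
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    frequent = {}
    for size in range(1, max_size + 1):
        found = False
        for combo in combinations(vocab, size):
            cand = frozenset(combo)
            support = sum(1 for d in doc_sets if cand <= d)
            if support >= min_support:
                frequent[cand] = support
                found = True
        if not found:
            # No frequent set of this size, so no larger one can exist.
            break
    return frequent


def maximal_sets(frequent):
    """Keep frequent sets that are not a proper subset of another frequent set."""
    return [s for s in frequent if not any(s < t for t in frequent)]


def jaccard(a, b):
    """Word-overlap similarity; a stand-in for a semantic similarity measure."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(docs, centers, sim=jaccard):
    """Assign each short text to the index of its most similar center."""
    return [
        max(range(len(centers)), key=lambda i: sim(doc, centers[i]))
        for doc in docs
    ]


# Toy corpus: each short text is a collection of words.
docs = [
    ["apple", "phone", "price"],
    ["apple", "phone", "screen"],
    ["football", "match", "goal"],
    ["football", "match", "score"],
]
centers = maximal_sets(frequent_word_sets(docs, min_support=2))
labels = assign(docs, centers)
print(centers)
print(labels)
```

Because each center is itself a small word set, it doubles as a readable label for its cluster, and no high-dimensional vector arithmetic is needed. The `sim` parameter is left pluggable so that a semantic similarity function could replace the Jaccard stand-in.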