Improved K-means Short Text Clustering Algorithm Combined with Semantics

邱云飞 赵彬 林明明 王伟

Computer Engineering and Applications, 2016, Vol. 52, Issue 19: 78-83, 6. DOI: 10.3778/j.issn.1002-8331.1412-0418

Improved K-means clustering algorithm combined semantic similarity of short text

Author information

  • 1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China

Abstract

Short text clustering currently faces three major challenges: the sparsity of feature keywords, the complexity of processing in high-dimensional space, and the comprehensibility of clusters. To address these challenges, a K-means clustering algorithm improved by incorporating semantics is proposed. In this algorithm, each short text is represented by a collection of words, which alleviates the sparsity of short-text keyword features. The cluster centers are obtained by mining the maximum frequent word sets of the short text collection. This overcomes the sensitivity of the K-means algorithm to the initial cluster centers, addresses the comprehensibility of clusters, and avoids operations in high-dimensional space. The experimental results show that the short text clustering algorithm combined with semantics outperforms traditional algorithms.
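
The initialization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the brute-force miner, the function names (`maximal_frequent_word_sets`, `assign`), the toy documents, and the support threshold are all hypothetical, and Jaccard overlap is used only as a stand-in for the HowNet-based semantic similarity, which is not reproduced here. The sketch covers center initialization and a single assignment step, not the full iterative algorithm.

```python
from itertools import combinations


def maximal_frequent_word_sets(docs, min_support):
    """Brute-force mining of maximal frequent word sets (illustrative only).

    docs is a list of sets of words; a word set is frequent when at least
    min_support documents contain all of its words.
    """
    vocab = sorted(set().union(*docs))
    frequent = []
    for size in range(1, len(vocab) + 1):
        level = [
            frozenset(c)
            for c in combinations(vocab, size)
            if sum(1 for d in docs if set(c) <= d) >= min_support
        ]
        if not level:
            # No frequent set of this size means none of any larger size.
            break
        frequent.extend(level)
    # Keep only sets that are not a proper subset of another frequent set.
    return [s for s in frequent if not any(s < t for t in frequent)]


def jaccard(a, b):
    """Set overlap, used here as a placeholder for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(docs, centers, sim=jaccard):
    """Assign each document to the index of its most similar center."""
    return [max(range(len(centers)), key=lambda k: sim(d, centers[k])) for d in docs]


# Toy short texts, already segmented into word sets (assumed data).
docs = [
    {"stock", "market", "price"},
    {"stock", "market", "fall"},
    {"football", "match", "goal"},
    {"football", "match", "win"},
]
centers = maximal_frequent_word_sets(docs, min_support=2)
labels = assign(docs, centers)
print(centers)
print(labels)
```

Because each center is itself a small word set, the resulting clusters can be read directly from their centers, and no high-dimensional vector arithmetic is involved in this step.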

Key words

text mining / short text clustering / K-means algorithm / maximum frequent word set / HowNet / semantic similarity

Classification

Information Technology and Security Science

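Cite this article

邱云飞, 赵彬, 林明明, 王伟. Improved K-means clustering algorithm combined semantic similarity of short text [J]. Computer Engineering and Applications, 2016, 52(19): 78-83, 6.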

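Funding

National Natural Science Foundation of China (No. 71371091); Liaoning Province Growth Program for Outstanding Young Scholars in Higher Education Institutions (No. LJQ2012027); General Project of the Education Department of Liaoning Province (No. L2013131).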

Computer Engineering and Applications

OA / PKU Core (北大核心) / CSCD / CSTPCD

ISSN 1002-8331
