结合语义与统计的特征降维短文本聚类

杨婉霞 孙理和 黄永峰

计算机工程 (Computer Engineering), 2012, Vol. 38, Issue 22: 171-175, 5.

Feature Dimension Reduction Short Text Clustering Combined with Semantic and Statistics

杨婉霞 1, 孙理和 2, 黄永峰 3

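Author information

  • 1. College of Engineering, Gansu Agricultural University, Lanzhou 730070
  • 2. Department of Electronic Engineering, Tsinghua University, Beijing 100084
  • 3. College of Foreign Languages, Northwest Normal University, Lanzhou 730070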

Abstract

The primary difficulty of text clustering lies in the high-dimensional sparsity of text representations. This paper proposes a short text clustering algorithm that takes both semantic and statistical features into account. A first dimension reduction is achieved by analyzing the semantic relatedness of words with a semantic dictionary. A second dimension reduction is then performed through feature selection with statistical methods. Short texts are clustered on the features that remain after the two reductions. Experimental results show that the algorithm achieves better clustering quality and efficiency on short texts.
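
The two-stage reduction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the semantic dictionary, the statistical selection criterion, nor the clustering method, so the toy `SEMANTIC_DICT`, the document-frequency cutoff `min_df`, the single-pass cosine clustering, and the `threshold` value are all assumptions chosen only to make the pipeline concrete.

```python
from collections import Counter
import math

# Hypothetical toy semantic dictionary: surface word -> shared concept label.
# The paper's actual dictionary is not named in the abstract.
SEMANTIC_DICT = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "physician": "doctor", "doctor": "doctor", "surgeon": "doctor",
}

def semantic_reduce(tokens):
    """First reduction: collapse semantically related words onto one concept."""
    return [SEMANTIC_DICT.get(t, t) for t in tokens]

def select_features(docs, min_df=2):
    """Second reduction (assumed criterion): keep features whose
    document frequency is at least min_df."""
    df = Counter(f for doc in docs for f in set(doc))
    return sorted(f for f, n in df.items() if n >= min_df)

def vectorize(doc, features):
    """Term-frequency vector over the selected features (a plain VSM)."""
    counts = Counter(doc)
    return [counts[f] for f in features]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold=0.5):
    """Single-pass clustering (assumed method): a vector joins the first
    cluster whose seed is similar enough, otherwise it seeds a new cluster."""
    seeds, labels = [], []
    for vec in vectors:
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels

texts = [
    "car repair shop",
    "automobile repair service",
    "physician visit today",
    "doctor visit tomorrow",
]
docs = [semantic_reduce(t.split()) for t in texts]   # stage 1: semantic
features = select_features(docs)                     # stage 2: statistical
vectors = [vectorize(d, features) for d in docs]
labels = cluster(vectors)
print(features)
print(labels)
```

In this toy corpus the vocabulary shrinks from 10 distinct words to 8 concepts after the semantic step and to 4 features after the statistical step, and the two vehicle texts and the two doctor texts fall into separate clusters even though they share few surface words.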

Key words

feature selection / clustering / short text / Vector Space Model (VSM) / semantic / dimension reduction

Classification

Information Technology and Security Science

Cite this article

杨婉霞, 孙理和, 黄永峰. 结合语义与统计的特征降维短文本聚类[J]. 计算机工程, 2012, 38(22): 171-175, 5.

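Funding

National High Technology Research and Development Program of China ("863" Program) (2011AA010704, 2012AA011004)

Tsinghua University Initiative Scientific Research Program, project "Key Technologies for Cross-Media Distributed Vertical Search and Public Opinion Analysis" (20111081023)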

计算机工程 (Computer Engineering)

OA / CSCD / CSTPCD

ISSN 1000-3428
