
Algorithm of text feature selection based on vocabulary attribute clustering

Zhang Qun, Wang Hongjun, Wang Lunwen

Application Research of Computers (计算机应用研究), 2017, Vol. 34, Issue 2: 369-372, 377, 5. DOI: 10.3969/j.issn.1001-3695.2017.02.011

Author information

  • 1. Electronic Engineering Institute, Hefei 230037

Abstract

Effective text feature selection is a precondition for text mining. Conventional text feature selection methods have limited effect on reducing the dimension of the feature vector and on improving text representation. Moreover, they are not suitable for unsupervised text clustering. In view of this, the paper proposes a novel text feature selection algorithm, based on the concept of vocabulary attributes, that is suitable for text clustering. First, the algorithm constructs a model based on vocabulary attributes, including term frequency, document frequency, term position, and term correlation. It then analyzes in detail how each attribute value is calculated and improves the Apriori algorithm to calculate the term correlation attribute. Finally, it clusters on the vocabulary attribute model using an improved K-means clustering algorithm to complete the text feature selection. Experimental results show that, compared with traditional methods, the proposed scheme effectively reduces the dimension of the feature vector, improves the text representation capability of the feature vocabulary, and meets the practical demands of text clustering.
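The abstract does not give the paper's formulas, so the following Python sketch only illustrates the general shape of the pipeline it describes: build a four-attribute vector per term, cluster the terms, and keep one cluster as the feature set. Everything specific here is an assumption made for illustration: the function names (`term_attributes`, `kmeans`, `select_features`), the way each attribute is scored, the pairwise co-occurrence count standing in for the improved Apriori step, the plain K-means standing in for the improved K-means, and the rule of keeping the cluster with the largest centroid.

```python
from collections import Counter
from itertools import combinations


def term_attributes(docs, min_support=2):
    """Map each term to [tf, df, position, correlation] (all scores are assumed)."""
    tf, df, pos_sum, pair_df = Counter(), Counter(), Counter(), Counter()
    for doc in docs:
        tf.update(doc)
        unique = sorted(set(doc))
        df.update(unique)
        for term in unique:
            # Assumed position score: an earlier first occurrence scores closer to 1.
            pos_sum[term] += 1.0 - doc.index(term) / len(doc)
        # Count documents in which each pair of terms co-occurs (2-itemsets only).
        pair_df.update(combinations(unique, 2))
    partners = Counter()
    for (a, b), count in pair_df.items():
        if count >= min_support:  # frequent pair, in the association-rule sense
            partners[a] += 1
            partners[b] += 1
    vocab = sorted(tf)
    total = sum(tf.values())
    return {
        t: [tf[t] / total, df[t] / len(docs), pos_sum[t] / df[t],
            partners[t] / max(len(vocab) - 1, 1)]
        for t in vocab
    }


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with a deterministic start (not the paper's variant)."""
    ordered = sorted(points, key=sum)
    last = len(ordered) - 1
    # Spread the initial centroids from the smallest to the largest attribute sum.
    centroids = [list(ordered[i * last // (k - 1) if k > 1 else 0]) for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_features(docs, k=2):
    """Keep the terms of the cluster whose centroid has the largest attribute sum."""
    attrs = term_attributes(docs)
    terms = list(attrs)
    labels, centroids = kmeans([attrs[t] for t in terms], k)
    best = max(range(k), key=lambda c: sum(centroids[c]))
    return [t for t, lab in zip(terms, labels) if lab == best]


docs = [
    ["text", "mining", "needs", "text", "features"],
    ["text", "clustering", "uses", "features"],
    ["features", "of", "text", "matter"],
]
print(select_features(docs))
```

In this toy corpus the terms that occur in every document and co-occur with each other end up in the retained cluster, while terms that appear once are dropped. No class labels are used at any step, which is the property the abstract stresses for unsupervised text clustering.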

Key words

text feature selection / vocabulary attribute / term position / term correlation / Apriori algorithm (association rule algorithm) / K-means clustering algorithm

Classification

Information technology and security science

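Cite this article

Zhang Qun, Wang Hongjun, Wang Lunwen. Algorithm of text feature selection based on vocabulary attribute clustering [J]. Application Research of Computers, 2017, 34(2): 369-372, 377, 5.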

Funding

National Natural Science Foundation of China (61273302)

Application Research of Computers (计算机应用研究)

OA / PKU Core (北大核心) / CSCD / CSTPCD

ISSN 1001-3695
