Application Research of Computers, 2017, Vol. 34, Issue 2: 369-372, 377 (5 pages). DOI: 10.3969/j.issn.1001-3695.2017.02.011
Algorithm of text feature selection based on vocabulary attribute clustering
Abstract
Effective text feature selection is a precondition for text mining. Conventional text feature selection methods have limited effect on reducing the dimension of the feature vector and on text representation. Moreover, they are not suitable for unsupervised text clustering. To address this, this paper proposes a novel text feature selection algorithm, suited to text clustering, that is based on the concept of vocabulary attributes. First, the algorithm constructs a model based on vocabulary attributes, including term frequency, document frequency, term position, and term correlation. It then analyzes in detail how to calculate each attribute value and improves the Apriori algorithm to calculate the term correlation attribute value. Finally, it clusters on the vocabulary attribute model using an improved K-means clustering algorithm to complete the text feature selection. Experimental results show that, compared with traditional methods, the proposed scheme effectively reduces the dimension of the feature vector, improves the text representation capability of the feature vocabulary, and meets the practical demands of text clustering.
Keywords
text feature selection / vocabulary attribute / term position / term correlation / Apriori algorithm / K-means clustering algorithm
Classification
Information Technology and Security Science
Cite this article
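Zhang Qun, Wang Hongjun, Wang Lunwen. Algorithm of text feature selection based on vocabulary attribute clustering [J]. Application Research of Computers, 2017, 34(2): 369-372, 377 (5 pages).
Funding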
National Natural Science Foundation of China (61273302)
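Illustrative sketch

The pipeline summarized in the abstract (build a four-attribute vector per term, cluster the terms, keep one cluster as the feature set) can be sketched roughly as follows. The abstract gives no formulas, so everything specific here is an assumption: the position score, the use of plain pairwise co-occurrence support in place of the paper's improved Apriori step, standard Lloyd's K-means in place of the paper's improved K-means, and the rule of keeping the cluster whose centroid has the largest attribute sum. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_attributes(docs):
    """Return {term: [tf, df, position, correlation]} for tokenized docs.

    All four formulas are illustrative assumptions, not the paper's definitions.
    """
    n_docs = len(docs)
    tf, df, pos_sum, pair_df = Counter(), Counter(), Counter(), Counter()
    for doc in docs:
        tf.update(doc)
        seen = set(doc)
        df.update(seen)
        for term in seen:
            # Earlier first occurrence gives a score closer to 1.
            pos_sum[term] += 1.0 - doc.index(term) / len(doc)
        # Pairwise co-occurrence counts: the 2-itemset supports that an
        # Apriori pass would compute (no pruning in this toy version).
        for a, b in combinations(sorted(seen), 2):
            pair_df[(a, b)] += 1
    corr = Counter()
    for (a, b), count in pair_df.items():
        corr[a] += count / n_docs
        corr[b] += count / n_docs
    return {
        t: [tf[t], df[t] / n_docs, pos_sum[t] / df[t], corr[t]] for t in tf
    }


def min_max_normalize(attrs):
    """Rescale every attribute column to [0, 1] so no attribute dominates."""
    terms = sorted(attrs)
    columns = list(zip(*(attrs[t] for t in terms)))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return {
        t: [(v - lo) / s for v, lo, s in zip(attrs[t], lows, spans)]
        for t in terms
    }


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's K-means with deterministic seeding (assumed variant)."""
    ordered = sorted(points, key=sum)
    # Seed with k points spread evenly over the sum-of-attributes ordering.
    centroids = [
        list(ordered[i * (len(ordered) - 1) // max(k - 1, 1)]) for i in range(k)
    ]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's seed
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def select_features(docs, k=2):
    """Cluster terms on their attributes; keep the highest-scoring cluster."""
    attrs = min_max_normalize(term_attributes(docs))
    terms = sorted(attrs)
    labels, centroids = kmeans([attrs[t] for t in terms], k)
    best = max(range(k), key=lambda c: sum(centroids[c]))
    return [t for t, label in zip(terms, labels) if label == best]
```

For example, calling `select_features` on a few tokenized documents in which two terms occur early and in every document, while the remaining terms each occur once at the end of a single document, keeps only the two recurring terms. No labels are needed at any step, which is the property the abstract emphasizes for unsupervised text clustering.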