Application Research of Computers, 2016, Vol. 33, Issue 11: 3374-3377, 3382, 5. DOI: 10.3969/j.issn.1001-3695.2016.11.038
Improved KNN text classification algorithm based on clustering (基于聚类改进的KNN文本分类算法)
Abstract
The traditional KNN text classification algorithm is an unsupervised, non-parametric, simple, popular, and easy-to-implement classification method. However, it must repeatedly compute the similarity between each test text and the training sample set, so its efficiency degrades considerably as the amount of text grows. To improve the classification efficiency of the traditional KNN algorithm, this paper proposes an improved KNN algorithm based on clustering. The algorithm first extracts text features with an improved χ² statistic, then partitions the text set into several clusters using a clustering method, and finally classifies the texts with the improved KNN method. Experimental and analytical results show that the algorithm handles text classification more effectively.
Keywords
text classification / KNN / clustering / training set
Classification
Information technology and security science
Cite this article
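Zhou Qingping (周庆平), Tan Changgeng (谭长庚), Wang Hongjun (王宏君), Zhan Miaoxiang (湛淼湘). Improved KNN text classification algorithm based on clustering [J]. Application Research of Computers, 2016, 33(11): 3374-3377, 3382, 5.
Funding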
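National Natural Science Foundation of China (61379057, 61309001, 61379110, 61103202, 61301136); Priority Development Areas project of the Doctoral Program Fund of the Ministry of Education of China ()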
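Illustrative sketch

The abstract outlines a three-stage pipeline: χ² feature extraction, clustering of the training texts, and KNN classification. The abstract does not specify the paper's improved χ² statistic or improved KNN rule, so the minimal Python sketch below substitutes assumptions: the textbook χ² score, a single-pass "leader" clustering within each class, and a similarity-weighted vote over the k nearest cluster centroids using cosine similarity. All function names, parameters, and default values are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def chi_square(docs, labels, term, cls):
    """Textbook chi-square score of `term` for class `cls` (2x2 table).
    Assumption: the paper's improved statistic is not reproduced here."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if label == cls:
            a, c = a + present, c + (not present)
        else:
            b, d = b + present, d + (not present)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, top_n):
    """Keep the top_n terms ranked by their best chi-square score over classes."""
    vocab = sorted({t for tokens in docs for t in tokens})
    classes = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:top_n]


def vectorize(tokens, features):
    """Term-frequency vector restricted to the selected features."""
    return [tokens.count(f) for f in features]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def leader_cluster(vectors, threshold):
    """Single-pass clustering: join the first cluster that is similar enough,
    otherwise start a new one. Cosine is scale-invariant, so the running
    sum of a cluster's members stands in for its centroid."""
    sums = []
    for v in vectors:
        for s in sums:
            if cosine(v, s) >= threshold:
                for i, x in enumerate(v):
                    s[i] += x
                break
        else:
            sums.append(list(v))
    return sums


def train(docs, labels, top_n=7, threshold=0.5):
    """Select features, then compress each class into cluster centroids."""
    features = select_features(docs, labels, top_n)
    by_class = defaultdict(list)
    for tokens, label in zip(docs, labels):
        by_class[label].append(vectorize(tokens, features))
    centroids = [(c, label) for label, vecs in sorted(by_class.items())
                 for c in leader_cluster(vecs, threshold)]
    return features, centroids


def classify(model, tokens, k=2):
    """Similarity-weighted vote among the k nearest cluster centroids."""
    features, centroids = model
    v = vectorize(tokens, features)
    nearest = sorted(((cosine(v, c), label) for c, label in centroids),
                     reverse=True)[:k]
    votes = defaultdict(float)
    for sim, label in nearest:
        votes[label] += sim
    return max(votes, key=votes.get)
```

Each document is a list of tokens, so `classify(train(docs, labels), tokens)` returns a predicted label. In this sketch, any efficiency gain comes from comparing a test text against a small number of centroids instead of every training text. Whether the paper achieves its speed-up in the same way is an assumption.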