Computer Engineering and Applications (计算机工程与应用), 2011, Vol. 47, Issue (2): 127-130, 4. DOI: 10.3778/j.issn.1002-8331.2011.02.040
K-nearest neighbor Chinese text categorization algorithm based on center documents (一种基于中心文档的KNN中文文本分类算法)
Abstract
Automatic text categorization has become an active research topic because it supports searching for and extracting information of a particular category from large data sources. K-nearest neighbor (KNN) is an important method for automatic text categorization: it handles large data sets stably, but it is slow. Building on KNN classification, this paper introduces the semantic relations between feature items and, on that basis, uses clustering to build center documents. This reduces the number of documents that KNN must search and so speeds up classification. Simulation results show that the proposed algorithm improves classification speed while maintaining the precision of traditional classification.
Keywords
Chinese text classification / k-nearest neighbor (KNN) / center documents / semantic similarity / clustering
Classification
Information Technology and Safety Science
Cite this article
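Lu Ting (鲁婷), Wang Hao (王浩), Yao Hongliang (姚宏亮). K-nearest neighbor Chinese text categorization algorithm based on center documents [J]. Computer Engineering and Applications, 2011, 47(2): 127-130, 4.
Funding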
National Natural Science Foundation of China (Grant No. 60705015)
Natural Science Foundation of Anhui Province of China (Grant No. 070412064)
Scientific Research Development Fund of Hefei University of Technology (No. 070504F)
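Illustrative sketch
The idea summarized in the abstract (cluster the training documents of each category into a few center documents, then run KNN against those centers instead of every training document) can be sketched roughly as follows. This is not the authors' implementation. It assumes documents are already segmented into tokens, it substitutes plain cosine similarity over term counts for the paper's semantic similarity between feature items, and it uses a simple single-pass threshold clustering. The names build_centers, classify, and threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_centers(docs, labels, threshold=0.5):
    """Cluster documents within each category and return the cluster centers.

    Assumed scheme: a document joins the most similar existing cluster of its
    own category if the similarity reaches `threshold`; otherwise it starts a
    new cluster. Each center is the mean term-count vector of its cluster.
    """
    clusters = defaultdict(list)  # label -> list of [summed_vector, count]
    for doc, label in zip(docs, labels):
        vec = Counter(doc)
        best, best_sim = None, threshold
        for cluster in clusters[label]:
            sim = cosine(vec, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters[label].append([Counter(vec), 1])
        else:
            best[0].update(vec)  # add term counts into the running sum
            best[1] += 1
    return [
        ({t: w / n for t, w in total.items()}, label)
        for label, group in clusters.items()
        for total, n in group
    ]


def classify(doc, centers, k=3):
    """Similarity-weighted KNN vote over center documents only."""
    vec = Counter(doc)
    scored = [(cosine(vec, center), label) for center, label in centers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy, made-up training data (already tokenized).
train_docs = [
    ["football", "match", "goal"],
    ["football", "team", "goal"],
    ["stock", "market", "price"],
    ["stock", "price", "bank"],
]
train_labels = ["sports", "sports", "finance", "finance"]
centers = build_centers(train_docs, train_labels, threshold=0.5)
```

In this toy setup the four training documents collapse into two center documents, so a query is compared against two vectors rather than four. The speed-up reported in the abstract comes from the same kind of reduction in the search set.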