Journal of Central South University (Science and Technology), 2017, Vol. 48, Issue 7: 1782-1789 (8 pp.). DOI: 10.11817/j.issn.1672-7207.2017.07.014
A text classification algorithm based on feature library projection
Abstract
To address drawbacks of the KNN algorithm, such as high time complexity and the information loss associated with feature reduction and sample clipping, a feature library projection (FLP) classification algorithm is proposed. First, the algorithm retains all features of the training samples, together with their weights, in a feature library. The data in this library are then transformed into projection samples by projection functions. A new sample is classified by computing its similarity to the projection samples. The effectiveness of the algorithm was validated on text classification under two conditions, a small training set and a large training set, and the algorithm was compared with KNN. The results show that the FLP algorithm does not lose classification features and that its classification accuracy is higher than that of the other algorithms tested. Its classification efficiency is not directly related to growth in sample size, and its time complexity is low.
Key words
text classification / KNN algorithm / feature library projection
Classification
Information Technology and Security Science
Cite this article
YIN Shaofeng, ZHENG Hui, XU Shaohua, RONG Huigui, ZHANG Na. A text classification algorithm based on feature library projection[J]. Journal of Central South University (Science and Technology), 2017, 48(7): 1782-1789.
Funding
Projects (61672221, 61304184, 61672156) supported by the National Natural Science Foundation of China
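The pipeline outlined in the abstract (keep every feature and its weight in a library, project the library into a small set of projection samples, classify by similarity) can be illustrated with a minimal sketch. The abstract does not define the projection functions, so everything concrete here is an assumption: the whitespace `tokenize` helper, raw term counts as weights, one L2-normalised projection sample per class, and cosine similarity. It is not the authors' implementation, only one plausible reading of the idea.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_feature_library(samples):
    """Accumulate every term's weight per class; no feature is pruned."""
    library = defaultdict(Counter)
    for text, label in samples:
        library[label].update(tokenize(text))
    return library


def project(library):
    """Assumed projection: one L2-normalised weight vector per class."""
    projections = {}
    for label, weights in library.items():
        # Fall back to 1.0 so an empty class does not divide by zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        projections[label] = {t: w / norm for t, w in weights.items()}
    return projections


def classify(text, projections):
    """Return the class whose projection sample is most cosine-similar."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return None  # empty input: nothing to compare

    def score(vec):
        return sum(c * vec.get(t, 0.0) for t, c in counts.items()) / norm

    return max(projections, key=lambda label: score(projections[label]))


# Toy training data, invented for illustration only.
train = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stocks fell as rates rose", "finance"),
]
projections = project(build_feature_library(train))
print(classify("the striker scored in the match", projections))  # sports
print(classify("interest rates rose", projections))              # finance
```

Under these assumptions, classifying a new text compares it against one projection sample per class rather than against every training sample as KNN does, which is consistent with the abstract's claim that classification efficiency is not directly tied to growth in sample size.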