Computer Applications and Software (计算机应用与软件), Issue (2): 38-41, 4. DOI: 10.3969/j.issn.1000-386x.2016.02.009
Automatic Text Classification Based on KNN + Hierarchical SVM
INTEGRATING KNN AND HIERARCHICAL SVM FOR AUTOMATIC TEXT CLASSIFICATION
王金华 1, 喻辉 2, 产文 3, 周向东 3, 施伯乐 3
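Author information
- 1. The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 200233
- 2. Communication Network Technology Management Center of Chengdu Military Region, Chengdu, Sichuan 610000
- 3. School of Computer Science, Fudan University, Shanghai 200433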
Abstract
For automatic hierarchical classification of large-scale text collections, the k-nearest neighbours (KNN) algorithm is efficient but performs poorly on samples that lie near category boundaries. Support vector machine (SVM) classifiers are more accurate; however, many earlier multi-class SVM algorithms are composed of a number of independent binary classifiers, which makes training slow and leaves them ill-suited to hierarchical category structures. This paper presents a method that integrates KNN with a hierarchical SVM for automatic text classification. First, we modify the KNN algorithm to quickly obtain the class labels of the K nearest neighbours and use them to sift out the candidate categories of each document. Then we use a multi-class sparse hierarchical SVM classifier with uniform learning to partition the sample top-down through the categories, so that documents are classified both efficiently and accurately. Experimental results show that, on both single-layer and multi-layer classification datasets, the method achieves higher classification accuracy than either method used alone, while its classification time is close to that of the fastest single classifier.
Keywords
Automatic text classification / K-nearest neighbour (KNN) / Hierarchical support vector machine (SVM)
Classification
Information technology and security science
Cite this article
王金华, 喻辉, 产文, 周向东, 施伯乐. Integrating KNN and hierarchical SVM for automatic text classification [J]. Computer Applications and Software, 2016, (2): 38-41, 4.
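Illustrative sketch

The two-stage pipeline described in the abstract (KNN to shortlist candidate categories, then a top-down pass through the category hierarchy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are plain bag-of-words counts compared by cosine similarity, and a nearest-centroid scorer stands in for the paper's sparse hierarchical SVM, which the abstract does not specify in detail. The category tree, the training texts, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def vectorize(text):
    """Bag-of-words term counts (assumed stand-in for real features)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_candidates(doc, train, k):
    """Stage 1: the set of labels carried by the k nearest training docs."""
    ranked = sorted(train, key=lambda item: cosine(doc, item[0]), reverse=True)
    return {label for _, label in ranked[:k]}

def centroids(train):
    """Mean vector per leaf category (assumed stand-in for the SVM scorer)."""
    sums, counts = defaultdict(Counter), Counter()
    for vec, label in train:
        sums[label].update(vec)
        counts[label] += 1
    return {c: {t: v / counts[c] for t, v in s.items()} for c, s in sums.items()}

def leaves(tree, node):
    """All leaf categories below a node of the (hypothetical) hierarchy."""
    kids = tree.get(node, [])
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves(tree, kid)]

def classify(doc, tree, root, cents, candidates):
    """Stage 2: top-down descent that only follows branches containing
    at least one candidate leaf from stage 1."""
    node = root
    while tree.get(node):
        best, best_score = None, -1.0
        for kid in tree[node]:
            live = [leaf for leaf in leaves(tree, kid) if leaf in candidates]
            if not live:
                continue  # branch pruned by the KNN shortlist
            score = max(cosine(doc, cents[leaf]) for leaf in live)
            if score > best_score:
                best, best_score = kid, score
        node = best
    return node

# Hypothetical two-level hierarchy and toy training data.
TREE = {"root": ["sports", "tech"],
        "sports": ["football", "tennis"],
        "tech": ["hardware", "software"]}
TRAIN = [(vectorize(t), c) for t, c in [
    ("goal striker penalty league match", "football"),
    ("striker scores goal in league cup", "football"),
    ("serve ace racket match set", "tennis"),
    ("racket serve baseline set point", "tennis"),
    ("cpu chip memory board", "hardware"),
    ("compiler code bug release", "software"),
]]

def predict(text, k=3):
    doc = vectorize(text)
    cands = knn_candidates(doc, TRAIN, k)
    return classify(doc, TREE, "root", centroids(TRAIN), cands)
```

Calling `predict("late goal by the striker wins the match")` shortlists `football` and `tennis` in the first stage, so the descent never scores the `tech` branch. That pruning is the source of the speed-up the abstract attributes to the KNN step; the accuracy gain it reports would come from the SVM stage, which this sketch replaces with a much simpler scorer.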