Computer Engineering and Applications (计算机工程与应用), 2019, Vol. 55, Issue 22: 40-45, 6. DOI: 10.3778/j.issn.1002-8331.1809-0275
基于冗余度的KNN训练样本裁剪新算法
New Redundancy-Based Algorithm for Reducing Amount of Training Examples in KNN
Abstract
As one of the top 10 algorithms in data mining, the K-Nearest-Neighbor (KNN) algorithm is widely used because it is non-parametric, simple, and effective, and requires no training time. However, when it faces a massive number of high-dimensional training examples, its high classification time complexity becomes a bottleneck in its application. In addition, its classification performance often suffers when the class distribution of the training examples is skewed and the class imbalance problem occurs. To address these two issues, this paper proposes a new redundancy-based algorithm for reducing the number of training examples (RBKNN for short). In a pre-processing step, RBKNN first computes the redundancy of each training example and then randomly deletes some highly redundant training examples. RBKNN not only reduces the size of the training set but also makes the class distribution of the training examples more balanced. The experimental results show that RBKNN significantly improves the efficiency of KNN while maintaining or improving its classification accuracy.
Key words
KNN classifiers/example reduction/fast classification/class imbalance
Classification
Information Technology and Security Science
Cite this article
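王子旗 (Wang Ziqi), 何锦雯 (He Jinwen), 蒋良孝 (Jiang Liangxiao). New Redundancy-Based Algorithm for Reducing Amount of Training Examples in KNN[J]. Computer Engineering and Applications, 2019, 55(22): 40-45, 6.
Funding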
Key Program of the Joint Funds of the National Natural Science Foundation of China (No. U1711267).
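Illustrative sketch
The abstract describes a two-step pre-processing stage (score each training example for redundancy, then randomly delete some of the highly redundant ones) but does not define the redundancy measure. The Python sketch below shows only the general shape of such a step and is not the authors' RBKNN. It assumes a hypothetical redundancy score: the fraction of an example's k nearest neighbours that share its class label. The function names and the `threshold`, `drop_prob`, and `seed` parameters are likewise assumptions made for illustration.

```python
import numpy as np


def redundancy_scores(X, y, k=5):
    """Hypothetical redundancy score (an assumption, not the paper's definition):
    the share of each example's k nearest neighbours, itself excluded,
    that carry the same class label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances, shape (n, n).
    # This costs O(n^2 * d) memory, which is fine for a small sketch only.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbour
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    return (y[nn_idx] == y[:, None]).mean(axis=1)


def prune_training_set(X, y, k=5, threshold=1.0, drop_prob=0.5, seed=0):
    """Randomly delete examples whose (assumed) redundancy reaches `threshold`.

    Returns the reduced X, the reduced y, and the boolean keep-mask.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    scores = redundancy_scores(X, y, k)
    redundant = scores >= threshold
    # Each highly redundant example is dropped independently with drop_prob.
    drop = redundant & (rng.random(len(y)) < drop_prob)
    keep = ~drop
    return X[keep], y[keep], keep
```

Under this assumed score, examples deep inside a dense single-class region are the most likely to be deleted. Because such regions tend to belong to the majority class, the pruning would also tend to even out the class distribution, which matches the effect the abstract reports. The reduced set returned by `prune_training_set` can then be passed to any standard KNN classifier.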