基于冗余度的KNN训练样本裁剪新算法

王子旗 何锦雯 蒋良孝

Computer Engineering and Applications, 2019, Vol. 55, Issue 22: 40-45, 6. DOI: 10.3778/j.issn.1002-8331.1809-0275

New Redundancy-Based Algorithm for Reducing Amount of Training Examples in KNN

Author information

  • 1. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074

Abstract

As one of the top 10 algorithms in data mining, the K-Nearest-Neighbor (KNN) algorithm is widely used because it is non-parametric, simple, and effective, and requires no training time. However, when faced with a massive number of high-dimensional training examples, its high classification time complexity becomes a bottleneck in applications. In addition, its classification performance often suffers when the class distribution of the training examples is skewed, that is, when the class imbalance problem occurs. To address these two issues, this paper proposes a new redundancy-based algorithm for reducing the number of training examples (RBKNN for short). As a pre-processing step, RBKNN first computes the redundancy of each training example and then randomly deletes some of the highly redundant training examples. RBKNN not only reduces the size of the training set but also makes the class distribution of the training examples more balanced. The experimental results show that RBKNN significantly improves the efficiency of KNN while maintaining or improving its classification accuracy.
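
The two-step idea in the abstract (score each training example's redundancy, then randomly delete some of the highly redundant ones before running KNN) can be illustrated with a minimal sketch. The abstract does not define the redundancy measure, so everything specific here is an assumption made for illustration only: redundancy is taken to be the share of an example's k nearest neighbours that carry the same label, and the names `redundancy_scores`, `prune_redundant`, `threshold`, and `drop_rate` are hypothetical. This is not the paper's RBKNN procedure.

```python
import math
import random
from collections import Counter


def redundancy_scores(X, y, k=3):
    """Assumed redundancy measure (not taken from the paper): the share of an
    example's k nearest neighbours, excluding itself, with the same label."""
    scores = []
    for i, xi in enumerate(X):
        # Indices of all other examples, ordered by Euclidean distance to xi.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )
        neighbours = others[:k]
        same = sum(1 for j in neighbours if y[j] == y[i])
        scores.append(same / len(neighbours) if neighbours else 0.0)
    return scores


def prune_redundant(X, y, k=3, threshold=1.0, drop_rate=0.5, seed=0):
    """Pre-processing step: randomly delete a fraction (drop_rate) of the
    examples whose assumed redundancy score is at least `threshold`."""
    rng = random.Random(seed)
    scores = redundancy_scores(X, y, k)
    candidates = [i for i, s in enumerate(scores) if s >= threshold]
    n_drop = int(len(candidates) * drop_rate)
    dropped = set(rng.sample(candidates, n_drop))
    keep = [i for i in range(len(X)) if i not in dropped]
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_predict(X, y, query, k=3):
    """Plain majority-vote KNN over the (possibly pruned) training set."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: a dense majority class "a" and a small minority class "b".
X_train = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5), (0.5, 0),
           (1, 0.5), (5, 5), (5, 6), (6, 5)]
y_train = ["a"] * 8 + ["b"] * 3

X_small, y_small = prune_redundant(X_train, y_train, k=3)
label = knn_predict(X_small, y_small, (5.2, 5.2), k=3)
```

In this toy data only the dense majority class reaches the threshold, so pruning shrinks the training set and narrows the class ratio at the same time, which is the qualitative effect the abstract describes. The brute-force neighbour search is quadratic in the number of examples and is meant only to keep the sketch self-contained.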

Key words

KNN classifiers/example reduction/fast classification/class imbalance

Classification

Information Technology and Security Science

Cite this article

王子旗, 何锦雯, 蒋良孝. 基于冗余度的KNN训练样本裁剪新算法[J]. 计算机工程与应用, 2019, 55(22): 40-45, 6.

Funding

Key Program of the Joint Funds of the National Natural Science Foundation of China (No. U1711267).

Computer Engineering and Applications

OA, PKU Core (北大核心), CSCD, CSTPCD

ISSN 1002-8331
