
Acceleration Algorithm for k Nearest Neighbor Classification Based on Stratified Sampling

Song Yunsheng (宋云胜), Liang Jiye (梁吉业)

Journal of Data Acquisition and Processing (数据采集与处理), 2017, Vol. 32, Issue 6: 1153-1162, 10. DOI: 10.16337/j.1004-9037.2017.06.010

Acceleration Algorithm for k Nearest Neighbor Classification Based on Stratified Sampling

Song Yunsheng (宋云胜) [1], Liang Jiye (梁吉业) [1]

Author information

  • 1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China

Abstract

k nearest neighbor (kNN), one of the most typical data mining algorithms, is widely applied in various areas owing to its good generalization ability and well-established theoretical results. At prediction time, the method must compute the distances between each test instance and all training instances, which costs substantial time on large-scale data. To solve this problem, we propose an acceleration algorithm for k nearest neighbor classification based on stratified sampling (SS-kNN). SS-kNN first divides the instance space into several subranges containing the same number of instances, then samples instances from each subrange, and finally determines which subrange a test instance falls in and finds its nearest neighbors within that subrange. Compared with kNN and its variant based on random sampling, SS-kNN not only obtains similar classification accuracy but also runs, on average, 399 and 16 times faster, respectively.
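
The three steps named in the abstract (equal-size partition, per-subrange sampling, and neighbor search restricted to one subrange) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the instance space is divided or what sampling rate is used, so this sketch assumes an equal-frequency split along the first feature only, uniform sampling within each subrange, and Euclidean distance. The names `build_strata`, `predict`, `n_strata`, and `sample_rate` are hypothetical.

```python
import bisect
import math
import random
from collections import Counter


def build_strata(X, y, n_strata, sample_rate, seed=0):
    """Split the data into equal-frequency strata on feature 0, then sample each.

    Assumption: partitioning on a single feature stands in for the paper's
    (unspecified) way of dividing the instance space.
    """
    rng = random.Random(seed)
    order = sorted(range(len(X)), key=lambda i: X[i][0])
    size = math.ceil(len(order) / n_strata)
    chunks = [order[s:s + size] for s in range(0, len(order), size)]
    # Largest feature-0 value of every stratum except the last; used to route queries.
    bounds = [X[chunk[-1]][0] for chunk in chunks[:-1]]
    strata = []
    for chunk in chunks:
        m = max(1, round(sample_rate * len(chunk)))
        kept = rng.sample(chunk, m)  # uniform sampling without replacement
        strata.append([(X[i], y[i]) for i in kept])
    return bounds, strata


def predict(x, bounds, strata, k):
    """Route x to its stratum and take a majority vote among its k nearest samples."""
    stratum = strata[bisect.bisect_left(bounds, x[0])]
    nearest = sorted(stratum, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data: 100 points on a line, labelled by which half they fall in.
X = [(float(i), float(i % 3)) for i in range(100)]
y = [0 if i < 50 else 1 for i in range(100)]
bounds, strata = build_strata(X, y, n_strata=4, sample_rate=0.5)
print(predict((10.0, 1.0), bounds, strata, k=3))  # prints 0
```

With `n_strata` subranges and a sampling rate `sample_rate`, each query is compared against roughly `sample_rate / n_strata` of the training set rather than all of it, which is the source of the speed-up in this sketch. The trade-off is that a query near a subrange boundary may miss true neighbors that lie in the adjacent subrange.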

Key words

stratified sampling/data partition/nearest neighbor/classification accuracy/running time

Classification

Information Technology and Security Science

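Cite this article

Song Yunsheng, Liang Jiye. Acceleration algorithm for k nearest neighbor classification based on stratified sampling [J]. Journal of Data Acquisition and Processing (数据采集与处理), 2017, 32(6): 1153-1162, 10.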

Funding

Key Program of the National Natural Science Foundation of China (61432011, U1435212)

Shanxi Province Key Science and Technology Program for Coal (MQ2014-09)

Journal of Data Acquisition and Processing (数据采集与处理)

Indexing: OA, PKU Core (北大核心), CSCD, CSTPCD

ISSN 1004-9037
