A borderline sample synthesis oversampling method based on KNN and random affine transformation

Leng Qiangkui (冷强奎), Sun Xuezi (孙薛梓), Meng Xiangfu (孟祥福)

CAAI Transactions on Intelligent Systems (智能系统学报), 2025, Vol. 20, Issue 2: 329-343 (15 pages). DOI: 10.11992/tis.202311038

A borderline sample synthesis oversampling method based on KNN and random affine transformation

Leng Qiangkui¹, Sun Xuezi¹, Meng Xiangfu¹

Author information

  • 1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, Liaoning, China

Abstract

Oversampling is a proven strategy for addressing imbalanced data classification challenges. This paper introduces a borderline sample synthesis oversampling method based on K-nearest neighbor (KNN) and random affine transformation to improve both the seed sample selection and the synthetic sample generation stages of existing oversampling methods. First, the three nearest neighbor theory is applied to establish an effective intrinsic neighborhood relationship between samples and to remove noise from the dataset, which helps reduce the risk of overfitting in subsequent classifiers. Next, the minority-class borderline samples, which are difficult to learn but contain rich information, are accurately identified and treated as sampling seeds. Finally, the method replaces traditional linear interpolation with local random affine transformation, uniformly generating synthetic samples within the approximate manifold of the original data. Compared with traditional oversampling methods, the proposed method more effectively leverages important borderline information within datasets, thereby enhancing classifier performance. Extensive comparative experiments were conducted on 18 benchmark datasets, comparing the proposed method against 8 classic sampling methods, each combined with 4 different classifiers. The results show that the method achieves higher F1 scores and geometric means (G-mean), addressing the imbalanced data classification problem more effectively. Furthermore, statistical analysis confirms that the method has a higher Friedman ranking.
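
The three stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the noise rule (drop a sample when none of its 3 nearest neighbors shares its label), the borderline rule (a minority sample with at least one majority neighbor), and the use of Dirichlet-weighted convex combinations as a stand-in for the local random affine transformation are all assumptions made for illustration. The function names `knn_indices` and `oversample_borderline` are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of every row of X (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def oversample_borderline(X, y, minority=1, k=3, n_new=20, seed=0):
    rng = np.random.default_rng(seed)

    # Stage 1 (assumed rule): drop samples with no same-label neighbor.
    nn = knn_indices(X, k)
    keep = (y[nn] == y[:, None]).any(axis=1)
    X, y = X[keep], y[keep]

    # Stage 2 (assumed rule): seeds are minority samples that have at
    # least one majority-class sample among their k nearest neighbors.
    nn = knn_indices(X, k)
    is_min = y == minority
    seeds = np.flatnonzero(is_min & (y[nn] != minority).any(axis=1))
    X_min = X[is_min]
    if len(seeds) == 0 or len(X_min) <= k:
        return X, y  # nothing to synthesize from

    # Stage 3 (stand-in): a random convex combination, which is a special
    # case of an affine combination, of the seed and its k nearest
    # minority neighbors. Dirichlet weights are non-negative and sum to 1.
    synthetic = []
    for s in rng.choice(seeds, size=n_new):
        d = np.linalg.norm(X_min - X[s], axis=1)
        local = X_min[np.argsort(d)[: k + 1]]  # the seed itself plus k neighbors
        w = rng.dirichlet(np.ones(len(local)))
        synthetic.append(w @ local)

    X_out = np.vstack([X, np.array(synthetic)])
    y_out = np.concatenate([y, np.full(n_new, minority)])
    return X_out, y_out


# Toy data: five majority points followed by five minority points on a line.
xs = np.array([0.0, 1.0, 2.1, 3.3, 4.6, 5.5, 6.1, 6.8, 7.7, 8.5])
X_toy = np.column_stack([xs, np.zeros_like(xs)])
y_toy = np.array([0] * 5 + [1] * 5)
X_res, y_res = oversample_borderline(X_toy, y_toy, n_new=8)
```

Unlike two-point linear interpolation, each synthetic point here is drawn from the region spanned by a seed and several of its minority neighbors, which is one simple way to spread samples over a local neighborhood rather than along a single line segment. Because the weights are non-negative, the points stay inside the convex hull of that neighborhood; a general affine combination would only require the weights to sum to 1.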

Keywords

K-nearest neighbor; linear interpolation; borderline sample; natural distribution; oversampling; three nearest neighbor theory; random affine transformation; imbalanced classification

Classification

Information Technology and Security Science

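Cite this article

Leng Qiangkui, Sun Xuezi, Meng Xiangfu. A borderline sample synthesis oversampling method based on KNN and random affine transformation[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2025, 20(2): 329-343.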

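Funding

National Natural Science Foundation of China, Young Scientists Fund (61602056)

National Natural Science Foundation of China, General Program (61772249)

Project of the Education Department of Liaoning Province (JYTMS20230819)

Doctoral Research Start-up Fund of Liaoning Technical University (21-1043)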

CAAI Transactions on Intelligent Systems (智能系统学报)

Open access; Peking University Core Journal (北大核心)

ISSN 1673-4785
