CAAI Transactions on Intelligent Systems (智能系统学报), 2025, Vol. 20, Issue 2: 329-343 (15 pages). DOI: 10.11992/tis.202311038
一种基于KNN和随机仿射的边界样本合成过采样方法
A borderline sample synthesis oversampling method based on KNN and random affine transformation
Abstract
Oversampling is a proven strategy for addressing imbalanced data classification challenges. This paper introduces a borderline sample synthesis oversampling method based on K-nearest neighbor (KNN) and random affine transformation to improve both the seed sample selection and the synthetic sample generation stages of existing oversampling methods. Initially, the three nearest neighbor theory is applied to establish an effective intrinsic neighborhood relationship between samples and remove noise from the dataset. This step helps reduce the risk of overfitting by subsequent classifiers. Next, the minority-class borderline samples that are difficult to learn but contain rich information are accurately identified and treated as sampling seeds. Finally, the method replaces traditional linear interpolation with local random affine transformation, uniformly generating synthetic samples within the approximate manifold of the original data. Compared with traditional oversampling methods, the proposed method more effectively leverages important borderline information within datasets, thereby enhancing classifier performance. Extensive comparative experiments were conducted on 18 benchmark datasets, comparing the proposed method against 8 classic sampling methods, each combined with 4 different classifiers. The results show that this method achieves higher F1 scores and geometric means (G-mean), addressing the imbalanced data classification problem more effectively. Furthermore, statistical analysis confirms that the method has a higher Friedman ranking.
Keywords
K-nearest neighbor / linear interpolation / borderline sample / natural distribution / oversampling / three nearest neighbor theory / random affine transformation / imbalanced classification
Classification
Information technology and security science
Cite this article
冷强奎 (Leng Qiangkui), 孙薛梓 (Sun Xuezi), 孟祥福 (Meng Xiangfu). A borderline sample synthesis oversampling method based on KNN and random affine transformation [J]. CAAI Transactions on Intelligent Systems, 2025, 20(2): 329-343.
Funding
Young Scientists Fund of the National Natural Science Foundation of China (61602056)
General Program of the National Natural Science Foundation of China (61772249)
Project of the Education Department of Liaoning Province (JYTMS20230819)
Doctoral Research Start-up Fund of Liaoning Technical University (21-1043)
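The three stages named in the abstract (noise removal with three nearest neighbors, borderline seed selection, and local synthesis in place of linear interpolation) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so every rule below is an assumption made for illustration. In particular, a sample is treated as noise when none of its 3 nearest neighbors shares its label, a minority sample is treated as borderline when at least one of its k nearest neighbors belongs to another class, and the random affine transformation is replaced by a random convex combination (a special case of an affine combination) of nearby minority samples. The names `knn_indices` and `oversample` are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Return, for each row of X, the indices of its k nearest other rows."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def oversample(X, y, minority=1, k=3, n_new=10, seed=0):
    """Illustrative three-stage oversampling; all rules are assumptions."""
    rng = np.random.default_rng(seed)

    # Stage 1 (assumed rule): drop samples whose 3 nearest neighbors
    # all carry a different label.
    nn = knn_indices(X, 3)
    keep = (y[nn] == y[:, None]).any(axis=1)
    X, y = X[keep], y[keep]

    # Stage 2 (assumed rule): a minority sample is a borderline seed if
    # at least one of its k nearest neighbors is not a minority sample.
    nn = knn_indices(X, k)
    is_min = y == minority
    seeds = np.flatnonzero(is_min & (y[nn] != minority).any(axis=1))
    if seeds.size == 0:
        return X, y  # nothing to synthesize from

    # Stage 3 (stand-in technique): random convex combination of the seed
    # and its nearest minority samples, with Dirichlet weights summing to 1.
    X_min = X[is_min]
    synth = []
    for s in rng.choice(seeds, size=n_new):
        d = np.linalg.norm(X_min - X[s], axis=1)
        local = X_min[np.argsort(d)[: k + 1]]  # includes the seed itself
        w = rng.dirichlet(np.ones(len(local)))
        synth.append(w @ local)

    X_new = np.vstack([X, np.array(synth)])
    y_new = np.concatenate([y, np.full(n_new, minority)])
    return X_new, y_new
```

Calling `oversample(X, y, minority=1, k=3, n_new=5)` on a feature matrix `X` and an integer label vector `y` returns the cleaned data with the synthetic minority rows appended at the end. Because the weights are convex, each synthetic point stays inside the convex hull of the seed's local minority neighborhood, which is one simple way to avoid generating points along a single line segment as plain linear interpolation does.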