Computer Engineering and Applications, 2020, Vol. 56, Issue (1): 46-52, 7. DOI: 10.3778/j.issn.1002-8331.1901-0083
基于边界混合重采样的非平衡数据分类方法
Imbalanced Data Classification Method Based on Boundary Mixed Resampling
Abstract
For imbalanced data classification, a novel method based on boundary mixed resampling is proposed, which aims to synthesize valuable new samples and to delete original samples that have no influence. First, the concept of k-outlier is introduced to separate boundary samples from non-boundary samples, and the two groups are then handled differently. Minority samples on the boundary are taken as target points for synthesizing new samples, while non-boundary majority samples are undersampled based on distance, so that the classes become roughly balanced. Experimental comparisons show that, while ensuring a better G-mean value, the proposed algorithm improves the classification accuracy of minority samples to some extent.
Keywords
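k-outlier/resampling/boundary points/imbalanced data classification
Classification
Information Technology and Security Science
Cite this article
Hou Beibei, Liu Sanyang, Pu Shiye. Imbalanced Data Classification Method Based on Boundary Mixed Resampling[J]. Computer Engineering and Applications, 2020, 56(1): 46-52, 7.
Funding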
National Natural Science Foundation of China (No.61877046)
Natural Science Foundation of Shaanxi Province (No.2017JM1001)