Boundary Mixed Resampling Based on Joint Entropy for Imbalanced Data (基于联合熵的非平衡数据边界混合重采样)

周传华 任太娇 罗岚 周昊

计算机与现代化 (Computer and Modernization), Issue (9): 95-100, 113, 7. DOI: 10.3969/j.issn.1006-2475.2024.09.016

Boundary Mixed Resampling Based on Joint Entropy for Imbalanced Data

周传华 [1], 任太娇 [2], 罗岚 [2], 周昊 [2]

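Author information

  • 1. School of Management Science and Engineering, Anhui University of Technology, Ma'anshan, Anhui 243032; School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026
  • 2. School of Management Science and Engineering, Anhui University of Technology, Ma'anshan, Anhui 243032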

Abstract

Single resampling methods for imbalanced data often generate redundant samples or inadvertently delete crucial sample information. To overcome these limitations, this paper proposes a boundary mixed resampling algorithm for imbalanced data based on joint entropy. The algorithm first introduces a boundary factor to separate the boundary set from the non-boundary set. It then constructs a joint entropy indicator system to assess the importance of minority class samples within the boundary set. Based on this assessment, different oversampling methods and sampling quantities are applied to the partitioned minority class samples. Finally, the NearMiss-2 algorithm is used to filter out majority class samples in the non-boundary set, achieving a relative data balance. Comparative experiments on nine UCI datasets show that the proposed algorithm improves F1-Score, G-mean, and AUC, which validates its effectiveness for imbalanced data classification.
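The final undersampling step can be illustrated with a minimal sketch of the standard NearMiss-2 rule: keep the majority class points whose mean distance to their k farthest minority class points is smallest. This page does not give the paper's exact settings, so the function name `near_miss_2`, the parameters `n_keep` and `k`, and the toy data are assumptions made purely for illustration. The boundary factor and the joint entropy indicators are not defined in the abstract and are not reproduced here.

```python
import math


def near_miss_2(majority, minority, n_keep, k=3):
    """Select n_keep majority points by the standard NearMiss-2 rule.

    Each majority point is scored by its mean Euclidean distance to its
    k farthest minority points; the points with the smallest scores are kept.
    All names and defaults here are illustrative assumptions.
    """
    if not minority:
        raise ValueError("minority set must not be empty")
    k = min(k, len(minority))  # cannot use more neighbours than exist
    scored = []
    for idx, point in enumerate(majority):
        # Distances to all minority points, largest first.
        dists = sorted((math.dist(point, m) for m in minority), reverse=True)
        scored.append((sum(dists[:k]) / k, idx))
    scored.sort()  # smallest mean distance first
    keep = sorted(idx for _, idx in scored[:n_keep])  # preserve input order
    return [majority[i] for i in keep]


# Toy data (hypothetical): two minority points, four majority points.
minority = [(0.0, 0.0), (1.0, 0.0)]
majority = [(0.5, 0.5), (5.0, 5.0), (0.5, -0.5), (9.0, 9.0)]
kept = near_miss_2(majority, minority, n_keep=2, k=2)
print(kept)
```

In this toy example the two majority points lying close to the minority points are retained and the two distant ones are discarded. In the pipeline described in the abstract, the `majority` argument would presumably be restricted to the non-boundary set.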

Keywords

imbalanced data classification / boundary factor / joint entropy / mixed sampling

Classification

Information technology and security science

Cite this article

周传华, 任太娇, 罗岚, 周昊. Boundary Mixed Resampling Based on Joint Entropy for Imbalanced Data [J]. 计算机与现代化 (Computer and Modernization), 2024, (9): 95-100, 113, 7.

Funding

National Natural Science Foundation of China (71772002, 61702006)

Anhui Provincial Universities Key Laboratory of Multidisciplinary Management and Control of Complex Systems (CS2020-04)

Journal: 计算机与现代化 (Computer and Modernization), OA, CSTPCD, ISSN 1006-2475
