Microbial Data Augmentation Algorithm Based on Feature Transformation and Minority Clustering

WEN Liuying (温柳英)¹, ZHENG Tianhao (郑天浩)¹

Journal of Zhengzhou University (Engineering Science), 2025, Vol. 46, Issue 6, pp. 23-31 (9 pages). DOI: 10.13705/j.issn.1671-6833.2025.06.006

Author information

  • 1. School of Computer Science and Software Engineering, Southwest Petroleum University, Chengdu, Sichuan 610500

Abstract

The high dimensionality of microbial data, its high proportion of zero values, and the scarcity of minority-class samples (which causes class imbalance) significantly weaken a classifier's ability to identify the minority class. Existing augmentation algorithms are sensitive to high imbalance ratios (IR) and struggle to synthesize effective samples. This study presents a microbial data augmentation algorithm based on feature transformation and minority class clustering (FTMC). First, in the feature transformation stage, principal component analysis (PCA) is used to reduce the dimensionality of the data, alleviating its strong sparsity. Next, in the minority class clustering stage, the K-Means algorithm is used to capture the local features of the minority class, yielding multiple clusters. In the cluster screening stage, a weight is computed for each cluster from its density and difficulty, combined with the IR and a weight ratio, and these weights are used to select a subset of core clusters for subsequent sample generation. Finally, in the sample augmentation and filtering stage, linear interpolation is used to generate new samples within each core cluster, and the local outlier factor (LOF) algorithm is used to filter out outliers, ensuring the quality of the augmented samples. Experiments were conducted on 12 microbial datasets, comparing FTMC with 8 sampling algorithms of the same type using 3 classifiers. The results indicate that the samples generated by FTMC are more diverse, with an average improvement of 26.42% in Recall, showing that the algorithm correctly identifies more positive samples.
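
The first two stages (feature transformation and minority class clustering) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: PCA is computed by power iteration with deflation, PCA is fitted on all samples before the minority rows are clustered, K-Means uses farthest-point initialisation, and the helper names, the toy matrix `X`, the labels `y`, `n_components=2` and `k=2` are all hypothetical (the abstract gives neither the number of components nor the number of clusters). In practice, `sklearn.decomposition.PCA` and `sklearn.cluster.KMeans` would normally be used instead.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def pca_fit_transform(X, n_components, n_iter=200, seed=0):
    """Project the rows of X onto their top principal components.

    Sketch only: eigenvectors of the sample covariance matrix are found one
    at a time by power iteration, deflating the matrix after each one.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]  # centred data
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(cov, v)))  # Rayleigh quotient (eigenvalue)
        components.append(v)
        # Deflation: remove the component just found from the covariance matrix.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(a * b for a, b in zip(r, c)) for c in components] for r in Xc]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means with farthest-point initialisation (an assumption)."""
    rng = random.Random(seed)
    centers = [list(points[rng.randrange(len(points))])]
    while len(centers) < k:
        # Next initial center: the point farthest from all chosen centers.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical toy data: 4 majority rows and two separated groups of minority rows.
X = [
    [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0],
    [5, 5, 0, 0, 0, 0], [5, 4, 0, 0, 0, 0], [4, 5, 0, 0, 0, 0],
    [0, 0, 0, 0, 5, 5], [0, 0, 0, 0, 5, 4], [0, 0, 0, 0, 4, 5],
]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # 1 = minority class
Z = pca_fit_transform(X, n_components=2)          # feature transformation
minority = [z for z, label in zip(Z, y) if label == 1]
labels, centers = kmeans(minority, k=2)           # minority class clustering
print(labels)
```

Clustering only the minority rows, after the projection, is what lets each cluster describe one local region of the minority class; in this toy data the two minority groups end up in different clusters.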

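The sample augmentation and filtering stage can be sketched the same way. Again this is a hedged illustration rather than the published method: each synthetic sample is placed on the segment between two randomly chosen members of a core cluster (the pairing rule is an assumption), and LOF is computed over the real cluster members plus the candidates, discarding candidates whose score exceeds a hypothetical threshold of 1.5 with a hypothetical neighbourhood size `k=3`. The cluster screening stage is not reproduced because the abstract does not give its weight formula, and the function names are illustrative only. `sklearn.neighbors.LocalOutlierFactor` provides a library implementation of LOF.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolate_cluster(members, n_new, seed=0):
    """Generate n_new samples by linear interpolation inside one cluster.

    Assumption: each sample lies on the segment between two randomly chosen
    members, x_new = a + lam * (b - a) with lam drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        a, b = rng.sample(members, 2)
        lam = rng.random()
        out.append([x + lam * (y - x) for x, y in zip(a, b)])
    return out


def lof_scores(points, k):
    """Local outlier factor of every point (about 1 = inlier, much larger = outlier)."""
    n = len(points)
    D = [[dist(p, q) for q in points] for p in points]
    neigh, kdist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])
        neigh.append(order[:k])             # k nearest neighbours (ties ignored)
        kdist.append(D[i][order[k - 1]])    # k-distance of point i
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(kdist[o], D[i][o]) for o in neigh[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))  # guard against duplicate points
    return [sum(lrd[o] for o in neigh[i]) / (k * lrd[i]) for i in range(n)]


def filter_synthetic(real, synthetic, k=3, threshold=1.5):
    """Keep synthetic samples whose LOF (computed together with the real samples)
    is at most threshold. The threshold 1.5 is a hypothetical choice."""
    scores = lof_scores(real + synthetic, k)
    return [s for s, score in zip(synthetic, scores[len(real):]) if score <= threshold]


# Hypothetical core cluster, already in the reduced feature space.
members = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = interpolate_cluster(members, 5)
# Append an obvious outlier to show that the LOF filter removes it.
kept = filter_synthetic(members, synthetic + [[10.0, 10.0]], k=3)
print(len(kept), "of", len(synthetic) + 1, "candidates kept")
```

Because interpolation only produces convex combinations of two members, new samples stay inside the region spanned by the cluster; the LOF step then acts as a safety net for candidates that still land in sparse areas.
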
Keywords

microbial data/high-dimensional/sparsity/class imbalance/clustering/data augmentation

Classification

Information Technology and Security Science

Cite this article

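WEN Liuying, ZHENG Tianhao. Microbial data augmentation algorithm based on feature transformation and minority clustering[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(6): 23-31.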

Funding

Central Government Guided Local Science and Technology Development Special Project (2021ZYD0003)

Journal of Zhengzhou University (Engineering Science)

Open access (OA); Peking University Core Journal

ISSN 1671-6833
