Journal of Zhengzhou University (Engineering Science), 2025, Vol. 46, Issue 6: 23-31, 9. DOI: 10.13705/j.issn.1671-6833.2025.06.006
Microbial Data Augmentation Algorithm Based on Feature Transformation and Minority Clustering
Abstract
The high dimensionality of microbial data, its high rate of zero values, and the scarcity of minority-class samples, which causes class imbalance, significantly weaken a classifier's ability to identify the minority class. Existing augmentation algorithms are sensitive to high imbalance ratios (IR) and struggle to synthesize effective samples. This study presents a microbial data augmentation algorithm based on feature transformation and minority class clustering (FTMC). First, in the feature transformation stage, principal component analysis (PCA) reduces the dimensionality of the data to alleviate its severe sparsity. Next, in the minority class clustering stage, the K-Means algorithm partitions the minority class into multiple clusters that capture its local features. In the cluster screening stage, a weight is calculated for each cluster from its density and difficulty, combined with the IR and a weight ratio, and these weights are used to select a subset of core clusters for subsequent sample generation. Finally, in the sample augmentation and filtering stage, linear interpolation generates new samples within each core cluster, and the local outlier factor (LOF) algorithm filters out outliers to ensure the quality of the augmented samples. Experiments on 12 microbial datasets compared FTMC with 8 sampling algorithms of the same type using 3 classifiers. The results show that the samples generated by FTMC are more diverse and that Recall improves by 26.42% on average, indicating that more positive samples are correctly identified.
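The four stages summarized above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give FTMC's cluster-weight formula, so the hypothetical `screen_clusters` rule below uses difficulty = share of majority-class neighbours and density = cluster size over mean distance to the centroid. The balancing target (IR = 1), the LOF cut-off of 1.5, and all function names and parameter values are likewise assumptions. PCA, K-Means, linear interpolation and LOF are written out directly so the example needs only NumPy.

```python
import numpy as np


def pca_transform(X, n_components):
    """Feature transformation: project centred X onto its top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal axes
    return Xc @ Vt[:n_components].T


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's K-Means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def screen_clusters(X_min, X_maj, labels, rng, k_nn=5, keep_ratio=0.5):
    """HYPOTHETICAL weighting rule; FTMC's exact density/difficulty/IR formula is not given.

    Harder (more majority neighbours) and sparser clusters get larger weights,
    only the heaviest clusters are kept, and samples are allocated by weight.
    """
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.r_[np.zeros(len(X_min), bool), np.ones(len(X_maj), bool)]
    ids = np.unique(labels)
    weights = []
    for j in ids:
        C = X_min[labels == j]
        d = np.linalg.norm(C[:, None, :] - X_all[None, :, :], axis=2)
        nn = np.argsort(d, axis=1)[:, 1:k_nn + 1]  # column 0 is normally the point itself
        difficulty = is_maj[nn].mean()
        spread = np.linalg.norm(C - C.mean(axis=0), axis=1).mean() + 1e-12
        weights.append((difficulty + 1e-6) * spread / len(C))  # difficulty / density
    weights = np.array(weights)
    n_keep = max(1, int(np.ceil(keep_ratio * len(ids))))
    core = np.argsort(weights)[::-1][:n_keep]  # indices of the heaviest clusters
    n_needed = max(0, len(X_maj) - len(X_min))  # assumption: balance to IR = 1
    counts = rng.multinomial(n_needed, weights[core] / weights[core].sum())
    return {int(ids[c]): int(n) for c, n in zip(core, counts)}


def interpolate(C, n_new, rng):
    """Linear interpolation: new points on segments between random member pairs."""
    i = rng.integers(0, len(C), size=n_new)
    j = rng.integers(0, len(C), size=n_new)
    lam = rng.random((n_new, 1))
    return C[i] + lam * (C[j] - C[i])


def lof_scores(X, k):
    """Local outlier factor; scores well above 1 flag likely outliers."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]  # k nearest neighbours, self excluded
    k_dist = D[np.arange(n), nn[:, -1]]  # k-distance of every point
    reach = np.maximum(D[np.arange(n)[:, None], nn], k_dist[nn])  # reachability distance
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)  # local reachability density
    return lrd[nn].mean(axis=1) / lrd


def ftmc_sketch(X, y, n_components=5, n_clusters=3, lof_k=5, lof_max=1.5, seed=0):
    """Hypothetical FTMC-style pipeline on binary labels (1 = minority); output is in PCA space."""
    rng = np.random.default_rng(seed)
    Z = pca_transform(X, n_components)
    X_min, X_maj = Z[y == 1], Z[y == 0]
    labels = kmeans(X_min, n_clusters, rng)
    alloc = screen_clusters(X_min, X_maj, labels, rng)
    parts = [interpolate(X_min[labels == j], n, rng) for j, n in alloc.items() if n > 0]
    synth = np.vstack(parts) if parts else np.empty((0, Z.shape[1]))
    scores = lof_scores(np.vstack([X_min, synth]), lof_k)[len(X_min):]
    synth = synth[scores <= lof_max]  # drop synthetic samples that look like outliers
    return np.vstack([Z, synth]), np.r_[y, np.ones(len(synth), dtype=y.dtype)]


# Toy data (invented for illustration): sparse, high-dimensional, imbalanced 48:12.
data_rng = np.random.default_rng(1)
X = np.abs(data_rng.normal(size=(60, 20)))
X[data_rng.random(X.shape) < 0.6] = 0.0  # mimic a high zero-value rate
y = np.r_[np.zeros(48, int), np.ones(12, int)]
X[y == 1, :5] += 2.0  # shift the minority class so it is separable
Z_aug, y_aug = ftmc_sketch(X, y)
```

Filtering with LOF after interpolation, rather than constraining the interpolation itself, keeps the generator simple and lets any rule for choosing core clusters be swapped in without touching the quality check.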
Key words
microbial data/high-dimensional/sparsity/class imbalance/cluster/data augmentation
Classification
Information Technology and Safety Science
Cite this article
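WEN Liuying, ZHENG Tianhao. Microbial Data Augmentation Algorithm Based on Feature Transformation and Minority Clustering[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(6): 23-31, 9.
Funding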
Special Project of the Central Government Guiding Local Science and Technology Development (2021ZYD0003)