Marine Forecasts (海洋预报), 2024, Vol. 41, Issue 3: 61-70, 10. DOI: 10.11737/j.issn.1003-0239.2024.03.007
基于多模型组合的类别不平衡海洋数据质量控制方法
Quality control method for class-imbalanced oceanographic data based on multi-model combination
Abstract
This paper proposes a two-layer framework for ocean data quality control based on the combination of multiple models. Several common classification algorithms are chosen as base learners to predict the primary quality labels of ocean data, and a Voting or Stacking strategy is then used to combine these labels and identify the quality of the data. To address class imbalance, an adaptive undersampling strategy is combined with the Focal loss function to enhance the model's ability to recognize difficult samples. To verify the performance of the proposed method, we apply it to the quality control of sea surface temperature and air temperature data from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS). The results show that the F1 score (the weighted harmonic mean of precision and recall) on rare anomalous samples obtained with the Voting and Stacking methods reaches 0.9806 and 0.9812, respectively, for sea surface temperature data, and 0.9985 and 0.9983, respectively, for air temperature data.
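The quantities named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: it shows the standard binary Focal loss, a simple hard-voting rule over base-learner labels (Stacking and the adaptive undersampling step are not shown), and the F1 score for the rare anomalous class. The function names, the toy labels, and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, not values from the paper.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single sample.

    p: predicted probability of the anomalous class (0 < p < 1).
    y: true label, 1 = anomalous, 0 = good.
    gamma, alpha: illustrative defaults, not taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def majority_vote(base_labels):
    """Hard voting: label 1 only if more than half of the base learners say 1."""
    return 1 if 2 * sum(base_labels) > len(base_labels) else 0


def f1_score(y_true, y_pred, positive=1):
    """F1 score of the chosen positive class (here: the rare anomalous class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy first-layer output: one row per hypothetical base learner,
# one column per observation (1 = flagged as anomalous).
base_predictions = [
    [0, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
y_true = [0, 0, 1, 1, 0, 1]

# Second layer: combine the base-learner labels observation by observation.
y_vote = [majority_vote(column) for column in zip(*base_predictions)]

print("voted labels:", y_vote)
print("F1 on anomalous class:", f1_score(y_true, y_vote))
print("focal loss, hard vs easy anomaly:", focal_loss(0.1, 1), focal_loss(0.9, 1))
```

In this sketch a confidently correct prediction contributes far less loss than a misclassified one, which is the property the abstract relies on to focus training on difficult samples.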
Keywords
quality control / ocean-atmosphere data / ensemble learning / class imbalance
Classification
Marine science
Cite this article
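Song Wei (宋巍), Zhang Guiqing (张贵庆), Xie Jingrong (谢京容), Dong Mingmei (董明媚), Yue Xinyang (岳心阳), Yang Yang (杨扬). Quality control method for class-imbalanced oceanographic data based on multi-model combination [J]. Marine Forecasts (海洋预报), 2024, 41(3): 61-70, 10.
Funding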
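National Key Research and Development Program of China (2021YFC3101601)
Capacity Building Project for Selected Local Universities, Shanghai Municipal Science and Technology Commission (20050501900)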