Quality control method for class-imbalanced oceanographic data based on multi-model combination

Marine Forecasts (海洋预报), 2024, Vol. 41, Issue 3: 61-70, 10. DOI: 10.11737/j.issn.1003-0239.2024.03.007

Song Wei¹, Zhang Guiqing¹, Xie Jingrong¹, Dong Mingmei², Yue Xinyang², Yang Yang²

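Author information

1. College of Information Technology, Shanghai Ocean University, Shanghai 201306
2. National Marine Data and Information Service, Tianjin 300171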

Abstract

This paper proposes a two-layer framework for ocean data quality control based on the combination of multiple models. Several common classification algorithms serve as base learners that predict primary quality labels for the ocean data, and a Voting or Stacking strategy then combines these predictions to determine the final quality label. To address class imbalance, an adaptive undersampling strategy is combined with the Focal loss function to improve the model's ability to recognize difficult samples. To verify the performance of the proposed method, we apply it to the quality control of sea surface temperature and air temperature data from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS). The results show that, for the rare anomaly samples, the Voting and Stacking methods reach F1 scores (the harmonic mean of precision and recall) of 0.9806 and 0.9812, respectively, for sea surface temperature data, and 0.9985 and 0.9983, respectively, for air temperature data.
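The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy probabilities, the 0.5 threshold, and the `gamma` and `alpha` defaults are all assumptions chosen for illustration, and the adaptive undersampling step and the Stacking variant are not covered. The sketch shows the standard binary Focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), a simple soft-voting combination of base-learner probabilities, and the F1 score for the rare anomaly class.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample; p is the predicted probability of class 1.

    gamma and alpha are illustrative defaults, not values taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)           # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_vote(prob_lists, threshold=0.5):
    """Average class-1 probabilities across base learners, then threshold."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    avg = [sum(m[i] for m in prob_lists) / n_models for i in range(n_samples)]
    return [1 if a >= threshold else 0 for a in avg]


def f1_score(y_true, y_pred, positive=1):
    """F1 for the chosen positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy imbalanced data: label 1 marks the rare anomalous observations.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical class-1 probabilities from three base learners.
model_a = [0.1, 0.2, 0.1, 0.6, 0.3, 0.2, 0.9, 0.4]
model_b = [0.2, 0.1, 0.3, 0.3, 0.2, 0.1, 0.8, 0.7]
model_c = [0.1, 0.3, 0.2, 0.4, 0.1, 0.3, 0.7, 0.6]

y_pred = soft_vote([model_a, model_b, model_c])
print("voted labels:", y_pred)
print("anomaly F1:", f1_score(y_true, y_pred))
print("focal loss, easy vs hard positive:",
      focal_loss(0.9, 1), focal_loss(0.1, 1))
```

In this toy case no single learner needs to be right on every sample: `model_a` alone would flag the fourth observation as anomalous and miss the last one, while the averaged vote corrects both. The focal loss comparison shows why the loss helps with difficult samples: a confidently correct prediction contributes far less to the loss than a badly wrong one.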

Key words

quality control/ocean-atmosphere data/ensemble learning/class imbalance

Classification

Marine science

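Cite this article

Song Wei, Zhang Guiqing, Xie Jingrong, Dong Mingmei, Yue Xinyang, Yang Yang. Quality control method for class-imbalanced oceanographic data based on multi-model combination [J]. Marine Forecasts, 2024, 41(3): 61-70, 10.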

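Funding

National Key Research and Development Program of China (2021YFC3101601)

Capacity Building Project for Selected Local Universities, Science and Technology Commission of Shanghai Municipality (20050501900)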

Marine Forecasts (海洋预报)

OA / Peking University Core Journals / CSTPCD

ISSN: 1003-0239
