
Study of a PM2.5 concentration prediction model based on improved machine learning

Open access; indexed in the Peking University Core Journals list and CSTPCD


Existing machine learning models for predicting PM2.5 concentration suffer degraded performance because they are overly complex, ignore spatio-temporal information, and impute missing values inaccurately. To address these problems, random forest is used in place of statistical methods to fill in missing values, and spatio-temporal factors are incorporated to improve model accuracy. A PM2.5 concentration prediction model suitable for coastal cities (the K-means-RF-XGBoost model) is established by combining remote sensing data with meteorological and co-pollutant data; its prediction time is only 4% of that of a BP neural network. The model is trained and tested on real-time monitoring data from Dalian for 2019. The results show that the K-means-RF-XGBoost model predicts PM2.5 concentration with high accuracy: compared with the same model without spatio-temporal information, the root mean square error (erms) decreases by about 48% and the coefficient of determination (R2) increases by about 10%. The model predicts high PM2.5 concentrations effectively and handles wide fluctuation ranges; for example, the spring model reaches an R2 of 0.935 on the test set. It also performs well at the daily level, with an R2 of up to 0.819. This study offers a new approach to PM2.5 concentration prediction in coastal cities.
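The two metrics quoted in the abstract, erms and R2, can be made concrete with a minimal sketch. It uses the standard textbook definitions, which the abstract itself does not spell out. The function names (`rmse`, `r_squared`, `relative_change`) and the sample values are illustrative assumptions, not the authors' code or data. The abstract also does not say whether the "about 10%" gain in R2 is relative or absolute, so `relative_change` shows only one possible reading.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


def relative_change(baseline, new_value):
    """Fractional change of new_value relative to baseline (negative = decrease)."""
    return (new_value - baseline) / baseline


# Hypothetical PM2.5 values, made up purely for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

Under this reading, a model whose erms falls from a baseline `b` to `0.52 * b` gives `relative_change(b, 0.52 * b)` of about -0.48, matching the "about 48%" decrease described above.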

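DING Chengliang; ZHENG Hongbo

School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China

Environmental science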

PM2.5 concentration prediction; spatio-temporal information; missing value imputation; machine learning

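Journal of Dalian University of Technology, 2024 (4)

Pages 353-360 (8 pages)

Supported by the National Natural Science Foundation of China (42071273) and the Fundamental Research Funds for the Central Universities (DUT22LAB132).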

10.7511/dllgxb202404004
