Journal of Environmental and Occupational Medicine (环境与职业医学), 2026, Vol. 43, Issue 1: 8-15, 27, 9. DOI: 10.11836/JEOM25283
Hourly ozone concentration estimation and its health impact study based on ensemble machine learning: A case study of Taiyuan City
Abstract
[Background] Ozone (O3) is a major air pollutant. The existing monitoring network has unevenly distributed sites, insufficient coverage in underdeveloped areas, and low temporal resolution, making hourly data difficult to obtain. This limits the dynamic identification of pollution and the formulation of prevention and control strategies.
[Objective] To construct an hourly O3 concentration estimation model based on ensemble machine learning, so as to improve the accuracy of pollution exposure assessment and explore the health impacts of O3.
[Methods] This study integrated land use regression modeling with machine learning techniques: random forest and XGBoost algorithms were used to construct base models, which were then combined by stacking with non-negative least squares. The ensemble model was trained and validated across China using high-resolution, multi-source geographic data (e.g., meteorological data, population density, land cover types, and aerosol optical thickness). It was then tested in Taiyuan City and combined with a distributed lag non-linear model to analyze the association between O3 and emergency room visits.
[Results] The ensemble model predicted O3 concentration well, with a higher coefficient of determination (R²) and a lower root-mean-square error (RMSE) than the single models: R² improved from 0.90 to 0.92 and RMSE decreased from 11.41 to 10.62, improving both prediction accuracy and generalization ability. In the application to Taiyuan City, the model successfully imputed hourly data for the entire year. The distributed lag non-linear model analysis showed that the relative risk (RR) values for the 6th to 8th days following O3 exposure were 1.14 (95%CI: 1.01, 1.29), 1.16 (95%CI: 1.02, 1.31), and 1.14 (95%CI: 1.01, 1.29), respectively, all significantly greater than 1, indicating a significant lagged association (lag 6-8 d) between O3 and the number of emergency room visits.
[Conclusion] A high-precision, hourly O3 concentration estimation model was successfully constructed by combining the land use regression model with an ensemble machine learning approach, providing a scientific basis for environmental policy formulation and public health intervention. The application of the model verifies its generalization ability and practical value, and it can provide a new technical framework for subsequent environmental health research.
Key words
ozone / concentration estimation / ensemble machine learning / land use regression / number of emergency room visits / lag effect
Classification
Medicine and health
Cite this article
杜汝乐, 杨晓娟, 牛瑞霞, 许洋, 祝贵明, 高倩, 王彤. Hourly ozone concentration estimation and its health impact study based on ensemble machine learning: A case study of Taiyuan City [J]. Journal of Environmental and Occupational Medicine, 2026, 43(1): 8-15, 27, 9.
Funding
National Natural Science Foundation of China (82073674, 82204163, 82373692)
Fundamental Research Program of Shanxi Province (202203021212382)
Open Project of the Shanxi Provincial Commission-level Key Laboratory for Prevention and Treatment of Neurological Diseases (TMSYSKF2023004)
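
Illustrative note on the stacking step named in the Methods. The abstract says the base models were combined by stacking with non-negative least squares but gives no implementation details, so the following is only a minimal sketch under stated assumptions: the two prediction series are synthetic stand-ins for out-of-fold predictions from two base learners (they are not the study's random forest or XGBoost outputs), the noise levels and value range are arbitrary, and the helper names `nnls_two` and `rmse` are hypothetical. It shows how non-negative weights can be fitted and why the fitted ensemble cannot have a higher in-sample RMSE than either base model.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nnls_two(p1, p2, y):
    """Exact non-negative least squares for two predictors, no intercept.

    Minimizes sum((y - w1*p1 - w2*p2)^2) subject to w1 >= 0 and w2 >= 0 by
    comparing the interior solution with the best point on each boundary.
    """
    a11 = sum(a * a for a in p1)
    a22 = sum(b * b for b in p2)
    a12 = sum(a * b for a, b in zip(p1, p2))
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))

    candidates = [(0.0, 0.0)]
    if a11 > 0:
        candidates.append((max(b1 / a11, 0.0), 0.0))  # best point with w2 = 0
    if a22 > 0:
        candidates.append((0.0, max(b2 / a22, 0.0)))  # best point with w1 = 0
    det = a11 * a22 - a12 * a12
    if det > 1e-9 * a11 * a22:  # skip the interior solve if nearly collinear
        w1 = (b1 * a22 - b2 * a12) / det
        w2 = (a11 * b2 - a12 * b1) / det
        if w1 >= 0 and w2 >= 0:
            candidates.append((w1, w2))

    def sse(w):
        return sum((t - w[0] * a - w[1] * b) ** 2 for a, b, t in zip(p1, p2, y))

    return min(candidates, key=sse)


# Synthetic data only (assumption): "observed" hourly values and two noisy
# base-model predictions. None of these numbers come from the study.
random.seed(0)
y_obs = [random.uniform(20, 180) for _ in range(500)]
pred_a = [t + random.gauss(0, 12) for t in y_obs]  # stand-in for base model A
pred_b = [t + random.gauss(0, 11) for t in y_obs]  # stand-in for base model B

w1, w2 = nnls_two(pred_a, pred_b, y_obs)
pred_ens = [w1 * a + w2 * b for a, b in zip(pred_a, pred_b)]

rmse_a = rmse(y_obs, pred_a)
rmse_b = rmse(y_obs, pred_b)
rmse_ens = rmse(y_obs, pred_ens)
print(f"weights: {w1:.3f}, {w2:.3f}")
print(f"RMSE A: {rmse_a:.2f}, RMSE B: {rmse_b:.2f}, ensemble: {rmse_ens:.2f}")
```

Because the weight pairs (1, 0) and (0, 1) are both feasible, the fitted ensemble's RMSE on the fitting data is never worse than that of either base model; whether the gain carries over to held-out data has to be checked by validation, as the study reports doing. With more than two base learners, a general solver such as `scipy.optimize.nnls` would replace the two-variable enumeration used here.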