Short-term PM2.5 concentration prediction based on XGBoost and LSTM variable weight combination model: a case study of Shanghai
KANG Jun-feng1, TAN Jian-lin1, FANG Lei2, XIAO Ya-lai1
1. School of Civil and Surveying & Mapping Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China; 2. Department of Environmental Science and Engineering, Fudan University, Shanghai 200433, China
Abstract: To further improve the accuracy of PM2.5 concentration prediction, a variable weight combination model for short-term (1-hour) PM2.5 concentration prediction was proposed, based on an LSTM network and an XGBoost model. First, the predictive factors were analyzed: the influence of air pollutant factors and meteorological factors on PM2.5 concentration was explored to determine the best predictive factors and to analyze variable importance. Then, after data preprocessing, the LSTM prediction model and the XGBoost prediction model were built separately, and an adaptive variable weight combination method based on residual improvement was adopted to obtain the final prediction. The results show that the relative importance of pollutant variables is higher than that of meteorological factors; among the variables, current PM2.5 concentration and CO concentration have higher relative importance, while average wind speed and relative humidity have lower importance. The RMSE, MAE and MAPE of the variable weight combined XGBoost-LSTM (Variable) model proposed in this study are 1.75, 1.12 and 6.06, respectively, which are better than those of the LSTM, XGBoost, SVR, XGBoost-LSTM (Equal) and XGBoost-LSTM (Residual) models. The combined model performs best in spring, while its forecast accuracy is poor in summer. The variable weight combination model proposed in this study effectively combines the advantages of the two models: it considers the time series information of the data and also takes into account the nonlinear relationships between features, and it achieves higher prediction accuracy than the other models.
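The abstract does not spell out the weighting rule, so the following is only a minimal sketch of one common variable weight scheme, not necessarily the authors' residual-improved method. It assumes each model's weight at a given hour is inversely proportional to its mean absolute residual over a short trailing window. The function names, the window length and the toy numbers are all hypothetical; the three error metrics are the standard RMSE, MAE and MAPE definitions.

```python
import math


def variable_weights(res_a, res_b, eps=1e-9):
    """Weights inversely proportional to each model's mean absolute residual.

    Assumption: this inverse-error rule stands in for the paper's
    residual-based weighting, which the abstract does not define.
    """
    err_a = sum(abs(r) for r in res_a) / len(res_a)
    err_b = sum(abs(r) for r in res_b) / len(res_b)
    inv_a, inv_b = 1.0 / (err_a + eps), 1.0 / (err_b + eps)
    total = inv_a + inv_b
    return inv_a / total, inv_b / total


def combine(pred_a, pred_b, actual, window=3):
    """Combine two forecast series hour by hour with time-varying weights.

    Weights at hour t use only residuals from hours before t; the first
    hour has no history, so it falls back to equal weights.
    """
    combined = []
    for t in range(len(pred_a)):
        if t == 0:
            w_a, w_b = 0.5, 0.5
        else:
            lo = max(0, t - window)
            res_a = [actual[i] - pred_a[i] for i in range(lo, t)]
            res_b = [actual[i] - pred_b[i] for i in range(lo, t)]
            w_a, w_b = variable_weights(res_a, res_b)
        combined.append(w_a * pred_a[t] + w_b * pred_b[t])
    return combined


def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def mape(actual, pred):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


# Toy hourly values, invented for illustration only (not data from the study).
actual = [20.0, 22.0, 25.0, 24.0, 21.0]
xgb_pred = [21.0, 23.0, 24.0, 25.0, 22.0]
lstm_pred = [19.0, 22.5, 26.5, 23.0, 20.5]

combined = combine(xgb_pred, lstm_pred, actual)
for name, pred in [("XGBoost", xgb_pred), ("LSTM", lstm_pred), ("Combined", combined)]:
    print(name, round(rmse(actual, pred), 3), round(mae(actual, pred), 3), round(mape(actual, pred), 3))
```

Because the weights are non-negative and sum to one, each combined value lies between the two individual forecasts. Setting both weights to a constant 0.5 would give an equal weight combination, the kind of baseline the XGBoost-LSTM (Equal) label suggests.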
KANG Jun-feng, TAN Jian-lin, FANG Lei, XIAO Ya-lai. Short-term PM2.5 concentration prediction based on XGBoost and LSTM variable weight combination model: a case study of Shanghai[J]. China Environmental Science, 2021, 41(9): 4016-4025.