Hourly PM2.5 prediction and comparative analysis of multiple machine learning models
KANG Jun-Feng1, HUANG Lie-Xing1, ZHANG Chun-Yan3, ZENG Zhao-Liang2, YAO Shen-Jun4
1. School of Architecture and Surveying Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China;
2. Chinese Antarctic Center of Surveying and Mapping, Wuhan University, Wuhan 430079, China;
3. Chongqing Wanzhou District Planning and Design Institute, Chongqing 404000, China;
4. Key Laboratory of Geographic Information Science, Ministry of Education, Shanghai 200241, China
Six models were built for timely and accurate estimation of PM2.5 concentration and pollution levels: a K-Nearest Neighbor (KNN) model, a Back-Propagation Neural Network (BPNN) model, a Support Vector Machine (SVM) regression model, a Gaussian Process Regression (GPR) model, an XGBoost model and a Random Forest (RF) model. Ganzhou City, Jiangxi Province, was selected as the study area. Hourly ground-based meteorological data, PM2.5 concentration data and MERRA-2 reanalysis data from 2017 to 2018 were used for modelling. The results show that PM2.5 concentration can be predicted from visibility and meteorological data even when pollutant observation data are missing. In terms of prediction accuracy for PM2.5 concentration, the XGBoost model performs best, followed by the RF model, and the GPR model performs worst. The prediction accuracy of the six models was generally highest in winter, followed by autumn and spring, and lowest in summer. The XGBoost model also predicts PM2.5 pollution levels more accurately than the other models, with a comprehensive accuracy rate of 87.6%. Moreover, the XGBoost model has the advantages of short training time and low memory consumption. Visibility is the key factor in the XGBoost model for PM2.5 concentration prediction, followed by relative humidity and the time variable. This study can provide a reference for environmental departments to accurately forecast PM2.5 concentration.
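The pollution-level accuracy mentioned in the abstract can be illustrated with a minimal sketch. It rests on two assumptions that the abstract does not confirm: that levels are assigned with the 24-hour PM2.5 breakpoints of China's AQI technical regulation (HJ 633-2012), and that the "comprehensive accuracy rate" is the fraction of samples whose predicted level matches the observed level. The function names and the sample values are hypothetical.

```python
from bisect import bisect_left

# Assumed upper bounds (ug/m3) of the first five PM2.5 classes, taken from the
# 24-hour limits in HJ 633-2012; the paper may use different thresholds.
BREAKPOINTS = [35, 75, 115, 150, 250]
LEVELS = ["excellent", "good", "light", "moderate", "heavy", "severe"]


def pollution_level(pm25):
    """Map one PM2.5 concentration to a pollution level (upper bounds inclusive)."""
    return LEVELS[bisect_left(BREAKPOINTS, pm25)]


def level_accuracy(observed, predicted):
    """Fraction of samples whose predicted level equals the observed level.

    This is an assumed definition of the accuracy rate, not the paper's.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        pollution_level(o) == pollution_level(p)
        for o, p in zip(observed, predicted)
    )
    return hits / len(observed)


# Hypothetical hourly values, for illustration only.
observed = [12.0, 40.0, 80.0, 160.0]
predicted = [15.0, 33.0, 90.0, 155.0]
print(level_accuracy(observed, predicted))  # 3 of 4 levels match -> 0.75
```

Scoring by level rather than by concentration means that a small numerical error near a breakpoint (40 versus 33 in the sample above) counts as a miss, while a larger error inside one class does not.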