Abstract: Large-scale online air quality monitoring data are the basis for air quality research, but such data contain many missing values. In this study, we compared several methods for handling missing values, and assessed their impact on city air quality rankings, using hourly monitoring data for six types of air pollutants from 1654 monitoring sites in China from 1 January 2016 to 31 July 2021. The simulation results showed that Low Rank SVD via Alternating Least Squares had a smaller mean squared error, a smaller mean absolute percentage error, and a higher correlation coefficient than the traditional methods. The empirical results showed a difference of about 10% between the values before and after imputation of the missing data. Imputation did not change the ranking when the assessed air quality values of cities differed greatly, but changed it considerably when the assessed values were very close. The study suggests imputing missing values with the method used here when analysing large-scale online air quality monitoring data.
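The simulation design summarized in the abstract (hide known values, impute them, then score the imputation by mean squared error, mean absolute percentage error, and correlation coefficient) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic, the function names `als_impute` and `evaluate` are hypothetical, the rank, penalty, and missing rate are arbitrary, and the imputer is a plain ridge-regularized alternating least squares factorization rather than necessarily the exact Low Rank SVD variant used in the paper. Column-mean imputation stands in for a traditional baseline.

```python
import numpy as np


def als_impute(X, rank=2, lam=0.1, n_iter=50, seed=0):
    """Fill NaNs in X using a low-rank factorization X ~ A @ B.T,
    fitted on the observed entries by ridge-regularized ALS."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)                      # True where a value is observed
    n, m = X.shape
    A = rng.normal(scale=0.1, size=(n, rank))
    B = rng.normal(scale=0.1, size=(m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        # Update each row factor with the column factors held fixed.
        for i in range(n):
            obs = mask[i]
            if obs.any():
                Bo = B[obs]
                A[i] = np.linalg.solve(Bo.T @ Bo + reg, Bo.T @ X[i, obs])
        # Update each column factor with the row factors held fixed.
        for j in range(m):
            obs = mask[:, j]
            if obs.any():
                Ao = A[obs]
                B[j] = np.linalg.solve(Ao.T @ Ao + reg, Ao.T @ X[obs, j])
    filled = X.copy()
    filled[~mask] = (A @ B.T)[~mask]         # only missing cells are replaced
    return filled


def evaluate(truth, imputed, held_out):
    """Return (MSE, MAPE in percent, Pearson r) on the held-out cells."""
    t, p = truth[held_out], imputed[held_out]
    mse = np.mean((t - p) ** 2)
    mape = np.mean(np.abs((t - p) / t)) * 100.0
    r = np.corrcoef(t, p)[0, 1]
    return mse, mape, r


# Synthetic, strictly positive "sites x hours" matrix of rank 2 plus noise
# (an assumption for illustration; not real monitoring data).
rng = np.random.default_rng(42)
U = rng.uniform(1.0, 3.0, size=(60, 2))
V = rng.uniform(5.0, 15.0, size=(40, 2))
truth = U @ V.T + rng.normal(scale=0.5, size=(60, 40))

# Hide 20% of the cells completely at random.
held_out = rng.random(truth.shape) < 0.2
X_missing = truth.copy()
X_missing[held_out] = np.nan

# Baseline: replace each missing cell with its column mean.
col_means = np.nanmean(X_missing, axis=0)
X_mean = np.where(np.isnan(X_missing), col_means, X_missing)

X_als = als_impute(X_missing, rank=2, lam=0.1)

mean_mse, mean_mape, mean_r = evaluate(truth, X_mean, held_out)
als_mse, als_mape, als_r = evaluate(truth, X_als, held_out)
print("column mean: MSE=%.3f MAPE=%.2f%% r=%.3f" % (mean_mse, mean_mape, mean_r))
print("ALS low rank: MSE=%.3f MAPE=%.2f%% r=%.3f" % (als_mse, als_mape, als_r))
```

Scoring only the deliberately hidden cells is what makes the comparison meaningful: the true values are known there, so the three metrics measure imputation error directly. Real monitoring gaps are often long and not random, so a simulation on actual data would need a masking pattern that reflects that.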
张波, 宋国君. 大规模空气质量监测数据缺失处理方法实证研究[J]. 中国环境科学, 2022, 42(5): 2078-2087.
ZHANG Bo, SONG Guo-jun. Research on the missing value methods for large-scale online air quality monitoring data[J]. CHINA ENVIRONMENTAL SCIENCE, 2022, 42(5): 2078-2087.