Water quality warning method based on canonical correlation coefficient and random forest
LI Ruo-nan1, WANG Qi2, LIU Shu-ming3
1. Civil, Commercial and Economic Law School, China University of Political Science and Law, Beijing 100088, China; 2. School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China; 3. School of Environment, Tsinghua University, Beijing 100083, China
Abstract: This study proposes a high-precision early-warning method for detecting sudden water pollution incidents. First, a database of sudden water pollution incidents covering 22 common pollutants was established through simulation experiments. Second, canonical correlation coefficients were used to characterize the synergetic feedback among water quality parameters following a pollution incident. Finally, a water quality early-warning model, termed "canonical correlation coefficients-random forest", was developed on the basis of this multi-parameter synergetic feedback. The results show that the model's average true positive rates for known and unknown pollutants are 96.78% and 98.33%, respectively, while its average false positive rate under baseline water quality monitoring conditions is 0.16%. The proposed model can provide practical technical support for reducing losses from sudden water pollution incidents and safeguarding the drinking water supply.
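The true positive rate and false positive rate quoted in the abstract are standard detection metrics. As a minimal sketch of how such rates are computed from per-time-step labels (the function name `alarm_rates` and the toy label sequences are hypothetical illustrations, not the authors' code or data):

```python
def alarm_rates(truth, alarms):
    """Return (true positive rate, false positive rate) for paired 0/1 labels.

    truth[i]  is 1 if a pollution event is really present at step i, else 0.
    alarms[i] is 1 if the warning model raised an alarm at step i, else 0.
    """
    if len(truth) != len(alarms):
        raise ValueError("truth and alarms must have the same length")
    pairs = list(zip(truth, alarms))
    tp = sum(1 for t, a in pairs if t and a)          # event present, alarm raised
    fn = sum(1 for t, a in pairs if t and not a)      # event present, no alarm
    fp = sum(1 for t, a in pairs if not t and a)      # baseline, alarm raised
    tn = sum(1 for t, a in pairs if not t and not a)  # baseline, no alarm
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Hypothetical toy sequences, for illustration only.
truth = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
alarms = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
tpr, fpr = alarm_rates(truth, alarms)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")
```

Under this definition, the false positive rate is evaluated only on baseline (event-free) steps, which matches the abstract's phrasing of a false positive rate "under baseline" monitoring conditions.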
LI Ruo-nan, WANG Qi, LIU Shu-ming. Water quality warning method based on canonical correlation coefficient and random forest[J]. China Environmental Science, 2021, 41(9): 4457-4464.