A review on the progresses in random forests theory and its applications in hydrogeology
DU Shang-hai1,2,3, GU Cheng-ke1, ZHANG Wen-jing2,3
1. College of Construction Engineering, Jilin University, Changchun 130021, China; 2. Key Laboratory of Groundwater Resources and Environment, Jilin University, Changchun 130021, China; 3. College of New Energy and Environment, Jilin University, Changchun 130021, China
Abstract: Random forest is a rapidly developing ensemble learning algorithm in artificial intelligence and is increasingly used in hydrogeology because it tolerates outliers in data series better, and predicts more accurately, than other commonly used algorithms. After introducing the theory and applications of the random forest algorithm, this paper reviews its use in hydrogeological fields such as groundwater potential assessment, surface water-groundwater conversion, groundwater quality assessment and groundwater contamination prediction. The review shows that random forest can effectively address problems related to parameter and process uncertainty in hydrogeological research, and that it has broad application prospects in the accurate characterization of hydrogeological structure, the accurate inversion of hydrogeological parameters and the description of hydrogeological processes.
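As a rough illustration of the ensemble idea summarized in the abstract, the following Python sketch builds a small forest of one-split regression trees. Each tree is fitted to a bootstrap resample of the data and sees only a random subset of the features, and the forest prediction is the average over all trees. The function names (`fit_stump`, `fit_forest`, `predict`), the parameters and the toy data are assumptions made for illustration only; this is not the implementation used in any of the reviewed studies, and real random forests grow much deeper trees.

```python
import random
from statistics import mean


def fit_stump(rows, feature_ids):
    """Fit a one-split regression tree on rows of (features, target).

    Only the features in feature_ids are tried, mimicking the random
    feature subsetting used by random forests.
    """
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in feature_ids:
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for x, y in rows if x[f] <= thr]
            right = [y for x, y in rows if x[f] > thr]
            lm, rm = mean(left), mean(right)
            sse = (sum((y - lm) ** 2 for y in left)
                   + sum((y - rm) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:
        # Every candidate feature is constant in this sample:
        # fall back to predicting the sample mean.
        const = mean(y for _, y in rows)
        return lambda x: const
    _, f, thr, lm, rm = best
    return lambda x: lm if x[f] <= thr else rm


def fit_forest(rows, n_trees=50, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and feature subset."""
    rng = random.Random(seed)
    n_total = len(rows[0][0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        feature_ids = rng.sample(range(n_total), n_features)
        trees.append(fit_stump(sample, feature_ids))
    return trees


def predict(trees, x):
    """Average the predictions of all trees."""
    return mean(tree(x) for tree in trees)


# Hypothetical toy data: feature 0 drives a step in the target,
# feature 1 is a repeating pattern that carries little information.
rows = [((d, d % 3), 1.0 if d <= 5 else 10.0) for d in range(1, 11)]

trees = fit_forest(rows, n_trees=50, n_features=1, seed=0)
low = predict(trees, (2, 1))   # feature 0 below the step
high = predict(trees, (9, 1))  # feature 0 above the step
print(low, high)
```

Because each tree sees a different resample, no single unusual observation appears in every tree's training set, which is one intuition for the tolerance to outliers mentioned above. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used instead of a hand-written sketch like this.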
DU Shang-hai, GU Cheng-ke, ZHANG Wen-jing. A review on the progresses in random forests theory and its applications in hydrogeology [J]. China Environmental Science, 2022, 42(9): 4285-4295.