the Creative Commons Attribution 4.0 License.
Implementing and evaluating various machine learning models for pipe burst prediction
Abstract. Accurate prediction of pipe bursts makes it possible to schedule pipe maintenance and rehabilitation and to improve the level of service in water distribution networks (WDNs). In this study, we implemented five artificial intelligence and machine learning regression models for predicting the pipe burst rate (PBR) and evaluated their performance: multivariate adaptive regression splines (MARS), M5' regression tree (M5'), least squares support vector regression (LS-SVR), fuzzy regression based on c-means clustering (FCMR) and a regressive convolution neural network with support vector regression (RCNN-SVR). The most effective parameters for the regression models are pipe age, diameter, depth of installation, length, and average and maximum hydraulic pressure. The collected data include 158 cases for polyethylene (PE) pipes and 124 cases for asbestos cement (AC) pipes during 2012-2019. The results indicate that the RCNN-SVR model performs very well in PBR prediction.
Status: closed
-
RC1: 'Comment on dwes-2021-7', Anonymous Referee #1, 26 Apr 2021
This paper presents an application of 5 existing statistical and machine learning methods for pipe burst prediction.
I suggest rejecting this paper because of several major issues, such as:
1. There is very little novelty in this paper, which is essentially a poorly done machine learning exercise that should be featured in a blog on Kaggle rather than in a scientific journal: known techniques, a different dataset, no advance of the state of the art.
2. Most of the paper is about the description of existing techniques, which have been described several times in the literature. The authors employ the RCNN-SVR technique but they fail to cite the correct paper [1] both in the references (Line 375, a paper from 2016) and in the text (Line 148, a paper from 2018). They fail to provide the required details to understand the model, e.g., are those 1D convolutional layers? How many trainable parameters are there in total?
3. The latter information is particularly important because Deep Learning models usually have hundreds of thousands of parameters, and the proposed dataset is made of barely 200 data points. Deep Learning is employed when BIG DATA is available; this is clearly not the case. This raises the issue of whether the models have been trained appropriately or not. In line 52, the authors claim that 85% of the data have been selected for training and the rest have been used to test the models. Was the training data further split to create a separate validation dataset? Or is the test set used for "validation" and model selection? Is this a fair comparison? Given that the data is tabular, not an image or a time series, I am confident that tree-based algorithms such as Random Forest, ExtraTrees or Gradient Boosting trained using a single line of Python would provide at least the same results. Why did the authors use outdated benchmarks for comparison? Most importantly, the authors failed to provide the details needed to understand how they developed and compared the models.
4. The paper lacks organization and structure. The experimental setup (e.g., train/test division) is presented in the Methodology, and so are the final trained models. The dataset is cited in the Methodology but introduced in a subsequent section. The results and discussion section is minimal, and there is no discussion at all.
5. The literature review is outdated and incomplete; see for instance the papers [2-4] reported at the end of this review.
6. The paper is difficult to read even for an expert in the field.
References:
[1] Zhang, Youshan, and Qi Li. "A regressive convolution neural network and support vector regression model for electricity consumption forecasting." Future of Information and Communication Conference. Springer, Cham, 2019.
[2] Konstantinou, Charalampos, and Ivan Stoianov. "A comparative study of statistical and machine learning methods to infer causes of pipe breaks in water supply networks." Urban Water Journal 17.6 (2020): 534-548.
[3] Snider, Brett, and Edward A. McBean. "Improving urban water security through pipe-break prediction models: Machine learning or survival analysis." Journal of Environmental Engineering 146.3 (2020): 04019129.
[4] Zhou, Xiao, et al. "Deep learning identifies accurate burst locations in water distribution networks." Water Research 166 (2019): 115058.
Citation: https://doi.org/10.5194/dwes-2021-7-RC1
-
AC1: 'Reply on RC1', ahmad ravanbakhsh, 02 May 2021
1- There is very little novelty in this paper, which is essentially a poorly done machine learning exercise that should be featured in a blog on Kaggle rather than in a scientific journal: known techniques, a different dataset, no advance of the state of the art.
Response:
The focus of this paper is to evaluate the performance of the RCNN-SVR method on water distribution networks. In the literature reviewed by the authors, RCNN-SVR has been used in some engineering fields, but its ability to estimate the failure rate of water network pipes (in a real case study) has not been assessed in any scientific article. The present study therefore demonstrates the high accuracy of this innovative method, which researchers can use as an efficient method in their own work. Moreover, several articles in highly regarded scientific journals have compared outdated conventional machine learning methods, whereas this paper applies a new method with much higher accuracy and efficiency.
2-1 Most of the paper is about the description of existing techniques, which have been described several times in the literature. The authors employ the RCNN-SVR technique but they fail to cite the correct paper [1] both in the references (Line 375, a paper from 2016) and in the text (Line 148, a paper from 2018).
Response:
Describing the existing techniques makes some of the newest and most important machine learning methods available to readers. RCNN-SVR has also been compared with these known methods to demonstrate its strength. However, if the honorable referee considers it unnecessary to explain the methods, their descriptions can be removed from the paper and the methods shown only in the result tables in the next revision. The referencing will also be corrected in the next revision.
2-2 They fail to provide the required details to understand the model, e.g., are those 1D convolutional layers? How many trainable parameters are there in total?
Response:
Because of the word limit for the article, the full description of the RCNN-SVR method was omitted. In line with the referee's viewpoint, the descriptions of the other methods can be removed and the RCNN-SVR method explained in full detail.
3-1 The latter information is particularly important because Deep Learning models usually have hundreds of thousands of parameters, and the proposed dataset is made of barely 200 data points. Deep Learning is employed when BIG DATA is available; this is clearly not the case. This raises the issue of whether the models have been trained appropriately or not.
Response:
An outstanding feature of the RCNN-SVR method is that it learns accurately from limited data, whereas other machine learning methods require a big dataset for training. Although failure data for the water pipes in the case study network were collected over 8 years, little information is available, so the RCNN-SVR method was selected because it provides a more accurate answer with little data. Obviously, the RCNN-SVR method is more efficient when a lot of input data is available.
3-2 In line 52, the authors claim that 85% of data have been selected for training and the rest of them have been used to test the models. Was the training data further split to create a separate validation dataset? Or is the test set used for "validation" and model selection? Is this a fair comparison?
Response:
The training data is not split to create a separate validation dataset; the test set is used for validation. 85% of the data were selected for training and the rest were used to test the models. All the machine learning methods described in this paper were compared in the same way, so the comparison is fair. The same approach has been taken in other scientific articles [1], [2].
3-3 Given the data is tabular, not an image or a time-series, I am confident tree-based algorithms such as Random Forest, Extra Trees or Gradient Boosting trained using a single line of Python would provide at least the same results.
Response:
Although tree-based algorithms can be trained with a single line of Python, their results are not as accurate as those of RCNN-SVR. In this paper (line 86), a tree-based algorithm (M5') is used, and it gives a less accurate answer than the RCNN-SVR method. The provided codes can also be cited.
3-4 Why did the authors use outdated benchmarks for comparison?
Response:
The criteria used to compare the machine learning methods are commonly used in many recent scientific papers, which can be cited. However, other criteria can be used, as suggested by the honorable referee.
3-5 Most importantly, the authors failed to provide the details needed to understand how they developed and compared the models.
Response:
RCNN-SVR model development has been carried out in other engineering fields, but its use in water networks is innovative. The purpose of this article is to implement and compare several machine learning methods; developing new models was not the aim. In this paper, an innovative method that has not previously been used to predict the failure rate of pipes is applied to obtain accurate results from a small dataset.
4- The paper lacks organization and structure. Experimental setup (e.g., train/test division) is presented in the Methodology, so are the final trained models. The dataset is cited in the Methodology but introduced in a subsequent section. The results and discussion section is minimal, and there is no discussion at all.
Response:
The structure will be corrected in the next revision.
5- The literature review is outdated and incomplete, see for instance the papers [2-4] reported at the end of this review.
Response:
The suggested articles will be used in the next revision. Thank you for pointing out these useful articles.
6- The paper is difficult to read even for an expert in the field.
Response:
Due to the difficulty of the content, an attempt has been made to express the content in the most appropriate way. However, it will be expressed more simply in the next revision.
[1] Shirzad, Akbar, and Mir Jafar Sadegh Safari. "Pipe failure rate prediction in water distribution networks using multivariate adaptive regression splines and random forest techniques." Urban Water Journal 16.9 (2019): 653-661.
[2] Shirzad, Akbar, Massoud Tabesh, and Raziyeh Farmani. "A comparison between performance of support vector regression and artificial neural network in prediction of pipe burst rate in water distribution networks." KSCE Journal of Civil Engineering 18.4 (2014): 941-948.
Citation: https://doi.org/10.5194/dwes-2021-7-AC1
-
CC1: 'Comment on dwes-2021-7', Rana Muhammad Adnan Ikram, 02 May 2021
The manuscript in its present version contains several problems. Appropriate revisions should be undertaken in order to justify a recommendation for publication. It is mentioned that LSSVM and M5TREE models are used. What are the advantages of adopting these particular methods over others in this case? How will this affect the results? More details should be furnished. Why were MARS/OP-ELM/DENFIS/GMDH not tried for comparison and validation? For readers to quickly catch your contribution, it would be better to highlight the major difficulties and challenges, and your original achievements in overcoming them, more clearly in the abstract and introduction.
Citation: https://doi.org/10.5194/dwes-2021-7-CC1
-
AC3: 'Reply on CC1', ahmad ravanbakhsh, 11 May 2021
The manuscript in its present version contains several problems. Appropriate revisions should be undertaken in order to justify a recommendation for publication. It is mentioned that LSSVM and M5TREE models are used. What are the advantages of adopting these particular methods over others in this case? How will this affect the results? More details should be furnished.
Response:
In this article, five artificial intelligence models from among the prominent machine learning methods were examined and their performance compared. It is not clear to the authors why the esteemed reader singled out two of the five models, since all five are compared together. Please also specify where more details are needed.
Why were MARS/OP-ELM/DENFIS/GMDH not tried for comparison and validation?
Response:
Many machine learning methods appear in published scientific articles. Every researcher is interested in some of them and bases their research on them. MARS was in fact used in this article; in total, five artificial intelligence methods have been used, of which RCNN-SVR has not been used before. It is not possible to implement many artificial intelligence methods in one article; as you know, researchers often compare just two or three methods in a study. The methods mentioned by the esteemed reader can be used in future articles.
For readers to quickly catch your contribution, it would be better to highlight major difficulties and challenges, and your original achievements to overcome them, in a clearer way in abstract and introduction.
Response:
Because the studied water network is old and pipe failure statistics are lacking in recent years, the amount of input data is limited, and the main challenge facing the authors in this article is to find a suitable machine learning model that can provide accurate answers from little data. This point can be added to the article in the next revision if the referees and the editor agree.
Citation: https://doi.org/10.5194/dwes-2021-7-AC3
-
RC2: 'Comment on dwes-2021-7', Martijn Bakker, 04 May 2021
Review: Implementing and evaluating various machine learning models for pipe burst prediction
L7 "accurate" should be "accurately"
L25 "this" should be "these"
L52. Is PBR determined for each individual pipe? For subsets of pipes? Please explain. If it is per pipe, then I would expect lots of 0 results (0 bursts / x length). How did you process that?
L53. What is random sampling?
L240. What does the inverse relation with length mean? 1/length is also in the PBR formula.
L251. "compare" must be "compares"
L270. I miss the discussion. Why is RCNNSVR that much better than all other methods? What unique features make this method outperform all other methods by far? Are the other methods that much simpler? And did you put equal effort into parameter tuning for all methods? Or was your goal to show that RCNNSVR is a preferred method?
Citation: https://doi.org/10.5194/dwes-2021-7-RC2
-
AC2: 'Reply on RC2', ahmad ravanbakhsh, 08 May 2021
1- L7 "accurate" should be "accurately"
Response:
This will be corrected in the next revision.
2- L25 "this" should be "these"
Response:
This will be corrected in the next revision.
3- L52. Is PBR determined for each individual pipe? For subsets of pipes? Please explain. If it is per pipe, then I would expect lots of 0 results (0 bursts / x length). How did you process that?
Response:
PBR is calculated only for failed pipes, not for all pipes, so there are not many zeros; we have PBR values only for the pipes that have had bursts.
For example, consider an asbestos cement pipe with a length of 131.18 m that had 1 failure during the 8 years of the study. Its PBR is:
PBR = A / B
A = annual burst rate = 1/8 = 0.125 bursts per year
B = length in kilometers = 131.18/1000 = 0.13118 km
So PBR = 0.125 / 0.13118 ≈ 0.953 bursts per km per year
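The calculation in this example can be written as a short function. This is a minimal sketch that only restates the arithmetic of the reply; the function and parameter names are assumptions introduced here for illustration, not names from the manuscript.

```python
def pipe_burst_rate(n_bursts, years, length_m):
    """Pipe burst rate in bursts per kilometre per year.

    Hypothetical helper: restates the A / B calculation of the worked example.
    """
    annual_burst_rate = n_bursts / years   # A in the worked example
    length_km = length_m / 1000.0          # B in the worked example
    return annual_burst_rate / length_km


# The asbestos cement pipe from the example: 1 burst in 8 years, 131.18 m long.
pbr = pipe_burst_rate(n_bursts=1, years=8, length_m=131.18)
print(round(pbr, 3))
```

For the pipe in the example this evaluates to about 0.953, which matches the value quoted in the reply up to rounding.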
4- L53. What is random sampling?
Response:
Suppose the total number of failure cases is 100. Then 85 of them are given to the model for training, and the remaining 15 are used to check the accuracy of the model's predictions of the pipe failure rate. The choice of the 85 out of 100 is completely random.
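The random 85/15 division described here can be sketched with the Python standard library. The function name, the seed and the use of integer case identifiers are assumptions made for illustration; the reply does not say which tool the authors actually used.

```python
import random


def random_train_test_split(cases, train_frac=0.85, seed=42):
    """Shuffle the cases and split them into training and test subsets.

    Hypothetical helper; the seed only makes the shuffle reproducible.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


# 100 failure cases, identified here simply by the integers 0..99.
train, test = random_train_test_split(range(100))
print(len(train), len(test))  # 85 15
```

Every case lands in exactly one of the two subsets, so the 15 test cases are never seen during training.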
5- L240. What does the inverse relation with length mean? 1/length is also in the PBR formula.
Response:
In the PBR formula, the pipe length is the denominator, so PBR is inversely related to length, and because the pipe length is much larger than the number of recorded failures, there is a large negative correlation between pipe length and PBR. The inverse relationship between length and PBR is clear from Formula 1; it was mentioned to emphasize the negative correlation between PBR and pipe length. If the honorable referee considers this unnecessary, the sentence can be deleted in the next revision.
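The dependence discussed in this item can be written out explicitly. The symbols are introduced here for illustration ($N$ is the number of recorded bursts, $T$ the observation period in years, $L$ the pipe length in km), and the expression follows the worked PBR example in this reply rather than the manuscript's Formula 1, which is not reproduced in this discussion.

```latex
\mathrm{PBR} = \frac{N / T}{L}
\quad\Longrightarrow\quad
\mathrm{PBR} \propto \frac{1}{L}
\quad \text{for fixed } N \text{ and } T .
```

If most failed pipes have a single recorded burst over the same period, $N/T$ is nearly constant across records, so a negative relation between PBR and length follows directly from the definition.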
6- L251. "compare" must be "compares"
Response:
This will be corrected in the next revision.
7-1 L270. I miss the discussion. Why is RCNNSVR that much better than all other methods?
Response:
All models were assessed with the evaluation criteria in Section 2.2 (Model performance assessment, L170), and according to the results in Table 2, the RCNN-SVR method has the lowest error and the highest accuracy among the studied models.
For example, the RMSE for PE pipes is 0.37 for MARS, 0.3 for M5', 0.38 for FCMR and 0.35 for LS-SVR, whereas it is 0.052 for RCNN-SVR, which shows the high accuracy of this method in predicting PBR. Figure 6 shows the close agreement between the PBR values obtained with the RCNN-SVR method and the actual values from the real network of Jopar city.
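For readers unfamiliar with the criterion quoted here, RMSE can be computed as in the sketch below. It uses the standard definition (square root of the mean squared difference between observed and predicted values); the manuscript's Section 2.2 is not reproduced in this discussion, and the function name and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical PBR values for illustration only (not data from the study).
observed_pbr = [1.0, 2.0, 3.0]
predicted_pbr = [1.0, 2.0, 5.0]
print(rmse(observed_pbr, predicted_pbr))
```

RMSE is expressed in the same units as PBR, so a lower value means the predictions lie closer to the observed burst rates.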
7-2 What unique features make this method outperform all other methods by far?
Response:
The unique feature of this method is the high accuracy of PBR prediction compared to other methods.
7-3 Are the other methods that much simpler?
Response:
This paper does not examine the simplicity or complexity of the methods; it only attempts to implement a number of outstanding machine learning models and evaluate their performance in PBR prediction on a real water distribution network. The novelty of this paper is the use of the RCNN-SVR method in water networks. Although this method has been used in other engineering fields, to the authors' knowledge it has not been used to predict PBR on a real network in any published article. By examining this method and presenting its brilliant results, the paper introduces an efficient method to its readers.
7-4 And did you put equal effort in all methods in parameter tuning? Or was your goal to show that RCNNSVR is a preferred method?
Response:
Yes, the parameters of all models were tuned to obtain the best possible answer from each model. Each model was evaluated with the same criteria under the same conditions, and it did not matter to the authors which method was better. After all models had been implemented and evaluated on the real case study, the RCNN-SVR method was found to be clearly better than the others.
Citation: https://doi.org/10.5194/dwes-2021-7-AC2
Status: closed
-
RC1: 'Comment on dwes-2021-7', Anonymous Referee #1, 26 Apr 2021
This paper presents an application of 5 existing statistical and machine learning methods for pipe burst prediction.
I suggest to reject this paper based of several major issues, such as:
1. There is very little novelty in this paper, which is essentially a poorly done machine learning exercise that would should be rather featured in a blog on Kaggle, rather than in a scientific journal; known techniques, different dataset, no advancing of the state-of-the-art.2. Most of the paper is about the description of existing techniques, which have been described several times in the literature. The authors employ the RCNN-SVR technique but they fail to cite the correct paper [1] both in the references (Line 375, a paper from 2016) and in the text (Line 148, a paper from 2018). They lack to provide the required details to understand the model, e.g., are those 1D convolutional layers? How many trainable parameters total?
3. The latter information is particularly important because Deep Learning models usually have hundreds of thousand of parameters, and the proposed dataset is made of barely 200 data points. Deep Learning is employed when BIG DATA is available, this is clearly not the case. This raises the issue of whether the models have been trained appropriately or not. In line 52, the authors claim that 85% of data have been selected for training and the rest of them have been used to test the models. Was the training data further split to create a separate validation dataset? Or is the test set used for "validation" and model selection? Is this a fair comparison? Given the data is tabular, not an image or a time-series, I am confident tree-based algorithms such as Random Forest, ExtraTrees or Gradient Boosting trained using a single line of Python would provide at least the same results. Why did the authors use outdated benchmarks for comparison? Most importantly, the authors failed to provide the details needed to understand how they developed and compared the models.
4. The paper lacks organization and structure. Experimental setup (e.g., train/test division) is presented in the Methodology, so are the final trained models. The dataset is cited in the Methodology but introduced in a subsequent section. The results and discussion section is minimal, and there is no discussion at all.
5. The literature review is outdated and incomplete, see for instance the papers [2-4] reproted at the end of this review.
6. The paper is difficult to read even for an expert in the field.
References:
[1] Zhang, Youshan, and Qi Li. "A regressive convolution neural network and
support vector regression model for electricity consumption forecasting."
Future of Information and Communication Conference. Springer, Cham, 2019.[2] Konstantinou, Charalampos, and Ivan Stoianov. "A comparative study of
statistical and machine learning methods to infer causes of pipe breaks
in water supply networks." Urban Water Journal 17.6 (2020): 534-548.[3] Snider, Brett, and Edward A. McBean. "Improving urban water security through
pipe-break prediction models: Machine learning or survival analysis."
Journal of Environmental Engineering 146.3 (2020): 04019129.[4] Zhou, Xiao, et al. "Deep learning identifies accurate burst locations in water
distribution networks." Water research 166 (2019): 115058.Citation: https://doi.org/10.5194/dwes-2021-7-RC1 -
AC1: 'Reply on RC1', ahmad ravanbakhsh, 02 May 2021
1-There is very little novelty in this paper, which is essentially a poorly done machine learning exercise that would should be rather featured in a blog on Kaggle, rather than in a scientific journal; known techniques, different dataset, no advancing of the state-of-the-art.
Response:
The focus of this paper is to evaluate the performance of RCNN-SVR method on water distribution networks. During the studies conducted by the authors, the use of RCNN-SVR was observed in some engineering sciences, but the ability of this method in estimating the failure rate of water network pipes (in real case study) has not been measured in any scientific article. Therefore, the present study shows the high accuracy of this innovative method. Researchers can use this method as an efficient method in their researches. On the other hand, there are several articles in high-credited scientific journals that have compared outdated conventional machine learning methods. In this paper the new method that has much higher accuracy and efficiency is applied.
2-1 Most of the paper is about the description of existing techniques, which have been described several times in the literature. The authors employ the RCNN-SVR technique but they fail to cite the correct paper [1] both in the references (Line 375, a paper from 2016) and in the text (Line 148, a paper from 2018).
Response:
Description of existing techniques makes some of the newest and most important machine learning methods available for readers. Comparing RCNN-SVR with these known methods has also been performed to demonstrate the strength of the RCNN-SVR method. However, if in the opinion of the honorable referee there is no need to explain the methods, these methods can be removed from the paper and only shown in result tables in future reviews. The referencing will be corrected in future review too.
2-2 They lack to provide the required details to understand the model, e.g., are those 1D convolutional layers? How many trainable parameters total?
Response:
Due to the limited number of words in the article, the full description of the RCNN-SVR method was omitted, which accords to the referee viewpoint, the description of other methods can be removed and the RCNN-SVR method will be explained in full detail.
3-1 The latter information is particularly important because Deep Learning models usually have hundreds of thousand of parameters, and the proposed dataset is made of barely 200 data points. Deep Learning is employed when BIG DATA is available, this is clearly not the case. This raises the issue of whether the models have been trained appropriately or not.
Response:
An outstanding feature of the RCNN-SVR method is the accurate learning with limited data because other machine learning methods require a big dataset for training. Due to spending 8 years to collect failure data of water pipes in the case study network, little information is available, so the RCNN-SVR method has been selected which provides a more accurate answer with low data. Obviously, the RCNN-SVR method is more efficient if there is a lot of input data.
3-2 In line 52, the authors claim that 85% of data have been selected for training and the rest of them have been used to test the models. Was the training data further split to create a separate validation dataset? Or is the test set used for "validation" and model selection? Is this a fair comparison?
Response:
The training data is not split to create a separate validation dataset. The test set is used for validation. 85% of data have been selected for training and the rest of them have been used to test the models. All the machine learning methods that described in this paper have been compared in the same way and a fair comparison has been made. Also the same has been done in other scientific articles [1], [2].
3-3 Given the data is tabular, not an image or a time-series, I am confident tree-based algorithms such as Random Forest, Extra Trees or Gradient Boosting trained using a single line of Python would provide at least the same results.
Response:
Although it is possible to use a single line of Python for tree-based algorithms but the results are not as accurate as RCNN-SVR. In this paper (line 86), a tree-based algorithms (m5') is used which has a more inaccurate answer than the RCNN-SVR method. The provided codes can also be cited.
3-4 Why did the authors use outdated benchmarks for comparison?
Response:
Criteria used to compare machine learning methods are commonly used in many new scientific papers and can be cited. However, the use of other criteria can be used according to the suggestion of the honorable referee.
3-5 Most importantly, the authors failed to provide the details needed to understand how they developed and compared the models.
Response:
RCNN-SVR model development has been done in other engineering sciences but its use in water networks is innovative. The purpose of this article is to implement and compare some machine learning methods and developing of models was not considered. In this paper, an innovative new method is used to provide accurate results with low dataset which has not been used to predict the failure rate of pipes in any researches.
4- The paper lacks organization and structure. Experimental setup (e.g., train/test division) is presented in the Methodology, so are the final trained models. The dataset is cited in the Methodology but introduced in a subsequent section. The results and discussion section is minimal, and there is no discussion at all.
Response:
The structure will be correct in next revision.
5- The literature review is outdated and incomplete, see for instance the papers [2-4] reported at the end of this review.
Response:
Suggested articles will use in next revision. Thanks for introducing useful articles.
6- The paper is difficult to read even for an expert in the field.
Response:
Due to the difficulty of the content, an attempt has been made to express the content in the most appropriate way. However, it will be expressed more simply in the next revision.
[1] Shirzad, Akbar, and Mir Jafar Sadegh Safari. "Pipe failure rate prediction in water distribution networks using multivariate adaptive regression splines and random forest techniques." Urban Water Journal 16.9 (2019): 653-661.
[2] Shirzad, Akbar, Massoud Tabesh, and Raziyeh Farmani. "A comparison between performance of support vector regression and artificial neural network in prediction of pipe burst rate in water distribution networks." KSCE Journal of Civil Engineering 18.4 (2014): 941-948.
Citation: https://doi.org/10.5194/dwes-2021-7-AC1
-
AC1: 'Reply on RC1', ahmad ravanbakhsh, 02 May 2021
-
CC1: 'Comment on dwes-2021-7', Rana Muhammad Adnan Ikram, 02 May 2021
manuscript in the present version contains several problems. Appropriate revisions should be undertaken in order to justify recommendation for publication. It is mentioned that LSSVM and M5TREE models are used. What are the advantages of adopting these particular methods over others in this case? How will this affect the results? More details should be furnished.Why not tried MARS/OP-ELM/DENFIS/GMDH for comparison and validation? For readers to quickly catch your contribution, it would be better to highlight major difficulties and challenges, and your original achievements to overcome them, in a clearer way in abstract and introduction.
Citation: https://doi.org/10.5194/dwes-2021-7-CC1 -
AC3: 'Reply on CC1', ahmad ravanbakhsh, 11 May 2021
Manuscript in the present version contains several problems. Appropriate revisions should be undertaken in order to justify recommendation for publication. It is mentioned that LSSVM and M5TREE models are used. What are the advantages of adopting these particular methods over others in this case? How will this affect the results? More details should be furnished.
Response:
In this article, 5 artificial intelligence models among the prominent methods of machine learning have been examined and their performance has been compared with each other. It's not clear for authors, why did the esteemed reader separate the two models from the 5 models? While all 5 models are compared simultaneously. Also please specify where more details are needed.
Why not tried MARS/OP-ELM/DENFIS/GMDH for comparison and validation?
Response:
There are many machine learning methods in published scientific articles. Every researcher is interested in a number of them and conducts their research based on them. Although MARS was used in this article, 5 new artificial intelligence methods have been used that RCNN-SVR has not used before. It's not possible to implement many artificial intelligence methods in an article. As you know, researchers compare just two or three methods in their study. The mentioned methods by esteemed reader can be used in future articles.
For readers to quickly catch your contribution, it would be better to highlight major difficulties and challenges, and your original achievements to overcome them, in a clearer way in abstract and introduction.
Response:
Because the studied water network is old and pipe failure records for recent years are scarce, the amount of input data was limited. The main challenge facing the authors was therefore to find a machine learning model that gives accurate predictions from a small dataset. This point can be added to the article in the next revision if the referees and the editor agree.
Citation: https://doi.org/10.5194/dwes-2021-7-AC3
-
RC2: 'Comment on dwes-2021-7', Martijn Bakker, 04 May 2021
Review: Implementing and evaluating various machine learning models for pipe burst prediction
L7 "accurate" should be "accurately"
L25 "this" should be "these"
L52. Is PBR determined for each individual pipe, or for subsets of pipes? Please explain. If it is per pipe, then I would expect many zero results (0 bursts / x length). How did you process those?
L53. What is random sampling?
L240. What does the inverse relation with length mean? 1/length is already in the PBR formula.
L251. "compare" must be "compares"
L270. I miss the discussion. Why is RCNN-SVR that much better than all other methods? What unique features make this method outperform all other methods by far? Are the other methods that much simpler? And did you put equal effort into parameter tuning for all methods? Or was your goal to show that RCNN-SVR is the preferred method?
Citation: https://doi.org/10.5194/dwes-2021-7-RC2 -
AC2: 'Reply on RC2', Ahmad Ravanbakhsh, 08 May 2021
1- L7 "accurate" should be "accurately"
Response:
This will be corrected in the next revision.
2- L25 "this" should be "these"
Response:
This will be corrected in the next revision.
3- L52. Is PBR determined for each individual pipe, or for subsets of pipes? Please explain. If it is per pipe, then I would expect many zero results (0 bursts / x length). How did you process those?
Response:
PBR is calculated only for pipes that have failed, not for all pipes, so there are not many zeros; a PBR value exists only for pipes that have had bursts.
For example, consider an asbestos cement pipe 131.18 m long that had 1 failure during the 8 years of the study. Its PBR is:
PBR = A / B
A = annual burst rate = 1/8 = 0.125 bursts per year
B = length in kilometres = 131.18/1000 = 0.13118 km
So PBR = 0.125/0.13118 ≈ 0.9529 bursts per km per year
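The arithmetic of this worked example can be reproduced with a minimal Python sketch. The function name `pipe_burst_rate` and its parameter names are illustrative assumptions; they are not taken from the manuscript or from the authors' published MATLAB code.

```python
def pipe_burst_rate(bursts: int, years: float, length_m: float) -> float:
    """Return bursts per kilometre per year for a single pipe.

    Hypothetical helper that mirrors the A / B calculation in the reply.
    """
    if years <= 0 or length_m <= 0:
        raise ValueError("years and length_m must be positive")
    annual_burst_rate = bursts / years   # A: bursts per year
    length_km = length_m / 1000.0        # B: pipe length in kilometres
    return annual_burst_rate / length_km


# The asbestos cement pipe from the example: 1 burst in 8 years, 131.18 m long.
pbr = pipe_burst_rate(bursts=1, years=8, length_m=131.18)
print(f"PBR = {pbr:.3f} bursts per km per year")
```

For the pipe in the example this evaluates to roughly 0.953, in line with the value given in the reply.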
4- L53. What is random sampling?
Response:
Suppose the total number of failure cases is 100. Of these, 85 are given to the model for training, and the remaining 15 are used to check the accuracy of the model's predicted pipe failure rates. The choice of the 85 training cases out of the 100 is completely random.
5- L240. What does the inverse relation with length mean? 1/length is already in the PBR formula.
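A random 85/15 split of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name `random_split`, the fixed seed, and the use of integers as stand-ins for failure cases are hypothetical, and the authors' actual sampling code may differ.

```python
import random


def random_split(cases, train_fraction=0.85, seed=0):
    """Shuffle the cases and split them into training and hold-out parts."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded only for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 100 placeholder failure cases, identified here simply by index.
train_cases, holdout_cases = random_split(range(100))
print(len(train_cases), len(holdout_cases))
```

Every case ends up in exactly one of the two parts, so the hold-out cases are never seen during training.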
Response:
Pipe length is the denominator in the PBR formula, so it is inversely related to PBR. Because pipe lengths are much larger in magnitude than the failure counts, there is a strong negative correlation between pipe length and PBR. The inverse relationship is clear from Formula 1; it was mentioned in order to emphasize this negative correlation. If the honorable referee considers the sentence unnecessary, it can be deleted in the next revision.
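The effect described here can be illustrated with a small sketch. The pipe lengths below are hypothetical values chosen only for illustration, not data from the study, and each pipe is assumed to have had one burst in eight years.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pipe lengths in metres (not from the study's dataset).
lengths_m = [50.0, 100.0, 200.0, 400.0, 800.0]
# PBR for one burst in eight years: (1/8) / (length in km).
pbr_values = [(1 / 8) / (length / 1000.0) for length in lengths_m]

r = pearson(lengths_m, pbr_values)
print(f"correlation between length and PBR: {r:.2f}")
```

With equal burst counts, longer pipes necessarily get a lower PBR, so the coefficient is negative by construction; this is the point the referee raises about 1/length appearing in the formula.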
6- L251. "compare" must be "compares"
Response:
This will be corrected in the next revision.
7-1 L270. I miss the discussion. Why is RCNN-SVR that much better than all other methods?
Response:
All models were assessed with the evaluation criteria in Section 2.2 (Model performance assessment, L170). According to the results in Table 2, the RCNN-SVR method has the lowest error and the highest accuracy among the studied models.
For example, the RMSE for PE pipes is 0.37 for MARS, 0.3 for M5', 0.38 for FCMR and 0.35 for LS-SVR, while it is 0.052 for RCNN-SVR, which shows the high accuracy of this method for predicting PBR. Figure 6 shows the close agreement between the PBR values predicted by the RCNN-SVR method and the actual values taken from the real network of Jopar city.
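For reference, the root mean square error can be computed as in the sketch below. It assumes that Section 2.2 of the manuscript uses the standard RMSE definition; the input values are made up for illustration and are not results from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up observed and predicted PBR values, for illustration only.
observed_pbr = [1.0, 2.0, 3.0]
predicted_pbr = [1.0, 2.0, 5.0]
error = rmse(observed_pbr, predicted_pbr)
print(f"RMSE = {error:.3f}")
```

A lower RMSE means the predictions lie closer to the observed burst rates, which is the basis of the model comparison in the reply.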
7-2 What unique features make this method outperform all other methods by far?
Response:
The unique feature of this method is its high accuracy in predicting PBR compared with the other methods.
7-3 Are the other methods that much simpler?
Response:
This paper does not examine the simplicity or complexity of the methods. It only attempts to implement a number of prominent machine learning models and to evaluate their performance in PBR prediction on a real water distribution network. The novelty of this paper is the use of the RCNN-SVR method in water networks. Although this method has been used in other engineering fields, to the best of the authors' knowledge it has not been used to predict PBR on a real network in any published article. By examining this method and presenting its excellent results, an efficient method has been introduced to the readers of this article.
7-4 And did you put equal effort into parameter tuning for all methods? Or was your goal to show that RCNN-SVR is the preferred method?
Response:
Yes, the parameters of all models were tuned so that the best possible result was obtained from each model. Each model was evaluated with the same evaluation criteria under the same conditions, and the authors had no preference as to which method would be better. After implementing and evaluating all models on the real case study, it was found that the RCNN-SVR method was clearly better than the other methods.
Citation: https://doi.org/10.5194/dwes-2021-7-AC2
Data sets
joopar Ac and PE pipes Ravanbakhsh, Ahmad https://doi.org/10.5281/zenodo.4587385
Model code and software
Regression matlab codes Ravanbakhsh, Ahmad https://doi.org/10.5281/zenodo.4587392
Viewed
- HTML: 739
- PDF: 459
- XML: 57
- Total: 1,255
- BibTeX: 34
- EndNote: 35