Implementing and evaluating various machine learning models for pipe burst prediction

Ravanbakhsh, Ahmad; Momeni, Mehdi; Robati, Amir

doi:https://doi.org/10.5194/dwes-2021-7

Preprints

29 Mar 2021

Status: this discussion paper is a preprint. It has been under review for the journal Drinking Water Engineering and Science (DWES). The manuscript was not accepted for further review after discussion.

Abstract. Accurate prediction of pipe bursts makes it possible to schedule pipe maintenance and rehabilitation and to improve the level of service in water distribution networks (WDNs). In this study, we implemented five artificial intelligence and machine learning regression models for predicting pipe burst rate and evaluated their performance: multivariate adaptive regression splines (MARS), M5' regression tree (M5'), least squares support vector regression (LS-SVR), fuzzy regression based on c-means clustering (FCMR) and regressive convolution neural network with support vector regression (RCNN-SVR). The most effective parameters for the regression models are pipe age, diameter, depth of installation, length, and average and maximum hydraulic pressure. The collected data comprise 158 cases for polyethylene (PE) pipes and 124 cases for asbestos cement (AC) pipes during 2012-2019. The results indicate that the RCNN-SVR model performs very well in predicting pipe burst rate (PBR).

Received: 06 Mar 2021 – Discussion started: 29 Mar 2021

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1:
'Comment on dwes-2021-7', Anonymous Referee #1, 26 Apr 2021

This paper presents an application of 5 existing statistical and machine learning methods for pipe burst prediction.

I suggest rejecting this paper because of several major issues, such as:

1. There is very little novelty in this paper, which is essentially a poorly done machine learning exercise that should rather be featured in a blog on Kaggle than in a scientific journal; known techniques, different dataset, no advancing of the state-of-the-art.

2. Most of the paper is about the description of existing techniques, which have been described several times in the literature. The authors employ the RCNN-SVR technique but they fail to cite the correct paper [1] both in the references (Line 375, a paper from 2016) and in the text (Line 148, a paper from 2018). They fail to provide the details required to understand the model, e.g., are those 1D convolutional layers? How many trainable parameters in total?

3. The latter information is particularly important because Deep Learning models usually have hundreds of thousands of parameters, and the proposed dataset is made of barely 200 data points. Deep Learning is employed when BIG DATA is available; this is clearly not the case. This raises the issue of whether the models have been trained appropriately or not. In line 52, the authors claim that 85% of the data were selected for training and the rest were used to test the models. Was the training data further split to create a separate validation dataset? Or is the test set used for "validation" and model selection? Is this a fair comparison? Given that the data is tabular, not an image or a time series, I am confident tree-based algorithms such as Random Forest, ExtraTrees or Gradient Boosting trained using a single line of Python would provide at least the same results. Why did the authors use outdated benchmarks for comparison? Most importantly, the authors failed to provide the details needed to understand how they developed and compared the models.

4. The paper lacks organization and structure. Experimental setup (e.g., train/test division) is presented in the Methodology, so are the final trained models. The dataset is cited in the Methodology but introduced in a subsequent section. The results and discussion section is minimal, and there is no discussion at all.

5. The literature review is outdated and incomplete; see for instance the papers [2-4] reported at the end of this review.

6. The paper is difficult to read even for an expert in the field.

References:

[1] Zhang, Youshan, and Qi Li. "A regressive convolution neural network and support vector regression model for electricity consumption forecasting." Future of Information and Communication Conference. Springer, Cham, 2019.

[2] Konstantinou, Charalampos, and Ivan Stoianov. "A comparative study of statistical and machine learning methods to infer causes of pipe breaks in water supply networks." Urban Water Journal 17.6 (2020): 534-548.

[3] Snider, Brett, and Edward A. McBean. "Improving urban water security through pipe-break prediction models: Machine learning or survival analysis." Journal of Environmental Engineering 146.3 (2020): 04019129.

[4] Zhou, Xiao, et al. "Deep learning identifies accurate burst locations in water distribution networks." Water Research 166 (2019): 115058.

Citation: https://doi.org/10.5194/dwes-2021-7-RC1
- AC1: 'Reply on RC1', Ahmad Ravanbakhsh, 02 May 2021
  
  1- There is very little novelty in this paper, which is essentially a poorly done machine learning exercise that should rather be featured in a blog on Kaggle than in a scientific journal; known techniques, different dataset, no advancing of the state-of-the-art.
  Response:
  The focus of this paper is to evaluate the performance of the RCNN-SVR method on water distribution networks. In the authors' literature survey, RCNN-SVR was found in use in some engineering disciplines, but its ability to estimate the failure rate of water network pipes in a real case study has not been assessed in any scientific article. The present study therefore demonstrates the high accuracy of this innovative method, which researchers can use as an efficient method in their own work. Moreover, several articles in highly regarded scientific journals have compared outdated conventional machine learning methods, whereas this paper applies a new method with much higher accuracy and efficiency.
  2-1 Most of the paper is about the description of existing techniques, which have been described several times in the literature. The authors employ the RCNN-SVR technique but they fail to cite the correct paper [1] both in the references (Line 375, a paper from 2016) and in the text (Line 148, a paper from 2018).
  Response:
  Describing existing techniques makes some of the newest and most important machine learning methods available to readers. RCNN-SVR is compared with these known methods to demonstrate its strength. However, if the honorable referee sees no need to explain the methods, their descriptions can be removed from the paper and the methods shown only in the result tables in a future revision. The referencing will also be corrected in a future revision.
  2-2 They fail to provide the details required to understand the model, e.g., are those 1D convolutional layers? How many trainable parameters in total?
  Response:
  Because of the article's word limit, the full description of the RCNN-SVR method was omitted. In line with the referee's view, the descriptions of the other methods can be removed and the RCNN-SVR method explained in full detail.
  3-1 The latter information is particularly important because Deep Learning models usually have hundreds of thousands of parameters, and the proposed dataset is made of barely 200 data points. Deep Learning is employed when BIG DATA is available; this is clearly not the case. This raises the issue of whether the models have been trained appropriately or not.
  Response:
  An outstanding feature of the RCNN-SVR method is that it learns accurately from limited data, whereas other machine learning methods require a big dataset for training. Although failure data were collected over 8 years in the case study network, little information is available, so the RCNN-SVR method was selected because it provides a more accurate answer with little data. Obviously, the RCNN-SVR method is more efficient when a lot of input data is available.
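
To make the parameter-count question in 2-2 and 3-1 concrete, the following is a minimal sketch that counts trainable parameters for a hypothetical small 1D convolutional regressor. It is not the authors' architecture: the kernel size, number of filters and hidden units are assumptions chosen only for illustration. The six inputs correspond to the six pipe parameters named in the abstract, and the training-set size uses the 158 PE cases with the 85% training share.

```python
def conv1d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus one bias per filter for a 1D convolutional layer."""
    return (in_channels * kernel_size + 1) * out_channels


def dense_params(in_features: int, out_features: int) -> int:
    """Weights plus one bias per unit for a fully connected layer."""
    return (in_features + 1) * out_features


# Hypothetical layer sizes (assumptions, not taken from the manuscript).
n_features = 6      # age, diameter, depth, length, average and maximum pressure
kernel_size = 2
filters = 16
hidden_units = 32

# Output length of a stride-1 convolution without padding.
conv_out_len = n_features - kernel_size + 1

total_params = (
    conv1d_params(1, filters, kernel_size)
    + dense_params(conv_out_len * filters, hidden_units)
    + dense_params(hidden_units, 1)
)

n_train = round(0.85 * 158)  # 85% of the 158 PE cases
print(total_params, n_train, round(total_params / n_train, 1))
```

Even this deliberately small network has roughly twenty trainable parameters per training case, which is why the layer types and sizes matter for judging whether the model was trained appropriately.
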
  3-2 In line 52, the authors claim that 85% of the data were selected for training and the rest were used to test the models. Was the training data further split to create a separate validation dataset? Or is the test set used for "validation" and model selection? Is this a fair comparison?
  Response:
  The training data were not split to create a separate validation dataset; the test set was used for validation. 85% of the data were selected for training and the rest were used to test the models. All the machine learning methods described in this paper were compared in the same way, so the comparison is fair. The same approach has been used in other scientific articles [1], [2].
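
For contrast with the procedure described in this response, here is a minimal sketch of one common alternative: hold out 15% of the cases as a test set that is never used for tuning, and run k-fold cross-validation on the remaining 85% for model selection. The use of k-fold cross-validation, the value k = 5 and the function name `holdout_then_kfold` are assumptions of this sketch; none of them is reported in the manuscript or the replies.

```python
import random


def holdout_then_kfold(n_cases, test_fraction=0.15, k=5, seed=0):
    """Return held-out test indices and k (train, validation) index pairs."""
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)

    # The test set is set aside first and never used for tuning.
    n_test = round(n_cases * test_fraction)
    test_idx, dev_idx = indices[:n_test], indices[n_test:]

    # Deal the remaining cases into k folds of near-equal size.
    folds = [dev_idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, val_idx))
    return test_idx, splits


# 158 is the number of PE cases stated in the abstract.
test_idx, splits = holdout_then_kfold(158)
print(len(test_idx), [len(val) for _, val in splits])
```

Hyperparameters would then be chosen from the validation folds only, and the held-out test set would be scored once at the end.
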
  3-3 Given that the data is tabular, not an image or a time series, I am confident tree-based algorithms such as Random Forest, Extra Trees or Gradient Boosting trained using a single line of Python would provide at least the same results.
  Response:
  Although tree-based algorithms can be trained with a single line of Python, their results are not as accurate as those of RCNN-SVR. In this paper (line 86), a tree-based algorithm (M5') is used, and it gives a less accurate answer than the RCNN-SVR method. The code provided can also be cited.
  3-4 Why did the authors use outdated benchmarks for comparison?
  Response:
  The criteria used to compare the machine learning methods are commonly used in many recent scientific papers, which can be cited. However, other criteria can be used, as the honorable referee suggests.
  3-5 Most importantly, the authors failed to provide the details needed to understand how they developed and compared the models.
  Response:
  RCNN-SVR has been developed in other engineering disciplines, but its use in water networks is innovative. The purpose of this article is to implement and compare some machine learning methods; developing new models was not its aim. This paper uses an innovative method that gives accurate results from a small dataset and that has not previously been used to predict the failure rate of pipes.
  4- The paper lacks organization and structure. Experimental setup (e.g., train/test division) is presented in the Methodology, so are the final trained models. The dataset is cited in the Methodology but introduced in a subsequent section. The results and discussion section is minimal, and there is no discussion at all.
  Response:
  The structure will be corrected in the next revision.
  5- The literature review is outdated and incomplete; see for instance the papers [2-4] reported at the end of this review.
  Response:
  The suggested articles will be used in the next revision. Thank you for pointing out these useful articles.
  6- The paper is difficult to read even for an expert in the field.
  Response:
  Given the difficulty of the content, we tried to express it in the most appropriate way. It will nevertheless be expressed more simply in the next revision.
  [1] Shirzad, Akbar, and Mir Jafar Sadegh Safari. "Pipe failure rate prediction in water distribution networks using multivariate adaptive regression splines and random forest techniques." Urban Water Journal 16.9 (2019): 653-661.
  [2] Shirzad, Akbar, Massoud Tabesh, and Raziyeh Farmani. "A comparison between performance of support vector regression and artificial neural network in prediction of pipe burst rate in water distribution networks." KSCE Journal of Civil Engineering 18.4 (2014): 941-948.
  
  Citation: https://doi.org/10.5194/dwes-2021-7-AC1
CC1:
'Comment on dwes-2021-7', Rana Muhammad Adnan Ikram, 02 May 2021

The manuscript in its present version contains several problems. Appropriate revisions should be undertaken in order to justify recommendation for publication. It is mentioned that LSSVM and M5TREE models are used. What are the advantages of adopting these particular methods over others in this case? How will this affect the results? More details should be furnished. Why not try MARS/OP-ELM/DENFIS/GMDH for comparison and validation? For readers to quickly catch your contribution, it would be better to highlight major difficulties and challenges, and your original achievements to overcome them, in a clearer way in the abstract and introduction.

Citation: https://doi.org/10.5194/dwes-2021-7-CC1
- AC3: 'Reply on CC1', Ahmad Ravanbakhsh, 11 May 2021
  
  The manuscript in its present version contains several problems. Appropriate revisions should be undertaken in order to justify recommendation for publication. It is mentioned that LSSVM and M5TREE models are used. What are the advantages of adopting these particular methods over others in this case? How will this affect the results? More details should be furnished.
  Response:
  In this article, five artificial intelligence models from among the prominent machine learning methods were examined and their performance compared. It is not clear to the authors why the esteemed reader singled out two of the five models, since all five are compared together. Please also specify where more details are needed.
  Why not try MARS/OP-ELM/DENFIS/GMDH for comparison and validation?
  Response:
  There are many machine learning methods in published scientific articles. Every researcher is interested in a number of them and conducts research based on them. Note that MARS was used in this article: five artificial intelligence methods were applied, among them RCNN-SVR, which has not been used for this purpose before. It is not possible to implement a large number of artificial intelligence methods in a single article. As you know, researchers typically compare just two or three methods in a study. The methods mentioned by the esteemed reader can be used in future articles.
  
  For readers to quickly catch your contribution, it would be better to highlight major difficulties and challenges, and your original achievements to overcome them, in a clearer way in the abstract and introduction.
  Response:
  Because the studied water network is old and pipe failure statistics for recent years are lacking, our input data are limited, and the main challenge facing the authors is to find a suitable machine learning model that gives accurate answers from little data. This point can be added to the article in the next revision if the referees and the editor agree.
  
  Citation: https://doi.org/10.5194/dwes-2021-7-AC3
RC2:
'Comment on dwes-2021-7', Martijn Bakker, 04 May 2021

Review: Implementing and evaluating various machine learning models for pipe burst prediction

L7 "accurate" should be "accurately"

L25 "this" should be "these"

L52. Is PBR determined for each individual pipe? For subsets of pipes? Please explain. If it is per pipe, then I would expect lots of 0 results (0 bursts / x length). How did you process that?

L53. What is random sampling?

L240. What does the inverse relation with length mean? 1/length is also in the PBR formula.

L251. "compare" must be "compares"

L270. I miss the discussion. Why is RCNNSVR that much better than all other methods? What unique features make this method outperform all other methods by far? Are the other methods that much simpler? And did you put equal effort into parameter tuning for all methods? Or was your goal to show that RCNNSVR is a preferred method?

Citation: https://doi.org/10.5194/dwes-2021-7-RC2
- AC2: 'Reply on RC2', Ahmad Ravanbakhsh, 08 May 2021
  
  1- L7 "accurate" should be "accurately"
  Response:
  Will be corrected in next revision
  
  2- L25 "this" should be "these"
  Response:
  Will be corrected in next revision
  
  3- L52. Is PBR determined for each individual pipe? For subsets of pipes? Please explain. If it is per pipe, then I would expect lots of 0 results (0 bursts / x length). How did you process that?
  Response:
  PBR is calculated only for pipes that have failed, not for all pipes, so there are not many zeros; a PBR value exists only for pipes that have had bursts.
  For example, consider an asbestos cement pipe 131.18 m long that had 1 failure during the 8 years of the study. Its PBR is:
  PBR = A / B
  A = annual burst rate = 1/8 = 0.125 bursts per year
  B = length in kilometres = 131.18/1000 = 0.13118 km
  So PBR = 0.125 / 0.13118 ≈ 0.953 bursts per km per year
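
The worked example above can be reproduced in a few lines. This is a minimal sketch, not the authors' code; the function name `pipe_burst_rate` and its argument names are hypothetical, and the formula is the one given in the response (annual burst rate divided by pipe length in kilometres).

```python
def pipe_burst_rate(bursts: int, years: float, length_m: float) -> float:
    """Return the pipe burst rate in bursts per kilometre per year."""
    if years <= 0 or length_m <= 0:
        raise ValueError("years and length_m must be positive")
    annual_burst_rate = bursts / years   # A in the response
    length_km = length_m / 1000.0        # B in the response
    return annual_burst_rate / length_km


# Asbestos cement pipe from the response: 1 burst in 8 years, 131.18 m long.
pbr = pipe_burst_rate(bursts=1, years=8, length_m=131.18)
print(f"{pbr:.3f}")  # 0.953
```

Because a pipe with no recorded burst would get a PBR of exactly zero under this formula, restricting the dataset to failed pipes, as the response describes, removes all zero targets from the regression.
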
  
  4- L53. What is random sampling?
  Response:
  Suppose there are 100 failure cases in total. 85 of them are given to the model for training, and the remaining 15 are used to check the accuracy of the model's predicted pipe failure rates. The 85 cases are chosen completely at random.
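
The random 85/15 selection described in this response could look like the following minimal sketch. It is not the authors' code: the function name `random_split` and the fixed seed are assumptions, the seed being there only so that the split can be reproduced.

```python
import random


def random_split(cases, train_fraction=0.85, seed=0):
    """Shuffle the cases and cut them into training and test subsets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(cases)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


cases = list(range(100))        # stand-in identifiers for 100 failure cases
train, test = random_split(cases)
print(len(train), len(test))    # 85 15
```

With so few test cases, the measured error can change noticeably from one random split to another, so reporting the seed or repeating the split several times helps others reproduce the comparison.
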
  
  5- L240. What does the inverse relation with length mean? 1/length is also in the PBR formula.
  Response:
  In the PBR formula, pipe length is the denominator, so PBR is inversely related to length. Because pipe lengths are much larger than the failure counts, there is a strong negative correlation between pipe length and PBR. The inverse relationship is already clear from Formula 1; it was mentioned to emphasize the negative correlation between PBR and pipe length. If the honorable referee considers the sentence unnecessary, it can be deleted in the next revision.
  
  6- L251. "compare" must be "compares"
  Response:
  Will be corrected in next revision
  
  7-1 L270. I miss the discussion. Why is RCNNSVR that much better than all other methods?
  Response:
  All models were assessed with the evaluation criteria in Section 2.2 (Model performance assessment, L170), and according to the results in Table 2 the RCNN-SVR method has the lowest error and the highest accuracy among the studied models.
  For example, the RMSE for PE pipes is 0.37 for MARS, 0.3 for M5', 0.38 for FCR and 0.35 for LSSVR, but 0.052 for RCNN-SVR, which shows the high accuracy of this method in predicting PBR. Figure 6 shows the close agreement between the PBR values predicted by the RCNN-SVR method and the actual values from the real network of Jopar city.
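
For reference, the RMSE criterion quoted above is the square root of the mean squared difference between observed and predicted values. The following is a minimal sketch of that standard definition; the numbers in it are made-up illustrative values, not data or results from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only (not taken from the manuscript).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 2.0]
print(rmse(observed, predicted))  # 1.0
```

RMSE has the same unit as the target, here bursts per km per year, and a value computed on data the model was tuned on will generally understate the error on unseen pipes.
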
  
  7-2 What unique features make this method outperform all other methods by far?
  Response:
  The unique feature of this method is its high accuracy in PBR prediction compared with the other methods.
  
  7-3 Are the other methods that much simpler?
  Response:
  This paper does not examine the simplicity or complexity of the methods; it only implements and evaluates a number of prominent machine learning models for PBR prediction on a real water distribution network. The novelty of this paper is the use of the RCNN-SVR method in water networks. Although this method has been used in other engineering disciplines, the authors' literature search found no published article that uses it to predict PBR on a real network. By examining this method and presenting its excellent results, the paper introduces an efficient method to its readers.
  
  7-4 And did you put equal effort into parameter tuning for all methods? Or was your goal to show that RCNNSVR is a preferred method?
  Response:
  Yes. The parameters of every model were tuned to obtain the best possible answer from each. All models were evaluated with the same criteria under the same conditions, and the authors had no stake in which method performed better. After implementing and evaluating all models on the real case study, the RCNN-SVR method turned out to be clearly better than the others.
  
  Citation: https://doi.org/10.5194/dwes-2021-7-AC2

Status: closed

RC1:
'Comment on dwes-2021-7', Anonymous Referee #1, 26 Apr 2021

This paper presents an application of 5 existing statistical and machine learning methods for pipe burst prediction.

I suggest to reject this paper based of several major issues, such as:

1. There is very little novelty in this paper, which is essentially a poorly done machine learning exercise that would should be rather featured in a blog on Kaggle, rather than in a scientific journal; known techniques, different dataset, no advancing of the state-of-the-art.

2. Most of the paper is about the description of existing techniques, which have been described several times in the literature. The authors employ the RCNN-SVR technique but they fail to cite the correct paper [1] both in the references (Line 375, a paper from 2016) and in the text (Line 148, a paper from 2018). They lack to provide the required details to understand the model, e.g., are those 1D convolutional layers? How many trainable parameters total?

3. The latter information is particularly important because Deep Learning models usually have hundreds of thousand of parameters, and the proposed dataset is made of barely 200 data points. Deep Learning is employed when BIG DATA is available, this is clearly not the case. This raises the issue of whether the models have been trained appropriately or not. In line 52, the authors claim that 85% of data have been selected for training and the rest of them have been used to test the models. Was the training data further split to create a separate validation dataset? Or is the test set used for "validation" and model selection? Is this a fair comparison? Given the data is tabular, not an image or a time-series, I am confident tree-based algorithms such as Random Forest, ExtraTrees or Gradient Boosting trained using a single line of Python would provide at least the same results. Why did the authors use outdated benchmarks for comparison? Most importantly, the authors failed to provide the details needed to understand how they developed and compared the models.

4. The paper lacks organization and structure. Experimental setup (e.g., train/test division) is presented in the Methodology, so are the final trained models. The dataset is cited in the Methodology but introduced in a subsequent section. The results and discussion section is minimal, and there is no discussion at all.

5. The literature review is outdated and incomplete, see for instance the papers [2-4] reproted at the end of this review.

6. The paper is difficult to read even for an expert in the field.

References:

[1] Zhang, Youshan, and Qi Li. "A regressive convolution neural network and

support vector regression model for electricity consumption forecasting."

Future of Information and Communication Conference. Springer, Cham, 2019.

[2] Konstantinou, Charalampos, and Ivan Stoianov. "A comparative study of

statistical and machine learning methods to infer causes of pipe breaks

in water supply networks." Urban Water Journal 17.6 (2020): 534-548.

[3] Snider, Brett, and Edward A. McBean. "Improving urban water security through

pipe-break prediction models: Machine learning or survival analysis."

Journal of Environmental Engineering 146.3 (2020): 04019129.

[4] Zhou, Xiao, et al. "Deep learning identifies accurate burst locations in water

distribution networks." Water research 166 (2019): 115058.

Citation: https://doi.org/10.5194/dwes-2021-7-RC1
- AC1: 'Reply on RC1', ahmad ravanbakhsh, 02 May 2021
  
  1-There is very little novelty in this paper, which is essentially a poorly done machine learning exercise that would should be rather featured in a blog on Kaggle, rather than in a scientific journal; known techniques, different dataset, no advancing of the state-of-the-art.
  Response:
  The focus of this paper is to evaluate the performance of RCNN-SVR method on water distribution networks. During the studies conducted by the authors, the use of RCNN-SVR was observed in some engineering sciences, but the ability of this method in estimating the failure rate of water network pipes (in real case study) has not been measured in any scientific article. Therefore, the present study shows the high accuracy of this innovative method. Researchers can use this method as an efficient method in their researches. On the other hand, there are several articles in high-credited scientific journals that have compared outdated conventional machine learning methods. In this paper the new method that has much higher accuracy and efficiency is applied.
  2-1 Most of the paper is about the description of existing techniques, which have been described several times in the literature. The authors employ the RCNN-SVR technique but they fail to cite the correct paper [1] both in the references (Line 375, a paper from 2016) and in the text (Line 148, a paper from 2018).
  Response:
  Description of existing techniques makes some of the newest and most important machine learning methods available for readers. Comparing RCNN-SVR with these known methods has also been performed to demonstrate the strength of the RCNN-SVR method. However, if in the opinion of the honorable referee there is no need to explain the methods, these methods can be removed from the paper and only shown in result tables in future reviews. The referencing will be corrected in future review too.
  2-2 They lack to provide the required details to understand the model, e.g., are those 1D convolutional layers? How many trainable parameters total?
  Response:
  Due to the limited number of words in the article, the full description of the RCNN-SVR method was omitted, which accords to the referee viewpoint, the description of other methods can be removed and the RCNN-SVR method will be explained in full detail.
  3-1 The latter information is particularly important because Deep Learning models usually have hundreds of thousand of parameters, and the proposed dataset is made of barely 200 data points. Deep Learning is employed when BIG DATA is available, this is clearly not the case. This raises the issue of whether the models have been trained appropriately or not.
  Response:
  An outstanding feature of the RCNN-SVR method is the accurate learning with limited data because other machine learning methods require a big dataset for training. Due to spending 8 years to collect failure data of water pipes in the case study network, little information is available, so the RCNN-SVR method has been selected which provides a more accurate answer with low data. Obviously, the RCNN-SVR method is more efficient if there is a lot of input data.
  3-2 In line 52, the authors claim that 85% of data have been selected for training and the rest of them have been used to test the models. Was the training data further split to create a separate validation dataset? Or is the test set used for "validation" and model selection? Is this a fair comparison?
  Response:
  The training data is not split to create a separate validation dataset. The test set is used for validation. 85% of data have been selected for training and the rest of them have been used to test the models. All the machine learning methods that described in this paper have been compared in the same way and a fair comparison has been made. Also the same has been done in other scientific articles [1], [2].
  3-3 Given the data is tabular, not an image or a time-series, I am confident tree-based algorithms such as Random Forest, Extra Trees or Gradient Boosting trained using a single line of Python would provide at least the same results.
  Response:
  Although it is possible to use a single line of Python for tree-based algorithms but the results are not as accurate as RCNN-SVR. In this paper (line 86), a tree-based algorithms (m5') is used which has a more inaccurate answer than the RCNN-SVR method. The provided codes can also be cited.
  3-4 Why did the authors use outdated benchmarks for comparison?
  Response:
  Criteria used to compare machine learning methods are commonly used in many new scientific papers and can be cited. However, the use of other criteria can be used according to the suggestion of the honorable referee.
  3-5 Most importantly, the authors failed to provide the details needed to understand how they developed and compared the models.
  Response:
  RCNN-SVR model development has been done in other engineering sciences but its use in water networks is innovative. The purpose of this article is to implement and compare some machine learning methods and developing of models was not considered. In this paper, an innovative new method is used to provide accurate results with low dataset which has not been used to predict the failure rate of pipes in any researches.
  4- The paper lacks organization and structure. Experimental setup (e.g., train/test division) is presented in the Methodology, so are the final trained models. The dataset is cited in the Methodology but introduced in a subsequent section. The results and discussion section is minimal, and there is no discussion at all.
  Response:
  The structure will be correct in next revision.
  5- The literature review is outdated and incomplete, see for instance the papers [2-4] reported at the end of this review.
  Response:
  Suggested articles will use in next revision. Thanks for introducing useful articles.
  6- The paper is difficult to read even for an expert in the field.
  Response:
  Due to the difficulty of the content, an attempt has been made to express the content in the most appropriate way. However, it will be expressed more simply in the next revision.
  [1] Shirzad, Akbar, and Mir Jafar Sadegh Safari. "Pipe failure rate prediction in water distribution networks using multivariate adaptive regression splines and random forest techniques." Urban Water Journal 16.9 (2019): 653-661.
  [2] Shirzad, Akbar, Massoud Tabesh, and Raziyeh Farmani. "A comparison between performance of support vector regression and artificial neural network in prediction of pipe burst rate in water distribution networks." KSCE Journal of Civil Engineering 18.4 (2014): 941-948.
  
  Citation: https://doi.org/10.5194/dwes-2021-7-AC1
CC1:
'Comment on dwes-2021-7', Rana Muhammad Adnan Ikram, 02 May 2021

manuscript in the present version contains several problems. Appropriate revisions should be undertaken in order to justify recommendation for publication. It is mentioned that LSSVM and M5TREE models are used. What are the advantages of adopting these particular methods over others in this case? How will this affect the results? More details should be furnished.Why not tried MARS/OP-ELM/DENFIS/GMDH for comparison and validation? For readers to quickly catch your contribution, it would be better to highlight major difficulties and challenges, and your original achievements to overcome them, in a clearer way in abstract and introduction.

Citation: https://doi.org/10.5194/dwes-2021-7-CC1
- AC3: 'Reply on CC1', ahmad ravanbakhsh, 11 May 2021
  
  Manuscript in the present version contains several problems. Appropriate revisions should be undertaken in order to justify recommendation for publication. It is mentioned that LSSVM and M5TREE models are used. What are the advantages of adopting these particular methods over others in this case? How will this affect the results? More details should be furnished.
  Response:
  In this article, five prominent machine learning models were examined and their performance was compared. It is not clear to the authors why the esteemed reader singled out two of the five models, since all five are compared simultaneously. Please also specify where more details are needed.
  Why not try MARS/OP-ELM/DENFIS/GMDH for comparison and validation?
  Response:
  Many machine learning methods appear in the published literature, and each researcher works with a subset of them. MARS is in fact one of the five methods used in this article, and among these five, RCNN-SVR has not previously been applied to pipe burst prediction. It is not possible to implement a large number of artificial intelligence methods in a single article; researchers commonly compare only two or three methods in a study. The methods mentioned by the esteemed reader can be used in future articles.
  
  For readers to quickly catch your contribution, it would be better to highlight major difficulties and challenges, and your original achievements to overcome them, in a clearer way in abstract and introduction.
  Response:
  Because the studied water network is old and pipe failure statistics for recent years are lacking, the amount of input data was limited. The main challenge facing the authors in this article is therefore to find a machine learning model that gives accurate predictions from a small data set. This point can be added to the article in the next revision if the referees and the editor agree.
  
  Citation: https://doi.org/10.5194/dwes-2021-7-AC3
RC2:
'Comment on dwes-2021-7', Martijn Bakker, 04 May 2021

Review: Implementing and evaluating various machine learning models for pipe burst prediction

L7 "accurate" should be "accurately"

L25 "this" should be "these"

L52. Is PBR determined for each individual pipe or for subsets of pipes? Please explain. If it is per pipe, then I would expect lots of 0 results (0 bursts / x length). How did you process that?

L53. What is random sampling?

L240. What does inverse relation with length mean? Because 1/length is also in the PBR formula.

L251. "compare" must be "compares"

L270. I miss the discussion. Why is RCNNSVR that much better than all other methods? What unique features make this method outperform all other methods by far? Are the other methods that much simpler? And did you put equal effort into parameter tuning for all methods? Or was your goal to show that RCNNSVR is a preferred method?

Citation: https://doi.org/10.5194/dwes-2021-7-RC2
- AC2: 'Reply on RC2', Ahmad Ravanbakhsh, 08 May 2021
  
  1- L7 "accurate" should be "accurately"
  Response:
  Will be corrected in next revision
  
  2- L25 "this" should be "these"
  Response:
  Will be corrected in next revision
  
  3- L52. Is PBR determined for each individual pipe or for subsets of pipes? Please explain. If it is per pipe, then I would expect lots of 0 results (0 bursts / x length). How did you process that?
  Response:
  PBR is calculated only for pipes that have failed, not for all pipes, so the data do not contain many zeros; a PBR value exists only for pipes that have had bursts.
  For example, consider an asbestos cement pipe that is 131.18 m long and has had 1 failure during the 8 years of the study. Its PBR is:
  PBR = A/B
  A = annual burst rate = 1/8 = 0.125 bursts per year
  B = length in kilometers = 131.18/1000 = 0.13118 km
  So PBR = 0.125/0.13118 ≈ 0.9529 bursts per km per year
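
The worked example above can be checked with a few lines of code. The following is a minimal sketch in Python; the function name `pipe_burst_rate` and its parameters are illustrative assumptions and are not taken from the authors' MATLAB code.

```python
def pipe_burst_rate(bursts: int, years: float, length_m: float) -> float:
    """Return bursts per kilometer per year for one pipe (hypothetical helper)."""
    if years <= 0 or length_m <= 0:
        raise ValueError("years and length_m must be positive")
    annual_burst_rate = bursts / years   # A in the reply above
    length_km = length_m / 1000.0        # B in the reply above
    return annual_burst_rate / length_km


# Asbestos cement pipe from the reply: 1 burst in 8 years, 131.18 m long.
pbr = pipe_burst_rate(bursts=1, years=8, length_m=131.18)
print(round(pbr, 4))
```

The printed value agrees with the figure in the reply to within rounding of the last digit.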
  
  4- L53. What is random sampling?
  Response:
  Suppose there are 100 failure cases in total. 85 of them are given to the model for training, and the remaining 15 are used to check the accuracy of the failure rates that the model predicts. The 85 training cases are chosen completely at random.
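
A random 85/15 split of this kind might look as follows. This is a sketch only: the helper `random_split`, the fixed seed, and the integer stand-ins for failure records are assumptions made for illustration, not the authors' procedure.

```python
import random


def random_split(cases, train_fraction=0.85, seed=None):
    """Shuffle the cases and split them into training and test lists."""
    rng = random.Random(seed)       # the seed only makes this sketch repeatable
    shuffled = list(cases)          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


failure_cases = list(range(100))    # stand-ins for 100 failure records
train, test = random_split(failure_cases, train_fraction=0.85, seed=0)
print(len(train), len(test))        # 85 15
```

Every case ends up in exactly one of the two lists, so the test cases are never seen during training.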
  
  5- L240. What does inverse relation with length mean? Because 1/length is also in the PBR formula.
  Response:
  Pipe length is the denominator in the PBR formula, so it is inversely related to PBR. Because the pipe length values are much larger than the failure counts, the negative correlation between pipe length and PBR is strong. The inverse relationship is evident from Formula 1; it was mentioned only to emphasize the negative correlation between PBR and pipe length. If the honorable referee considers this unnecessary, the sentence can be deleted in the next revision.
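
The effect described in this reply can be illustrated with synthetic numbers. In the sketch below, the pipe lengths are hypothetical (they are not taken from the Jopar data set) and every pipe is assumed to have exactly one burst in 8 years, so that length is the only quantity that varies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pipe lengths in meters (assumed values for illustration).
lengths_m = [50.0, 100.0, 150.0, 200.0, 400.0, 800.0]
years = 8
# One burst per pipe over the study period, so only the length varies.
pbr = [(1 / years) / (length / 1000.0) for length in lengths_m]

r = pearson(lengths_m, pbr)
print(r < 0)   # True: PBR falls as length grows
```

With the burst count held fixed, the negative correlation follows directly from the 1/length term, which is the point the referee raises.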
  
  6- L251. "compare" must be "compares"
  Response:
  Will be corrected in next revision
  
  7-1 L270. I miss the discussion. Why is RCNNSVR that much better than all other methods?
  Response:
  All models were assessed with the evaluation criteria in Section 2.2 (Model performance assessment, L170). According to the results in Table 2, the RCNN-SVR method has the lowest error and the highest accuracy among the studied models.
  For example, the RMSE for PE pipes is 0.37 for MARS, 0.3 for M5', 0.38 for FCR, and 0.35 for LSSVR, whereas it is 0.052 for RCNN-SVR, which shows the high accuracy of this method for predicting PBR. Figure 6 shows the close agreement between the PBR values obtained with the RCNN-SVR method and the actual values taken from the real network of Jopar city.
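
For reference, RMSE can be computed as in the sketch below. It assumes the standard definition (square root of the mean squared difference between observed and predicted values); Section 2.2 of the manuscript is not reproduced here, and the PBR values in the example are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and predicted PBR values (not from Table 2).
observed = [0.95, 0.40, 1.20, 0.75]
predicted = [0.90, 0.45, 1.10, 0.80]
print(rmse(observed, predicted))
```

A lower RMSE means the predictions lie closer to the observed values, which is the sense in which the reply ranks the five models.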
  
  7-2 What unique features make this method outperform all other methods by far?
  Response:
  The unique feature of this method is the high accuracy of PBR prediction compared to other methods.
  
  7-3 Are the other methods that much simpler?
  Response:
  This paper does not examine the simplicity or complexity of the methods; it only implements a number of prominent machine learning models and evaluates their performance in PBR prediction on a real water distribution network. The novelty of this paper is the use of the RCNN-SVR method in water networks. Although this method has been used in other engineering fields, to the authors' knowledge it has not been used to predict PBR on a real network in any published article. By examining this method and presenting its strong results, the paper introduces an efficient method to its readers.
  
  7-4 And did you put equal effort into parameter tuning for all methods? Or was your goal to show that RCNNSVR is a preferred method?
  Response:
  Yes, the parameters of every model were tuned to obtain the best possible result from each model. All models were evaluated with the same criteria under the same conditions, and the authors had no preference as to which method would perform better. After implementing and evaluating all models on the real case study, it was found that the RCNN-SVR method was clearly better than the other methods.
  
  Citation: https://doi.org/10.5194/dwes-2021-7-AC2

Data sets

joopar Ac and PE pipes Ravanbakhsh, Ahmad https://doi.org/10.5281/zenodo.4587385

Model code and software

Regression matlab codes Ravanbakhsh, Ahmad https://doi.org/10.5281/zenodo.4587392

Viewed

Total article views: 1,434 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
853	517	64	1,434	46	53

Views and downloads (calculated since 29 Mar 2021)

Month	HTML	PDF	XML	Total
Mar 2021	32	9	2	43
Apr 2021	108	19	4	131
May 2021	175	30	12	217
Jun 2021	62	10	0	72
Jul 2021	23	8	0	31
Aug 2021	9	4	1	14
Sep 2021	6	19	1	26
Oct 2021	12	121	1	134
Nov 2021	17	42	1	60
Dec 2021	13	15	0	28
Jan 2022	15	15	1	31
Feb 2022	16	2	1	19
Mar 2022	13	11	0	24
Apr 2022	7	3	1	11
May 2022	6	6	1	13
Jun 2022	4	1	2	7
Jul 2022	6	2	0	8
Aug 2022	5	6	0	11
Sep 2022	8	7	0	15
Oct 2022	7	2	2	11
Nov 2022	4	2	1	7
Dec 2022	6	6	1	13
Jan 2023	12	8	0	20
Feb 2023	3	1	0	4
Mar 2023	7	3	0	10
Apr 2023	1	2	0	3
May 2023	1	1	0	2
Jun 2023	7	7	0	14
Jul 2023	9	9	0	18
Aug 2023	3	3	1	7
Sep 2023	9	12	3	24
Oct 2023	11	6	0	17
Nov 2023	4	3	0	7
Dec 2023	8	3	1	12
Jan 2024	10	6	0	16
Feb 2024	15	10	1	26
Mar 2024	7	9	1	17
Apr 2024	11	3	0	14
May 2024	7	3	0	10
Jun 2024	11	8	4	23
Jul 2024	11	2	3	16
Aug 2024	7	5	3	15
Sep 2024	9	5	1	15
Oct 2024	11	10	1	22
Nov 2024	8	3	1	12
Dec 2024	8	3	0	11
Jan 2025	15	8	0	23
Feb 2025	21	4	0	25
Mar 2025	15	8	4	27
Apr 2025	10	5	1	16
May 2025	20	5	2	27
Jun 2025	26	20	0	46
Jul 2025	2	7	0	9

Viewed (geographical distribution)

Total article views: 1,326 (including HTML, PDF, and XML). Thereof 1,326 with geography defined and 0 with unknown origin.

Cited

Latest update: 10 Jul 2025

Short summary

Pipe bursts in water distribution networks are inevitable. Pipe burst prediction helps to manage the maintenance of pipes, which reduces costs and water consumption and increases water network reliability. In this paper, we implement, compare and evaluate five artificial intelligence and machine learning methods for pipe failure prediction. Pipe failure data were collected over an eight-year period in a real case study. Finally, the best method is selected based on several error criteria.


Cited

1 citation as recorded by Crossref.