Pre-localization and Leak Detection in a Water Drinking Distribution Network Using Modeling-Based Algorithms: Case Study of the City of Casablanca (Morocco)

The role of a water drinking distribution network (WDDN) is to supply high-quality water at the required pressure, at all times of the day and under various consumption scenarios. Locating leakage areas and setting priorities among them has become a major concern for water supply managers seeking to optimize supply and improve its continuity. In this paper, we present field results from a study aimed at identifying and locating leaks in a WDDN, based on solving the Fixed And Variable Area Discharge (FAVAD) equation with prediction algorithms combined with hydraulic modeling and a Geographical Information System (GIS). The leak localization method is applied in the oldest part of Casablanca. Two methodologies were used in two different leak episodes: (i) the first is based on simulating artificial leaks on the MATLAB platform with the EPANET code, in order to build a database of pressures describing the network's behaviour in the presence of leaks; these data feed a machine learning algorithm, Random Forest, which forecasts the leakage rate and its location in the network; (ii) the second is a field test of artificial leaks created by opening and closing hydrants at different locations, with leak sizes of 6 l/s and 17 l/s. The two methods converged to comparable results: the leak positions are identified within a 100 m radius of the actual ones.


https://doi.org/10.5194/dwes-2020-3

Drinking Water Engineering and Science Discussions. Open Access Preprint. Discussion started: 25 March 2020. © Author(s) 2020. CC BY 4.0 License.

Introduction
Climate Change (CC) is a major global issue of growing importance on the international scene. It affects all components of the hydrological cycle. The situation of water resources in Morocco is already critical, with a state of water scarcity forecast for 2020. This problem is accentuated by the effects of CC and may hinder further sustainable development. The expected CC in Morocco would have direct and indirect harmful consequences on the potential of water resources, in terms of both quantity and quality, on water demand, and on the efficiency with which the different users use this resource. Anticipating adaptation to the effects of CC requires making better use of the resources and, above all, minimizing water losses. In this regard, drinking water distribution networks in Moroccan urban areas have particularly low efficiencies. The location and prioritization of leaking areas is a major concern for the public authorities seeking to optimize the use of water resources, reduce losses and improve continuity of service.
To guarantee a high level of service in terms of pressure, the time needed to detect and repair leaks is certainly the most common factor used when analysing drops below contractual pressures. Most of the time, before starting a leak detection campaign in a Discrete Hydraulic Sector (DHS), we start by analysing the flows into and out of the sector, in particular the minimum night flow (MNF) between 2:00 AM and 4:00 AM, as well as the volumes of major consumers (Alkasseh et al., 2013).
The literature shows that leaks can be detected at the DHS level. Leakage is usually permanent over time: if a DHS records an increase in night flow, this increase should also appear during normal consumption hours (Oasen, 2015a). According to Farley et al. (2008), an increase in minimum night flow can be used to target the DHSs where leakage is most likely. Leaks in a DHS can therefore be detected through a hydraulic balance between the billed consumption volume and the distributed volume, i.e. by comparing the expected demand with the actual water consumption (Bakker, 2014).
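The two screening checks described above, the minimum night flow and the hydraulic balance, reduce to simple arithmetic on the sector's metering data. The Python sketch below assumes hourly inlet-flow readings stored as (timestamp, l/s) pairs and volumes in m3; the function names and the example figures are hypothetical, and only the 2:00 to 4:00 AM window comes from the text.

```python
from datetime import datetime


def minimum_night_flow(readings, start_hour=2, end_hour=4):
    """Lowest inlet flow (l/s) among readings taken in [start_hour, end_hour)."""
    night = [flow for stamp, flow in readings if start_hour <= stamp.hour < end_hour]
    if not night:
        raise ValueError("no readings in the night window")
    return min(night)


def water_balance(volume_distributed_m3, volume_billed_m3):
    """Volume distributed but not billed (m3) and its share of the distributed volume."""
    unbilled = volume_distributed_m3 - volume_billed_m3
    return unbilled, unbilled / volume_distributed_m3


# Hypothetical hourly inlet flows (l/s) for one day in a DHS
readings = [(datetime(2017, 5, 3, h), q) for h, q in
            [(1, 20.0), (2, 14.5), (3, 13.2), (4, 15.0), (12, 40.0)]]
mnf = minimum_night_flow(readings)
# Hypothetical monthly volumes (m3)
unbilled, share = water_balance(1000.0, 700.0)
print(mnf, unbilled, round(share, 2))
```

In practice the MNF of a given night would be compared with the sector's usual MNF; a sustained rise is the signal that, according to Farley et al. (2008), makes the sector a priority for a leak campaign.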
Once new leaks have been identified at the DHS level, various techniques are used to locate them. Acoustic leak detection has evolved considerably in recent years and is still developing rapidly (Farley, 2003). Some of these techniques require partitioning a WDDN into smaller DHSs by closing certain valves, which can sometimes shut down the system (Colombo, 2009).
In addition, several research projects have noted that leak detection is difficult to apply in certain areas because of the complexity of isolating and partitioning them (Andrea et al., 2011). Applied work on leakage modeling, in particular that of Babel et al. (2009) and Sebbagh et al. (2018), shows that reducing the pressure at the inlet of a DHS reduces the leakage rate. For Al-Ghamdi et al. (2011), a 25% reduction in pressure yields a leakage flow reduction of about 25% for a network that is 50% rigid and 50% plastic.
As described in the following paragraphs, our approach is to carry out a virtual leak search without partitioning the WDDN into smaller DHSs.
Poor leak management is one of the key problems, given its impact on production costs and resource depletion. This paper focuses mainly on the application of the two approaches to two different leak events. The case study is a pilot sector in the city of Casablanca (Morocco), which serves around 24 000 inhabitants, as displayed in Fig. 1.


Figure 1 Delimitation of the study area.
The study area has three inlets, 493 nodes and 42 km of pipes. As for instrumentation, the network flow and pressure are monitored by 300 mm diameter flowmeters and by pressure sensors at each inlet.

Software
EPANET is free software developed by the US Environmental Protection Agency (U.S. EPA). From a representation of the distribution network (nodes, pipes, tanks, valves, pumps, etc.), it performs the hydraulic balancing of the network by calculating head losses, flow velocities and flows in the pipes, and pressures at the nodes (Rossman, 2000). In practice, EPANET is used by water utilities (EPA, 2005) and in the literature (Farina et al., 2014).
In the hydraulic modeling software EPANET 2.0, the base demand is defined as a water withdrawal at each node. There are two main methods to simulate a water leak in EPANET: as an additional demand at a node, or as a flow through a valve (an emitter). Head losses were calculated with the Darcy-Weisbach formula, using the default roughness values (Brown, 2002).
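To make the emitter option concrete, the sketch below writes the [OPTIONS] and [EMITTERS] lines that an EPANET 2 input (.inp) file uses to attach an emitter to a junction, together with the Darcy-Weisbach head loss setting mentioned above. It is a hedged illustration: the junction ID N57 and the coefficient are hypothetical, and in a real model these lines would be merged into the existing sections of the exported .inp file rather than appended as a second [OPTIONS] block.

```python
def emitter_sections(leaks, exponent=0.5):
    """Return EPANET .inp text modelling leaks as emitters.

    leaks: mapping {junction ID: emitter coefficient C}; with LPS flow
           units, C is expressed in l/s per m**exponent.
    exponent: global emitter exponent (N1), shared by every emitter.
    """
    lines = [
        "[OPTIONS]",
        " Units              LPS",
        " Headloss           D-W",
        f" Emitter Exponent   {exponent}",
        "",
        "[EMITTERS]",
        ";Junction        Coefficient",
    ]
    for junction, coeff in leaks.items():
        lines.append(f" {junction:<16} {coeff}")
    return "\n".join(lines) + "\n"


# Hypothetical junction ID; a real model uses the IDs exported from the GIS
print(emitter_sections({"N57": 2.0}))
```

The alternative, adding the leak flow to the node's base demand, needs no extra section but keeps the leak constant whatever the pressure, whereas an emitter discharge varies with the node pressure.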

Method
(i) Relationships between pressure and leakage rates in distribution networks.
Pressure management involves not only reducing pressure but also other methods of controlling and optimizing it without compromising customer service. A definition of pressure management in its broadest sense is given by Thornton et al. (2005): "pressure management is about controlling the pressure of the system to achieve an optimal level of service, ensuring an efficient supply to consumers while avoiding unnecessary excess pressure, which would unduly increase leaks."
When designing their distribution networks, water utilities often take as reference the minimum pressure occurring at the critical point under maximum demand. Understanding this concept is very important, since pressure regulation can significantly reduce leakage without compromising the level of customer service.
The term "FAVAD" stands for "Fixed and Variable Area Discharge" paths. Through the definition of an exponent N1, this concept defines the relationship between leakage rate and pressure (Rozental, 2010), given by Eq. (1):

$$\frac{L_1}{L_0} = \left(\frac{P_1}{P_0}\right)^{N_1} \qquad (1)$$

where L0 and L1 are the leakage flows before and after pressure reduction, and P0 and P1 are the pressures before and after reduction.
According to Al-Ghamdi (2011) and Cobacho et al. (2014), the main method for representing leaks in a hydraulic network model is to add a leakage valve at each node, using the emitter parameter to model the flow through that valve. These emitter devices model flow discharged to the atmosphere through a nozzle. The equation below expresses the FAVAD concept in terms of flow rate, pressure and emitter coefficient, Eq. (2):

$$Q_{leak,j} = C \, P_j^{N_1} \qquad (2)$$

where Q_leak,j is the leak flow rate at node j, C is the emitter coefficient and P_j is the pressure at node j.
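Eq. (2) can be evaluated in both directions: forward, to obtain the leak flow produced by a given emitter coefficient, and inverted, to find the coefficient that reproduces a target leak such as the 6 l/s and 17 l/s test leaks. The minimal Python sketch below assumes N1 = 0.5 (the rigid-pipe value) and an illustrative service pressure of 36 m; neither value is taken from the study.

```python
def emitter_flow(c, pressure_m, n1=0.5):
    """Eq. (2): leak flow Q = C * P**N1 at a node; this sketch returns no outflow at P <= 0."""
    return c * max(pressure_m, 0.0) ** n1


def emitter_coefficient(q_leak, pressure_m, n1=0.5):
    """Eq. (2) inverted: coefficient C that yields q_leak at the given pressure."""
    if pressure_m <= 0:
        raise ValueError("pressure must be positive")
    return q_leak / pressure_m ** n1


# Assumed service pressure of 36 m and N1 = 0.5 (rigid pipes)
for q in (6.0, 17.0):
    c = emitter_coefficient(q, 36.0)
    print(f"{q} l/s -> C = {c:.2f}, check: {emitter_flow(c, 36.0):.1f} l/s")
```

Under these assumptions the 6 l/s leak corresponds to C = 1.00 and the 17 l/s leak to C ≈ 2.83 (in l/s per m^0.5); conversely, C = 2 at 25 m gives exactly 10 l/s.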

The exponent N1 of the above equation varies with the pipe material (mainly its elasticity): for a circular opening in a rigid pipe (cast iron, steel), N1 is of the order of 0.5, whereas it reaches 1.5 or more for longitudinal slots in plastic materials (PVC, HDPE). However, international feedback shows N1 varying between 0.36 and 2.95 depending on the networks studied, as shown in Fig. 4 (Rozental, 2010). Figure 3 illustrates the influence of the emitter exponent N1 on the effect of pressure reduction on leakage rate.
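The effect illustrated in Fig. 3 follows directly from Eq. (1). The sketch below applies the same 25% pressure reduction (from an assumed 40 m to 30 m) to a leakage of 100 units for several values of N1 inside the 0.36 to 2.95 range reported above; the pressures are illustrative only.

```python
def leakage_after(l0, p0, p1, n1):
    """Eq. (1) solved for L1: L1 = L0 * (P1 / P0) ** N1."""
    return l0 * (p1 / p0) ** n1


# 25 % pressure reduction (40 m -> 30 m) for several exponents N1
for n1 in (0.5, 1.0, 1.5, 2.5):
    remaining = leakage_after(100.0, 40.0, 30.0, n1)  # leakage as % of the initial value
    print(f"N1 = {n1}: leakage reduced by {100.0 - remaining:.1f} %")
```

With N1 = 1 the leakage falls by exactly 25%, which is the figure quoted from Al-Ghamdi et al. (2011) for a half-rigid, half-plastic network, while N1 = 0.5 gives only about 13% and N1 = 1.5 about 35%.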

Figure 3 Relationships between pressure and leakage rate using the N1 Approach (Rozental, 2010)
Data on the various components of the drinking water network (pipes, reservoirs, wells, boreholes, pumps, valves) are collected as shapefiles exported from the cart@jour database. This GIS (Geographical Information System) interface, available for consultation on the intranet, covers the three networks managed by the drinking water operators, together with all the hydraulic structures that make them up. The interface allows all the desired layers to be extracted while geometrically targeting the study area. The following Table lists the roughness values used in the modeling (Chadwick et al., 2013). Node elevations are extracted from the Digital Elevation Model (DEM) layer and automatically assigned to the network nodes.
In addition, the annual average consumption for 2017 is distributed among the nodes of the model according to the geographical distribution of subscribers within the tour. Once the network model has been prepared in ArcGIS, it is transferred to EPANET as an .inp file. The EPANET hydraulic model calculates node pressures and pipe flows for a fixed reservoir level and for water demands that vary over time and space. Computing the heads and flows at a given time involves solving simultaneously the flow conservation equation at each node and the head loss equation in each pipe of the network. Dynamic (extended-period) simulation is used to describe the operation of the network over a given period, taking into account the variation of customer consumption over time. Several EPANET simulations were run to calibrate the roughness coefficients of the pipes, so that the calculated pressures match the actual pressures at different nodes of the hydraulic system.
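The demand allocation step can be sketched as follows, under the assumption (ours, not stated in the text) that each node receives a share of the annual volume proportional to the number of subscribers attached to it, and that a 365-day year is used to convert the volume to a mean flow in l/s. The node IDs and counts are hypothetical; in the study this step was carried out in ArcGIS before export to EPANET.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def allocate_demand(annual_volume_m3, subscribers_per_node):
    """Split an annual consumption volume (m3) into node base demands (l/s),
    proportionally to the number of subscribers attached to each node."""
    total = sum(subscribers_per_node.values())
    if total == 0:
        raise ValueError("no subscribers to allocate demand to")
    mean_flow_lps = annual_volume_m3 * 1000.0 / SECONDS_PER_YEAR
    return {node: mean_flow_lps * n / total for node, n in subscribers_per_node.items()}


# Hypothetical nodes and subscriber counts
print(allocate_demand(31536.0, {"N1": 1, "N2": 3}))
```
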
To localize the leaks, we use the Random Forest learning and prediction algorithm, run in R.
The leakage localization methodology displayed in Fig. 4 is based on data mining algorithms; its starting point is training on the data obtained by simulation with the EPANET simulator. Four kinds of training data are then used to predict the location of the leak: the distance between the simulated leak node and each of the four sensors P14, P57, P58 and PC, as well as the maximum, minimum and average pressures at these measurement points. The results are reported in Table 2.
The technique used for preventive leak searching in distribution networks is organized around three distinct but complementary operations: sectoring, pre-location and localization. These methods must be adapted to the dimensions of the targeted network and to how well it is known.
The objective of sectoring is to set priorities between sectors and to estimate, or even quantify, the level of leakage; it identifies leaking areas at a scale larger than one linear kilometre. The objective of pre-location is to check for the presence of leaks in a given sector and to determine their position with a precision of the order of a hundred metres. The objective of localization is to determine the position of a leak with a precision of the order of one metre.
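The four kinds of training data listed earlier (distance to each sensor and maximum, minimum and average pressure at each measurement point) can be assembled into one row per simulated leak node. The Python sketch below keeps the sensor names P14, P57, P58 and PC from the text but uses hypothetical planar coordinates and pressure samples; straight-line distance is our assumption, since the text does not say whether distances are measured along the pipes.

```python
from math import dist
from statistics import mean


def training_row(node_xy, sensors_xy, sensor_pressures):
    """Feature row for one simulated leak node: distance from the node to
    each sensor plus max / min / mean pressure recorded at that sensor."""
    row = {}
    for name, xy in sensors_xy.items():
        p = sensor_pressures[name]
        row[f"dist_{name}"] = dist(node_xy, xy)
        row[f"pmax_{name}"] = max(p)
        row[f"pmin_{name}"] = min(p)
        row[f"pmean_{name}"] = mean(p)
    return row


# Sensor names from the text; coordinates (m) and pressures (m) are hypothetical
sensors = {"P14": (0.0, 0.0), "P57": (300.0, 400.0), "P58": (900.0, 0.0), "PC": (0.0, 800.0)}
pressures = {name: [31.0, 30.2, 29.8] for name in sensors}
print(training_row((0.0, 0.0), sensors, pressures))
```

In our reading, the distances act as the quantities to be predicted from the pressure statistics, since the localization step needs a predicted distance to each sensor.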

Random Forest
The type of learning we apply to anomaly detection in this article is supervised learning. Zhang et al. (2008) used decision tree forests to detect network intrusions.
To run the supervised anomaly detection method, we used the statistical software R (Zhang et al., 2008).
Random Forest is a tree-based supervised learning algorithm (Ho, 1995) that aggregates the answers of many decision trees. In this paper, the supervised Random Forest algorithm (Breiman, 2001) is used to detect the leaks. Besides its efficiency, this algorithm is known for its ability to handle large datasets. The random forest principle is to combine multiple decision trees, each trained on a different random sample of the original raw dataset. The final output is then obtained by aggregating the outputs of the individual trees (majority vote for classification, average for regression). The performance of the resulting model is thus improved compared with a single decision tree, and the model generalizes better to other datasets. Figure 6 illustrates how a random forest algorithm runs. The input data are the training data td (sensor node number, distance between the simulated node and the sensor, leakage emitter coefficient, and pressure at the sensors).
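The paper relies on R's Random Forest implementation; the self-contained Python sketch below only illustrates the mechanics described above (bootstrap samples, a random subset of features tried at each split, and averaging of the tree outputs) on a regression task. The class name TinyRandomForest and its parameters are hypothetical and not the API of any library; the default number of features tried per split, one third of the features, mirrors the regression default of the R randomForest package, and the depth limit is arbitrary.

```python
import random


def _avg(values):
    return sum(values) / len(values)


def _sse(ys):
    """Sum of squared deviations from the mean (regression impurity)."""
    m = _avg(ys)
    return sum((y - m) ** 2 for y in ys)


def _grow(X, y, depth, max_depth, mtry, rng, min_leaf=2):
    """Grow one regression tree: a leaf is a float, a split is (feature, threshold, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return _avg(y)
    best = None
    for f in rng.sample(range(len(X[0])), mtry):  # random feature subset at this split
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return _avg(y)
    _, f, t, left, right = best

    def child(idx):
        return _grow([X[i] for i in idx], [y[i] for i in idx],
                     depth + 1, max_depth, mtry, rng, min_leaf)

    return (f, t, child(left), child(right))


def _predict(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class TinyRandomForest:
    """Hypothetical minimal forest: bagged regression trees, random features per split."""

    def __init__(self, n_trees=50, max_depth=6, mtry=None, seed=0):
        self.n_trees, self.max_depth, self.mtry, self.seed = n_trees, max_depth, mtry, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, p = len(X), len(X[0])
        mtry = self.mtry or max(1, p // 3)  # p/3, as in R randomForest for regression
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
            self.trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                                    0, self.max_depth, mtry, rng))
        return self

    def predict(self, x):
        """Average of the individual tree predictions."""
        return _avg([_predict(tree, x) for tree in self.trees])


# Toy data: one input feature, target proportional to it
X = [[float(i)] for i in range(40)]
y = [2.0 * i for i in range(40)]
forest = TinyRandomForest(n_trees=30, seed=1).fit(X, y)
print(round(forest.predict([20.0]), 1))
```

One plausible arrangement (an assumption, not the authors' documented setup) is one forest per sensor, with pressure statistics as inputs and the distance from the leak to that sensor as the target, which is what the circle construction used for display needs.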
The process used in the proposed leak detection is summarized in the following Algorithm:

Preparation of input data for the algorithm
Several reference pressure profiles (without leaks) are required to reach a satisfactory level of prediction with the data analysis algorithms. These pressure profiles are obtained with EPANET. Seven reference cases are added to Table 1: the first is the simulated reference pressure, and the others are translations of the reference curve by +0.1, +0.2, +0.3, -0.1, -0.2 and -0.3 m, forming an envelope with an amplitude of 0.6 m, as shown in Fig. 7 for the P14 measuring point.
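Building the reference envelope amounts to shifting the leak-free profile by fixed offsets. The sketch below assumes the six offsets span ±0.3 m, consistent with the stated 0.6 m amplitude, and adds a simple screening function, which is our illustration rather than part of the published method, that flags the time steps where a measured profile leaves the envelope. The pressure values are hypothetical.

```python
def reference_envelope(reference, offsets=(0.1, 0.2, 0.3, -0.1, -0.2, -0.3)):
    """Leak-free reference profile followed by its vertically shifted copies (m)."""
    return [list(reference)] + [[p + d for p in reference] for d in offsets]


def outside_envelope(measured, reference, half_width=0.3):
    """Indices of time steps where the measured pressure leaves the envelope."""
    return [i for i, (m, r) in enumerate(zip(measured, reference)) if abs(m - r) > half_width]


# Hypothetical 5-minute pressure samples (m) at P14
reference = [30.0, 30.5, 31.0]
profiles = reference_envelope(reference)
print(len(profiles), outside_envelope([30.1, 30.0, 31.0], reference))
```
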

Figure 8 Sensors and leaks locations
The leak localization method depends on head loss, which itself depends, among other factors, on the size of the leak. If the leak is small, the sensors may not detect the resulting small head loss; a leak of 6 l/s was therefore chosen as the lower simulation limit. As the upper limit we chose 17 l/s, assuming that leaks exceeding this limit eventually surface and therefore do not require any localization process (Pérez et al., 2014a).
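Why a 6 l/s leak is harder to see than a 17 l/s one can be gauged with the Darcy-Weisbach formula used in the model. The sketch below computes the head loss in a single pipe carrying each leak flow, using the Swamee-Jain friction factor (the approximation EPANET documents for turbulent flow); the pipe length, diameter, roughness and water viscosity are hypothetical values, and a real network spreads the leak flow over many paths.

```python
from math import log10, pi

G = 9.81  # gravitational acceleration (m/s2)


def headloss_dw(q_lps, length_m, diameter_m, roughness_mm, nu=1.0e-6):
    """Darcy-Weisbach head loss (m) in a full circular pipe, with the
    Swamee-Jain friction factor (valid for turbulent flow only)."""
    velocity = (q_lps / 1000.0) / (pi * diameter_m ** 2 / 4)
    reynolds = velocity * diameter_m / nu
    if reynolds < 4000:
        raise ValueError("Swamee-Jain requires turbulent flow (Re >= 4000)")
    eps = roughness_mm / 1000.0
    f = 0.25 / log10(eps / (3.7 * diameter_m) + 5.74 / reynolds ** 0.9) ** 2
    return f * (length_m / diameter_m) * velocity ** 2 / (2 * G)


# Hypothetical pipe: 500 m long, 150 mm diameter, 0.1 mm roughness
h6 = headloss_dw(6.0, 500.0, 0.15, 0.1)
h17 = headloss_dw(17.0, 500.0, 0.15, 0.1)
print(f"6 l/s: {h6:.2f} m, 17 l/s: {h17:.2f} m, ratio {h17 / h6:.1f}")
```

For this pipe the head loss grows roughly with the square of the flow, so the 17 l/s leak produces about seven times the head loss of the 6 l/s leak, which is consistent with treating 6 l/s as the lower limit of what the sensors can pick up.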
At "PI A" we simulated two leaks of 6 l/s and 17 l/s, at "PI B" a small leak of 6 l/s and at "PI C" a leak of 17 l/s (Table 3).

Table 3 The artificial leaks created at each location
ID      Fire hydrant   Leak flow
LA-1    PI A           6 l/s
LA-2    PI A           17 l/s
LB      PI B           6 l/s
LC      PI C           17 l/s

Data-reading pressure at sensor
The pressure response to the artificial leak simulation of 3 May is shown in Fig. 9; the blue line shows the daily pressure pattern at P14. The simulation results are very close to the measurements, as shown in Figs. 10 and 11.

The red line corresponds to the EPANET simulation and the green dots to the pressures measured by the sensors.
Eq. (2) was used to calculate the leak magnitude for each time step and node. The following Table shows the emitter coefficients simulated at each network node. Note that for the last four leaks L = 1, which means that the algorithm classified them as leaks, although the leak flow is overestimated. The algorithm also flags the days of 23/04 and 26/04 as leak cases.
The results for these two days are not used, because the change in pressure profile on those days was not caused by a leak but by the closure of some valves in the area for maintenance work. The data analysis confirms that, for an emitter coefficient of at least 2, leak pre-localization with the adopted method is possible, in particular for flows exceeding 10 l/s. Such flows cause significant head losses that are easily detected by the pressure sensors installed in the study area, which confirms the hypothesis made at the beginning of this study.
Two phenomena can explain the limited ability of the current method to detect low-flow leaks. First, according to Jarrige et al. (2011), several factors may influence the propagation of leak noise to the sensors, such as the pipe material, the pipe diameter and, above all, the pipe roughness. A poor estimate of roughness affects the calculation of the reference pressure; Paquin et al. (2000) showed that the actual roughness must be measured in order to interpret simulated and measured pressures correctly.
The second reason for the limited performance of the proposed method on low-flow leaks was highlighted by Mirats-Tur et al. (2014), who demonstrated that with a mis-calibrated hydraulic model (in terms of its topographic structure and parameters), the precision regarding the estimation of the spatial water …
The presented results come from a model built using measurements from the network. Some of these measurements are considered precise, while others carry a certain level of uncertainty; for instance, the pipe roughness and node elevations are strongly affected by uncertainties.
Another factor that affects the precision of the proposed method is the recording interval of the measuring devices (pressure is measured every 5 minutes). To optimize detection, which with standard sensors is limited to leaks causing high head losses, it is recommended to use high-frequency sensors capable of recording a large number of samples; this would help detect the small pressure variations caused by low leak flows.

Displaying results
The outputs of the proposed method are not always reliable: rather than a deterministic map of the leaks, the method produces a probabilistic output that maps the probability of leak occurrence in space (Pérez et al., 2014b). If the forecast indicates a leak, its location on the map is estimated from the 4 predicted distances to the 4 pressure sensors: around each sensor we draw a circle whose radius is the distance given by the prediction (Fig. 12). Ideally, the four circles would intersect at a single point corresponding to the leak. Since each pair of circles meets in at most two points, the 4 circles yield at most 12 intersection points. For any two circles, there are three cases:
• The circles intersect at two points: one corresponds to the leak, and the other is its mirror image with respect to the line through the centres of the two circles;
• The circles are tangent and meet at a single point, which corresponds to the leak;
• The circles do not cross, but if the forecast is good they come close to each other in the leaking zone.
By plotting all the intersection points and drawing a circle around each of them, the leak location is identified within a 100 m radius.
The location of the leak is most probable where the circles form a large agglomeration with many intersection points. This agglomeration corresponds to the cumulative probability of the given nodes experiencing a leak (Fig. 13).
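The circle construction and the search for the densest agglomeration can be sketched in Python as follows. For each pair of sensor circles, circle_points returns the two intersection points, the tangent point, or, when the circles do not cross, the midpoint of the gap between them (our interpretation of the third case); locate_leak then keeps the candidate with the most neighbours within 100 m and returns their centroid. This scoring rule, the function names and the test geometry are hypothetical illustrations, not the authors' implementation.

```python
from math import dist, hypot


def circle_points(c0, r0, c1, r1):
    """Candidate leak points from one pair of circles (centres in m, radii = predicted distances)."""
    (x0, y0), (x1, y1) = c0, c1
    d = hypot(x1 - x0, y1 - y0)
    if d == 0 or d < abs(r0 - r1):
        return []  # concentric, or one circle inside the other
    ux, uy = (x1 - x0) / d, (y1 - y0) / d
    if d > r0 + r1:
        # Circles do not cross: midpoint of the gap along the centre line
        m = (r0 + d - r1) / 2
        return [(x0 + m * ux, y0 + m * uy)]
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)  # distance from c0 to the common chord
    h = max(r0 ** 2 - a ** 2, 0.0) ** 0.5       # half-length of the chord
    px, py = x0 + a * ux, y0 + a * uy
    if h == 0:
        return [(px, py)]  # tangent circles
    # Leak point and its mirror image about the line through the centres
    return [(px - h * uy, py + h * ux), (px + h * uy, py - h * ux)]


def locate_leak(sensors, radius=100.0):
    """sensors: list of ((x, y), predicted distance). Returns the centroid of the
    densest group of candidate points, or None if no pair yields a point."""
    pts = []
    for i in range(len(sensors)):
        for j in range(i + 1, len(sensors)):
            (ci, ri), (cj, rj) = sensors[i], sensors[j]
            pts += circle_points(ci, ri, cj, rj)
    if not pts:
        return None
    best = max(pts, key=lambda p: sum(dist(p, q) <= radius for q in pts))
    group = [q for q in pts if dist(best, q) <= radius]
    return (sum(x for x, _ in group) / len(group), sum(y for _, y in group) / len(group))


# Four hypothetical sensor positions (m) and exact distances to a leak at (300, 400)
leak = (300.0, 400.0)
corners = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0), (1000.0, 1000.0)]
print(locate_leak([(c, dist(c, leak)) for c in corners]))
```

With exact distances, six of the candidate points coincide at the true leak while each mirror image stands alone, so the densest group recovers the leak position; with forecast errors the group spreads out, which is why the text speaks of a 100 m search radius.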

Figure 13 Spatial location of the leak LC
The results obtained with this approach are satisfactory: the leak is identified within a 100 m radius.
However, the detection of a leak depends strongly on its location within the network. For instance, a leak located in a looped section of the network is less likely to be spotted at night. In the meshed part of the network, the pressure drops at the sensors are too small and can be masked by model uncertainties; since the measured pressures also involve significant errors, the chance of detecting smaller leaks in the analysis is reduced.

Conclusions and perspectives
The objective of our research was to find an appropriate solution for detecting and localizing leaks and estimating their size in a water distribution system. The FAVAD parameters were optimized with a prediction algorithm, which forms the core of the adopted procedure. The approach requires a coupled hydraulic-GIS interface together with the random forest algorithm. This work helped to identify critical leaking points and therefore contributes to the effort to reduce physical losses. Although the detection results were not always accurate in terms of spatial localization, the search radius is reduced substantially, which makes field detection campaigns more successful and less time-consuming.