Imputation of incomplete and missing data in water mains failure database: comparing three imputation methods

Golam Kabir, Solomon Tesfamariam and Rehan Sadiq


Aging water infrastructure is a major concern for water utilities throughout the world. Owing to a lack of reliable data, it is challenging to develop a comprehensive water mains renewal program and to predict water main failures. Small and medium-sized water utilities are particularly affected because their records are often incomplete, partial, or missing. To estimate missing water mains failure data, three imputation methods were compared: iterative robust model-based imputation (IRMI), multiple imputation of incomplete multivariate data (AMELIA), and sequential imputation for missing values (IMPSEQ). The comparison used cast iron (CI) water mains failure data collected from 1956 to 2013 for the water distribution network (WDN) of the City of Calgary, Alberta, Canada. IRMI is a model-based imputation method in which missing values are estimated using a sequence of regression models (Templ et al. 2011). AMELIA estimates missing values using the expectation–maximization with bootstrapping (EMB) algorithm (Honaker et al. 2006). IMPSEQ is a covariance-based imputation method in which missing values are estimated sequentially by minimizing the determinant of the covariance matrix of the data (Verboven et al. 2007). The agreement between observed and imputed values for each method was evaluated using the mean absolute error (MAE), root mean square error (RMSE), mean relative absolute error (MRAE), and percent bias (PBIAS). The results showed that IMPSEQ imputed missing values in the water mains failure database considerably more accurately than IRMI and AMELIA. All three methods underestimated the age and soil resistivity of the water mains and overestimated their diameter, length, and soil corrosivity index.
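To make the IMPSEQ idea concrete, the following is a minimal, non-robust sketch of sequential covariance-based imputation, not the authors' implementation or the original IMPSEQ code. It assumes, as we understand Verboven et al. (2007), that each incomplete record is filled with the conditional mean of its missing entries given its observed entries, computed from the mean and covariance of the records completed so far; this conditional mean is the choice that minimizes the determinant of the covariance matrix of the augmented data. The function name `impseq`, the "fewest missing values first" ordering, and the use of a pseudo-inverse are illustrative assumptions.

```python
import numpy as np


def impseq(data):
    """Hypothetical sketch of sequential imputation (not the reference IMPSEQ).

    Each incomplete row receives the conditional mean of its missing entries
    given its observed entries, using the mean and covariance of all rows
    completed so far. The imputed row is then added to the completed set.
    """
    X = np.array(data, dtype=float)
    miss = np.isnan(X)
    complete = ~miss.any(axis=1)
    if complete.sum() < 2:
        raise ValueError("need at least two complete rows to start")
    done = X[complete]

    # Assumption: process rows with fewer missing values first (stable order).
    order = np.argsort(miss.sum(axis=1), kind="stable")
    for i in order:
        if complete[i]:
            continue
        m = miss[i]   # mask of missing variables in this row
        o = ~m        # mask of observed variables in this row
        mu = done.mean(axis=0)
        if o.any():
            S = np.cov(done, rowvar=False)
            S_mo = S[np.ix_(m, o)]
            S_oo = S[np.ix_(o, o)]
            # Conditional mean: mu_m + S_mo S_oo^+ (x_o - mu_o)
            X[i, m] = mu[m] + S_mo @ np.linalg.pinv(S_oo) @ (X[i, o] - mu[o])
        else:
            # Nothing observed: fall back to the current column means.
            X[i, m] = mu[m]
        # Grow the completed set so later rows use the updated estimates.
        done = np.vstack([done, X[i]])
    return X
```

For example, with made-up records `[[1, 3], [2, 5], [3, 7], [4, nan], [nan, 11]]`, where the second column is exactly twice the first plus one, the sketch recovers 9 and 5 for the two gaps. The published method also has a robust variant, which this sketch does not attempt.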
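The four accuracy measures can be sketched as follows. MAE and RMSE follow their standard definitions. The forms used here for MRAE (mean of |observed − imputed| / |observed|) and PBIAS (100 × Σ(observed − imputed) / Σ observed, so that a positive value indicates underestimation) are assumptions, because definitions of both vary in the literature and the abstract does not state which were used. The sample values are invented for illustration and do not come from the Calgary dataset.

```python
import math


def mae(obs, est):
    """Mean absolute error."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def rmse(obs, est):
    """Root mean square error."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mrae(obs, est):
    """Mean relative absolute error (assumed form; observed values must be nonzero)."""
    return sum(abs(o - e) / abs(o) for o, e in zip(obs, est)) / len(obs)


def pbias(obs, est):
    """Percent bias (assumed convention: positive means imputed values are too low)."""
    return 100.0 * sum(o - e for o, e in zip(obs, est)) / sum(obs)


# Invented example values (not study data).
observed = [10, 20, 30, 40]
imputed = [12, 18, 33, 36]
print(mae(observed, imputed))    # 2.75
print(mrae(observed, imputed))   # 0.125
print(pbias(observed, imputed))  # 1.0
```

Under this sign convention, a positive PBIAS for a variable such as pipe age would correspond to the underestimation reported above.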
The proposed imputation methods can be applied to any type of WDN, particularly by small to medium-sized utilities, to impute missing values in water mains failure databases and to develop acceptable water main break deterioration models and water main renewal programs.
