Comparing the probabilistic approach with traditional methods of regression analysis

Zoe J. Y. Zhu and Edward A. McBean


Probabilistic network analyses provide a normative approach for updating beliefs on the basis of new information. Probability analysis supports parameter estimation and yields predictive distributions for quantities of interest. For inference, probabilistic analyses, unlike classical statistical analyses, relate directly to the quantity or parameter of interest rather than to the value of a test statistic. The probabilistic approach uses more than the sample, as it also incorporates prior information. Although the prior is judgmental, it relates to the hypothesis of interest, whereas the sampling distribution relates to logically irrelevant, hypothetical samples. The probabilistic approach is therefore more focused on the problems of interest in water quality research, and it has been used effectively in many fields to incorporate expert knowledge and historical data, revising prior beliefs in the light of new evidence. In this research, we compare the probabilistic approach with traditional methods such as regression analysis. For example, we determine the implications of increment size, addressing questions such as: What increment size can be handled without prohibitive computational demands? How many data points are needed for the model to be usable? How does this compare with traditional methods such as regression analysis? We also explore the capabilities, advantages, and disadvantages of the DMN approach and draw conclusions regarding the assumptions made about the distribution of causative variables. Using DMN, we identify when high disinfection byproduct (DBP) formation occurs, that is, which model input conditions would result in high DBP formation. Finally, we provide DMN-derived guidance on how to avoid high DBP formation and test the DMN model on different types of water treatment plants.
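The idea of revising a prior belief in the light of new evidence can be sketched with a conjugate Beta-binomial model. This is a minimal illustration only, assuming the quantity of interest is p, the probability that a water sample exceeds a DBP threshold. The class name `BetaBelief`, the prior parameters, and the monitoring counts are hypothetical; they are not taken from this study, and the sketch is not the DMN model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about p, the probability that a sample exceeds a DBP threshold.

    Hypothetical helper for illustration only; not part of the DMN model.
    """

    alpha: float
    beta: float

    def update(self, exceedances: int, samples: int) -> "BetaBelief":
        # Conjugate Beta-binomial update: exceedances add to alpha,
        # non-exceedances add to beta.
        if not 0 <= exceedances <= samples:
            raise ValueError("exceedances must lie between 0 and samples")
        return BetaBelief(self.alpha + exceedances, self.beta + (samples - exceedances))

    def mean(self) -> float:
        # Mean of the Beta distribution; for a posterior this is also the
        # predictive probability that the next sample exceeds the threshold.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior from expert judgment: about 20% exceedances,
# weighted like 10 earlier samples.
prior = BetaBelief(alpha=2.0, beta=8.0)

# Hypothetical new monitoring data: 6 exceedances out of 20 samples.
posterior = prior.update(exceedances=6, samples=20)

print(f"prior mean = {prior.mean():.3f}")          # prior mean = 0.200
print(f"posterior mean = {posterior.mean():.3f}")  # posterior mean = 0.267
```

The posterior mean (0.267) lies between the prior judgment (0.200) and the observed exceedance rate (6/20 = 0.300); as more samples accumulate, the data increasingly dominate the prior.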

By comparing the DMN predictive model with classical multiple regression predictive models, we explain, both theoretically and experimentally, why DMN is a better prediction model than multiple regression. Both models can be used to identify the relative significance of the water quality variables (natural organic matter (NOM) indicators, pH, etc.) and operational variables (disinfectant dose, water temperature, contact time, etc.) responsible for DBP formation, and to conduct sensitivity analyses. However, multiple regression models have limitations not only in accuracy but also in model capacity. The accuracy limitation can be traced to the normality assumption underlying regression. As for capacity, DMN can perform bi-directional inference, whereas multiple regression cannot. This means that DMN can be a powerful tool for short-term, real-time control.
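The bi-directional inference that distinguishes a probabilistic network from multiple regression can be sketched with a two-node discrete network. This is a minimal sketch under stated assumptions: water temperature and DBP level are each discretized into two states, and the names `P_TEMP`, `P_DBP_GIVEN_TEMP`, `predict_dbp`, and `diagnose_temp`, as well as all probability values, are hypothetical. They are not the DMN model or data from this study.

```python
from typing import Dict, Optional

# Hypothetical two-node discrete network: water temperature -> DBP level.
# All probabilities are illustrative placeholders, not values from this study.
P_TEMP: Dict[str, float] = {"low": 0.6, "high": 0.4}
P_DBP_GIVEN_TEMP: Dict[str, Dict[str, float]] = {
    "low": {"normal": 0.85, "high": 0.15},
    "high": {"normal": 0.45, "high": 0.55},
}


def predict_dbp(dbp_state: str, temp: Optional[str] = None) -> float:
    """Forward (predictive) inference: P(DBP = dbp_state), optionally given temperature."""
    if temp is not None:
        return P_DBP_GIVEN_TEMP[temp][dbp_state]
    # Marginalize over temperature when it is unobserved.
    return sum(P_TEMP[t] * P_DBP_GIVEN_TEMP[t][dbp_state] for t in P_TEMP)


def diagnose_temp(dbp_state: str) -> Dict[str, float]:
    """Backward (diagnostic) inference: P(temperature | DBP = dbp_state) via Bayes' rule."""
    joint = {t: P_TEMP[t] * P_DBP_GIVEN_TEMP[t][dbp_state] for t in P_TEMP}
    evidence = sum(joint.values())
    return {t: p / evidence for t, p in joint.items()}


print(f"P(DBP=high) = {predict_dbp('high'):.2f}")                         # 0.31
print(f"P(DBP=high | temp=high) = {predict_dbp('high', 'high'):.2f}")     # 0.55
print(f"P(temp=high | DBP=high) = {diagnose_temp('high')['high']:.3f}")   # 0.710
```

The same joint distribution answers both the forward question (how likely is high DBP at high temperature?) and the reverse one (given high DBP, how likely was the temperature high?). A regression of DBP on temperature answers only the forward question; the reverse would require fitting a separate regression of temperature on DBP, which in general is not the algebraic inverse of the first.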
