On the uses of graph databases in water distribution systems

Adam Rose, XP Solutions, Portland, OR, USA

ABSTRACT

A water distribution modeling package typically consists of a database, a hydraulic engine, and a user interface.  Most current software offerings use some form of relational (or pseudo-relational) database.  This makes intuitive sense, because relational databases excel in several areas:

  • Performance: fast storage and retrieval of data for individual locations and/or parameters
  • Stability: data that follows a well-defined, fixed schema
  • Availability: many embedded and server configurations exist, with associated object-relational mappers, libraries, and development kits
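
As an illustration of the "defined schema" case, a minimal relational layout for an EPANET-style network might store junctions and pipes in two tables, with each pipe referencing its two end nodes.  The sketch below uses Python's built-in sqlite3 module; the table names, column names, and units (US customary) are hypothetical and not taken from any particular product.

```python
import sqlite3

def build_network(conn: sqlite3.Connection) -> None:
    """Create a hypothetical two-table schema and load a tiny example network."""
    conn.executescript(
        """
        CREATE TABLE junctions (
            id        TEXT PRIMARY KEY,
            elevation REAL NOT NULL,   -- ft (assumed units)
            demand    REAL NOT NULL    -- gpm (assumed units)
        );
        CREATE TABLE pipes (
            id       TEXT PRIMARY KEY,
            node1    TEXT NOT NULL REFERENCES junctions(id),
            node2    TEXT NOT NULL REFERENCES junctions(id),
            length   REAL NOT NULL,    -- ft
            diameter REAL NOT NULL     -- in
        );
        """
    )
    conn.executemany(
        "INSERT INTO junctions VALUES (?, ?, ?)",
        [("J1", 100.0, 0.0), ("J2", 95.0, 50.0), ("J3", 90.0, 75.0)],
    )
    conn.executemany(
        "INSERT INTO pipes VALUES (?, ?, ?, ?, ?)",
        [("P1", "J1", "J2", 1000.0, 12.0), ("P2", "J2", "J3", 800.0, 8.0)],
    )

conn = sqlite3.connect(":memory:")
build_network(conn)

# A keyed lookup of one location's parameters: the access pattern relational stores handle well.
row = conn.execute("SELECT elevation, demand FROM junctions WHERE id = ?", ("J2",)).fetchone()
print(row)  # (95.0, 50.0)
```

Lookups like this hit a single primary key, which is where the performance advantage listed above comes from.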

There are times when a relational database is not the ideal format for retrieving data.  Algorithms that search for connected portions of a network stored in a relational database require graph traversals, which are recursive searches involving repeated table joins.  This type of search is common when isolating leaks in a distribution system or searching for orphan nodes or links.  A graph database is a style of data storage that consists of a collection of nodes [junctions], edges [pipes], and properties [diameter, demand, etc.], and it is built specifically for this type of search.  This format also excels at related questions, such as finding the fewest valves to close rather than the smallest isolated section of the network.  In fact, although the word "relational" appears in the name "relational database," it is the graph database that is principally concerned with relationships between objects in the network.
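
To make the "recursive search with repeated joins" concrete, the sketch below expresses a connectivity search in SQLite as a recursive common table expression: starting from a source node, each recursion step joins the pipes table back onto the nodes found so far.  Nodes that are never reached are orphans with respect to that source, and closing a pipe (as when isolating a leak) shows which junctions lose service.  The schema, the `status` column, and the node and pipe IDs are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: nodes plus undirected pipes with an OPEN/CLOSED status.
SCHEMA = """
CREATE TABLE junctions (id TEXT PRIMARY KEY);
CREATE TABLE pipes (
    id     TEXT PRIMARY KEY,
    node1  TEXT NOT NULL,
    node2  TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'OPEN'
);
"""

# Recursive CTE: begin at the source and repeatedly join through open pipes in
# either direction. UNION (not UNION ALL) discards rows already produced, so the
# recursion terminates even when the network contains loops.
REACHABLE_SQL = """
WITH RECURSIVE reachable(id) AS (
    SELECT ?
    UNION
    SELECT CASE WHEN p.node1 = r.id THEN p.node2 ELSE p.node1 END
    FROM pipes AS p
    JOIN reachable AS r ON r.id IN (p.node1, p.node2)
    WHERE p.status = 'OPEN'
)
SELECT id FROM reachable
"""

def reachable_nodes(conn: sqlite3.Connection, source: str) -> set:
    return {row[0] for row in conn.execute(REACHABLE_SQL, (source,))}

def orphan_nodes(conn: sqlite3.Connection, source: str) -> set:
    """Nodes with no open path back to the source."""
    all_nodes = {row[0] for row in conn.execute("SELECT id FROM junctions")}
    return all_nodes - reachable_nodes(conn, source)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO junctions VALUES (?)",
    [(n,) for n in ("R1", "J1", "J2", "J3", "J4", "J5")],
)
conn.executemany(
    "INSERT INTO pipes (id, node1, node2) VALUES (?, ?, ?)",
    [("P1", "R1", "J1"), ("P2", "J1", "J2"), ("P3", "J2", "J3"),
     ("P4", "J3", "J1"), ("P5", "J3", "J4")],  # J1-J2-J3 form a loop
)

# J5 has no pipes at all, so it is an orphan.
orphans_before = sorted(orphan_nodes(conn, "R1"))
print(orphans_before)  # ['J5']

# Close P5, as if isolating a leak on it: J4 is now cut off as well.
conn.execute("UPDATE pipes SET status = 'CLOSED' WHERE id = 'P5'")
orphans_after = sorted(orphan_nodes(conn, "R1"))
print(orphans_after)  # ['J4', 'J5']
```

The query works, but every hop through the network is another pass of the join; a graph database stores each node's adjacent pipes directly, so the same traversal follows stored links instead.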

In addition to answering queries about the relationships between network objects, graph databases allow for more efficient and intuitive visualization of those relationships.  A query that asks for every possible flow path between a source (pump) and a sink (junction) across hundreds or thousands of demand options is a simple task for a graph database but potentially an enormously resource-intensive one for a relational database.  The results of this and other graph queries are also straightforward to visualize, using dynamic network graphs, flare dependency graphs, or other techniques.
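
At its core, enumerating every flow path between a source and a sink is a depth-first search for all simple paths (paths that never revisit a node) between two nodes.  The plain-Python sketch below shows that search with no database involved, over a hypothetical six-pipe looped network; the IDs are invented, and flow direction and demands are ignored, so it lists topologically possible paths rather than hydraulically active ones.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Adjacency = Dict[str, List[Tuple[str, str]]]

def build_adjacency(pipes: List[Tuple[str, str, str]]) -> Adjacency:
    """Map each node to its (neighbor, pipe_id) pairs; pipes are treated as undirected."""
    adj: Adjacency = defaultdict(list)
    for pipe_id, a, b in pipes:
        adj[a].append((b, pipe_id))
        adj[b].append((a, pipe_id))
    return adj

def all_flow_paths(adj: Adjacency, source: str, sink: str) -> Iterator[List[str]]:
    """Depth-first enumeration of every simple path from source to sink.

    Each path is yielded as the list of pipe IDs traversed.
    """
    visited: Set[str] = {source}
    path: List[str] = []

    def dfs(node: str) -> Iterator[List[str]]:
        if node == sink:
            yield list(path)  # copy, because `path` keeps changing during the search
            return
        for nxt, pipe_id in adj.get(node, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            path.append(pipe_id)
            yield from dfs(nxt)
            path.pop()           # backtrack so other branches may use this node
            visited.remove(nxt)

    yield from dfs(source)

pipes = [
    ("P1", "PUMP1", "J1"), ("P2", "J1", "J2"), ("P3", "J1", "J3"),
    ("P4", "J2", "J4"), ("P5", "J3", "J4"), ("P6", "J2", "J3"),
]
paths = sorted(all_flow_paths(build_adjacency(pipes), "PUMP1", "J4"))
for p in paths:
    print(" -> ".join(p))
print(len(paths))  # 4
```

The number of simple paths can grow exponentially with the number of loops, so on large systems this kind of search is usually bounded, for example by a maximum path length.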

The purpose of this presentation is to explain the basics of how a water distribution system data model can be represented in a graph database, to show performance differences between the two database styles, to demonstrate new visualization tools for graph databases, and to provide tools for transferring data between the two formats.  EPANET will be used as the water distribution platform, with SQLite for the embedded relational database and Neo4j for the graph database.  All platforms are freely available for download.
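
For the data-transfer direction, one possible approach (a sketch, not the tooling referred to above) is to read the [JUNCTIONS] and [PIPES] sections of an EPANET .inp file and turn them into parameterized Cypher statements for Neo4j, with junctions as nodes and pipes as relationships.  The `Junction` label, `PIPE` relationship type, property names, and function names are hypothetical.  Only junctions and pipes are handled, so pipes that connect to reservoirs or tanks would not be created; executing the statements with a Neo4j client is left out.

```python
from typing import Dict, List, Optional, Tuple

def parse_inp_sections(text: str) -> Dict[str, List[List[str]]]:
    """Split EPANET .inp text into {SECTION: [token rows]}, dropping ';' comments and blank lines."""
    sections: Dict[str, List[List[str]]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].upper()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line.split())
    return sections

# Parameterized Cypher: UNWIND turns the $rows list into one row per node or pipe.
JUNCTION_CYPHER = (
    "UNWIND $rows AS row "
    "CREATE (:Junction {id: row.id, elevation: row.elevation, demand: row.demand})"
)
PIPE_CYPHER = (
    "UNWIND $rows AS row "
    "MATCH (a:Junction {id: row.node1}), (b:Junction {id: row.node2}) "
    "CREATE (a)-[:PIPE {id: row.id, length: row.length, "
    "diameter: row.diameter, roughness: row.roughness}]->(b)"
)

def inp_to_cypher(text: str) -> List[Tuple[str, dict]]:
    """Return (query, parameters) pairs; junctions must be created before pipes."""
    s = parse_inp_sections(text)
    junctions = [
        {"id": r[0], "elevation": float(r[1]),
         "demand": float(r[2]) if len(r) > 2 else 0.0}
        for r in s.get("JUNCTIONS", [])
    ]
    pipes = [
        {"id": r[0], "node1": r[1], "node2": r[2], "length": float(r[3]),
         "diameter": float(r[4]), "roughness": float(r[5])}
        for r in s.get("PIPES", [])
    ]
    return [(JUNCTION_CYPHER, {"rows": junctions}), (PIPE_CYPHER, {"rows": pipes})]

SAMPLE_INP = """
[TITLE]
Tiny example

[JUNCTIONS]
;ID  Elev  Demand  Pattern
 J1  100   0
 J2  95    50
 J3  90    75

[PIPES]
;ID  Node1  Node2  Length  Diameter  Roughness  MinorLoss  Status
 P1  J1     J2     1000    12        100        0          Open
 P2  J2     J3     800     8         100        0          Open

[END]
"""

statements = inp_to_cypher(SAMPLE_INP)
for query, params in statements:
    print(query, "|", len(params["rows"]), "rows")
```

Passing the rows as a parameter rather than formatting them into the query text avoids quoting problems with unusual IDs and lets the database reuse one query plan per statement.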
