Can water companies handle big data?

In short, the answer is no. But not for the reason you might think.

More monitoring devices, especially on the network, are yielding ever-increasing volumes of potentially valuable data.


Water companies leading in the smart network arena are talking of 10 to 12 sensors (pressure, flow, noise, water quality) per district metered area (DMA). So, for a water company with 2 million customers and 1,000 DMAs, that’s 10,000 to 12,000 sensors within five years.
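As a quick back-of-the-envelope check on that arithmetic (a sketch, assuming 1,000 DMAs and 10 to 12 sensors per DMA as above):

```python
# Back-of-the-envelope sensor count: 1,000 DMAs at 10 to 12 sensors
# (pressure, flow, noise, water quality) per DMA.
dmas = 1_000
sensors_per_dma_low, sensors_per_dma_high = 10, 12

low = dmas * sensors_per_dma_low
high = dmas * sensors_per_dma_high
print(f"Estimated sensor fleet: {low:,} to {high:,}")  # 10,000 to 12,000
```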

That might seem a lot to an industry that is relatively new to data, but it is modest by the standards of big data elsewhere in the world.

Let’s consider the 3 Vs of Big Data.

Volume. 6 billion people have cell phones; an hour spent scrolling through Facebook consumes around 100 MB of mobile data. A single autonomous car will generate around 4,000 gigabytes of data a day. CERN’s Large Hadron Collider data store exceeded 200 petabytes last year. NASA’s Earth Observing System Data and Information System (EOSDIS) holds only 9 petabytes of data, although it adds 6.4 terabytes and distributes almost 28 terabytes to 11,000 unique users around the world every day. A typical water network pressure logger collects just 300 bytes of data a day.
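To put that 300 bytes a day in context, here is a rough comparison (a sketch, using the fleet size and per-logger figures assumed in this post) of a whole sensor fleet’s daily output against the EOSDIS daily ingest quoted above:

```python
# Rough daily volume for a 12,000-sensor fleet at ~300 bytes per logger per day,
# set against the ~6.4 terabytes a day EOSDIS ingest figure quoted above.
sensors = 12_000
bytes_per_logger_per_day = 300
eosdis_bytes_per_day = 6.4e12

fleet_bytes_per_day = sensors * bytes_per_logger_per_day
print(f"Fleet total: {fleet_bytes_per_day / 1e6:.1f} MB a day")                     # ~3.6 MB
print(f"EOSDIS ingests ~{eosdis_bytes_per_day / fleet_bytes_per_day:,.0f}x more")   # ~1.8 million times
```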

Velocity. For most applications in the water industry, daily data uploads plus threshold alarms with response times measured in seconds are adequate. That is slow compared with applications like trading, gambling, gaming, and driverless vehicles, where response times are measured in thousandths of a second.

Variety. Water company data is fairly homogeneous. Time series data (values with timestamps) and simple factual records are the norm. There is no need to ingest large volumes of photographs, videos, audio recordings, email messages, tweets, documents, books, presentations, etc.

It must be easy for water companies, then? Far from it. The challenge is that water companies can’t afford to add to their headcount. The people who work for them are already fully occupied with their existing workload; they don’t have time to trawl through more data, more graphs and more alarms. So the main constraint is capacity.

The only logical conclusion is that software needs to take the strain. Systems that merely acquire data and rely on human supervisory control will not be enough. Software will need to elicit insight from the data and prioritise it in a way that makes better use of people’s time, and it will need to do so coherently across multiple data sources.
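What “software taking the strain” could look like is open, but as a minimal, hypothetical sketch: a single scoring step that folds candidate events from several data sources into one ranked worklist, so people only see the items worth their time. The class, field names and scoring rule below are illustrative assumptions, not any real product’s API:

```python
from dataclasses import dataclass

@dataclass
class Insight:
    source: str        # e.g. "pressure", "flow", "water quality"
    asset: str         # DMA or asset identifier
    severity: float    # 0-1: how far outside normal the reading sits
    confidence: float  # 0-1: how likely this is a real event rather than noise

def prioritise(insights: list[Insight], top_n: int = 5) -> list[Insight]:
    """Merge insights from all sources into one ranked worklist."""
    return sorted(insights, key=lambda i: i.severity * i.confidence, reverse=True)[:top_n]

worklist = prioritise([
    Insight("pressure", "DMA-042", severity=0.9, confidence=0.8),
    Insight("flow", "DMA-017", severity=0.4, confidence=0.9),
    Insight("water quality", "DMA-042", severity=0.7, confidence=0.6),
])
for item in worklist:
    print(item.asset, item.source, round(item.severity * item.confidence, 2))
```

The point is not the particular scoring rule, but that the ranking happens in software, across sources, before anything reaches a person.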

Can water companies handle useful insight? Certainly.
