How did your big data platform originate?
Mtell has oil and gas roots. We were doing predictive maintenance before ‘big data’ was invented. Things changed with the Deepwater Horizon wake-up call and the realization that not all data was available or visible from shore side. One driller, National Oilwell Varco (NOV), decided to beef up its predictive analytics capability and selected Mtell. At that time, we used data streams on a rig to trigger notifications. But NOV wanted a solution that spanned multiple rigs with simultaneous operations and lots of historical data. This was not practical with the existing technology, so we contracted with MapR.
Is drilling data that ‘big?’ It’s a bit of a leap of faith to go to Hadoop!
We split the problem into ‘tier 1,’ the algorithms that work on data from a single rig, and ‘tier 2,’ the applications that work across multiple rigs. At the T2/datacenter level we do need a big data solution so that all rigs can share data and equipment vendors can learn from multiple facilities.
So Hadoop is really just a big file system to support your algorithms?
No, it’s more than just the file system. We leverage Apache Spark in particular. This shifts the focus from storage to big memory and lets us run machine learning ‘agents’ at the T2 level and deploy the findings on rigs for local processing.
What machine learning do you use?
We use a ‘deep learning’ ensemble approach plus signal processing and feature extraction. Deep learning scales well. Hadoop is not just for data but also table storage, historical data and learning algorithms. These include traditional word counts but extend to human/machine-defined algorithms indicative of failure.
What are ensemble models?
Small models that are tested on subsets of the data set or on different inputs. These are compared and ranked before synthesizing. They used to be called ‘random forests.’
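As a rough illustration of the ensemble idea described above, and not a description of Mtell’s actual method, the Python sketch below trains several small threshold models on random subsets of synthetic sensor readings, ranks them by accuracy, and combines the best ones by majority vote. The data, function names and parameters are all hypothetical.

```python
import random
from statistics import mean

def make_readings(n_per_class, rng):
    """Synthetic (reading, failed) pairs; the values are invented for illustration."""
    healthy = [(rng.gauss(1.0, 0.3), False) for _ in range(n_per_class)]
    faulty = [(rng.gauss(2.0, 0.3), True) for _ in range(n_per_class)]
    return healthy, faulty

def train_small_model(healthy, faulty):
    """A one-parameter model: a threshold halfway between the two class means."""
    return (mean(x for x, _ in healthy) + mean(x for x, _ in faulty)) / 2.0

def accuracy(threshold, samples):
    """Fraction of samples where 'reading above threshold' matches the failure label."""
    return mean(1.0 if (x > threshold) == failed else 0.0 for x, failed in samples)

def build_ensemble(healthy, faulty, n_models=10, subset_size=10, keep=5, seed=0):
    """Train small models on random subsets, rank them, and keep the best few."""
    rng = random.Random(seed)
    everything = healthy + faulty
    models = [
        train_small_model(rng.sample(healthy, subset_size),
                          rng.sample(faulty, subset_size))
        for _ in range(n_models)
    ]
    ranked = sorted(models, key=lambda t: accuracy(t, everything), reverse=True)
    return ranked[:keep]

def predict_failure(ensemble, reading):
    """Synthesize the kept models by majority vote."""
    votes = sum(reading > threshold for threshold in ensemble)
    return votes > len(ensemble) / 2

data_rng = random.Random(42)
healthy, faulty = make_readings(100, data_rng)
ensemble = build_ensemble(healthy, faulty)
print(predict_failure(ensemble, 0.5), predict_failure(ensemble, 2.5))
```

In this toy version each model is scored against the full data set; a real system would presumably rank models on held-out data and use far richer inputs than a single reading.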
But you don’t get big data over a satellite link!
Oils have made a big investment in data historians, which can store up to around 1 TB. But we are talking about 5-10 years of historical data plus real-time sensor data at one-second resolution. This is ‘big!’ Traditional historians may even have to resort to data compression, a major problem for analytics as data then has to be decompressed on-the-fly.
What systems are we talking about?
On the rig this will likely be a Windows machine running PI. In the data center, a large cluster. With 10k tags on a modern rig sampled at 1 second and 100 bytes/sample, that comes to about 32 TB/year/rig. We can also bring in weather data to, say, study the impact of heavy waves on loading of equipment, looking across many rigs for signatures that would be overlooked with a tier analysis. There is no way we could answer such questions without state-of-the-art hardware and breakthrough machine learning. We are aggregating sensor data, historical data and repair records to create a new kind of data.
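The 32 TB/year/rig figure can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes a 365-day year, decimal terabytes (10^12 bytes) and no compression or storage overhead; the function name is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

def yearly_volume_tb(tags, bytes_per_sample, sample_period_s=1.0):
    """Raw data volume per rig per year, in decimal terabytes."""
    samples_per_tag = SECONDS_PER_YEAR / sample_period_s
    total_bytes = tags * bytes_per_sample * samples_per_tag
    return total_bytes / 1e12

volume = yearly_volume_tb(tags=10_000, bytes_per_sample=100)
print(f"{volume:.1f} TB/year/rig")  # prints "31.5 TB/year/rig"
```

The result, roughly 31.5 TB, rounds to the 32 TB quoted in the interview.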
Do you use for instance NoSQL?
Yes, NoSQL is a critical part of our solution. It allows us to ingest time series data at high data rates. The downside is that there are fewer constraints than in a relational database, and we need to handle the resulting issues ourselves. It’s surprising that OSIsoft and Wonderware are now selling ‘enterprise’ historians without Hadoop-style scalability*. MapR takes seconds to perform the kind of queries that take hours on such systems.
We have heard some of this before. Equipment manufacturers and oils lay different claims to data and have access to different data sets. OEMs may have more data on one kind of machine. Oils have less data on more heterogeneous equipment. Who calls the shots?
For sure GE wants to own the monitoring. In rail, companies may want to compare performance across motors from different vendors. But owner operators are really taking this in hand now.
What are the chances for MapR in geophysics?
MapR does a lot more than vanilla Hadoop! Our CTO and co-founder M.C. Srivas was with Spinnaker/NTAP and a contributor to Google’s BigTable. MapR is five years ahead of the industry in big data/HPC. Watch this space!
* Comment: OSIsoft has been researching Hadoop as a back-end for its PI System historian.
© Oil IT Journal - all rights reserved.