Big Data Paris 2019

Neil McNaughton spends some time at a generic ‘Big Data Paris’ conference, an action-packed affair with speakers from Audi on connected cars, MapR on Kubernetes and its own dataware approach, and online betting company Betclic, which runs a data warehouse with 1.5 billion records on Snowflake in the cloud. But how these disruptive new technologies might translate to oil and gas is where things get interesting.

It is always a good idea to listen to other industries, especially when they are engaged with initiatives that seem to parallel things that are happening in oil and gas. Hence my attendance at Big Data Paris organized by the curiously named ‘Corp Agency’. This event was seriously busy with a packed exhibition area where vendors large and small pitched to enthusiastic crowds. The multiple auditoriums were likewise busy, with folks pushing to get out against those who were pushing to get in. It was a bit like the Metro.

I managed to avoid much of the bun fight by sitting in on the plenary sessions, where I heard Hubert Fischer explain how VW Group’s Audi Electronic Venture unit is studying streaming data from connected cars. Data is ‘a lot more than the new oil’; it is the ‘new DNA’ for the auto industry. Audi’s onboard data collector takes data from the engine, tires, pollution sensors, liquid levels and radar, and transmits it all to an ‘internet of things backend’, aka the automotive cloud. Collaborating fleet cars can broadcast where there is a free parking spot! Real time is key, ‘life does not happen in batch’. VW’s own fast IoT platform in the cloud gathers data streams from millions of vehicles and performs real-time analytics to detect black ice/aquaplaning risks. Swarm intelligence provides best-route calculations, ‘displacing service providers’ (like Google Maps? … really?). Kafka queues run on AWS for data ingestion. Asked what data volumes come into the system, Fischer stated that VW is not allowed to collect all possible data, ‘but if we could, we could have around 4TB from a single vehicle in an 8-hour ride’.
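For the curious, ‘Kafka queues for data ingestion’ usually implies keyed messages, so that one vehicle’s events all land in the same partition and stay in order. Here is a minimal Python sketch of the keying and serialization side of that pattern; the field names, partition count and hash choice are my own assumptions for illustration, not Audi’s actual pipeline.

```python
import json
import zlib

NUM_PARTITIONS = 12  # assumed partition count for a telemetry topic

def partition_for(vehicle_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of the message key: one vehicle's events always map to
    the same partition, preserving per-vehicle ordering (the usual Kafka
    keyed-message pattern)."""
    return zlib.crc32(vehicle_id.encode("utf-8")) % num_partitions

def serialize_event(vehicle_id: str, signal: str, value: float, ts: int) -> bytes:
    """JSON-encode one telemetry reading for the ingestion queue."""
    return json.dumps(
        {"vehicle_id": vehicle_id, "signal": signal, "value": value, "ts": ts}
    ).encode("utf-8")

event = serialize_event("WVWZZZ1JZXW000001", "tire_pressure_kpa", 231.5, 1552000000)
print(partition_for("WVWZZZ1JZXW000001"), len(event), "bytes")
```

In a real deployment the serialized bytes would go to a Kafka producer; the point here is simply that each reading is a small, self-describing message, not a bulk upload.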

The ‘terabytes of data’ remark reminded me of GE’s extravagant claims for data streaming from its airplane engines, which appear to have been largely made up. If you are ‘selling’ big data analytics, then you need the raw material of big data. Once you have this you can beat up on folks for not using it. Given that the first use of an in-car computer dates back to 1968, when Volkswagen introduced an electronic engine control unit (ECU), and the fact that ECUs have been standard on most cars since the late 1970s, it is reasonable to assume that there is indeed a lot of data buzzing around in an automobile. But most of this emissions-control data is already processed on board in real time. It’s not really a candidate for the cloud. Other data snippets – like ‘hey, there’s a free parking spot’ – could be pinged out as a very small message. Good data is not all ‘big’.
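Some back-of-envelope arithmetic on the 4TB figure shows why skepticism is warranted. The only numbers taken from the talk are 4TB and 8 hours; the size of a parking-spot ping is my own guess.

```python
# Back-of-envelope check on the '4TB per vehicle per 8-hour ride' claim.
TB = 10**12  # decimal terabyte, in bytes

raw_rate = 4 * TB / (8 * 3600)  # sustained bytes/second per vehicle
print(f"{raw_rate / 1e6:.0f} MB/s per vehicle, continuously")

# A 'free parking spot here' message, by contrast, is a few hundred bytes.
parking_ping = 200  # bytes, assumed
print(f"one ride's raw stream = {4 * TB // parking_ping:,} parking pings")
```

Sustaining roughly 139 MB/s of upload per vehicle over a cellular link is implausible, which rather supports Fischer’s candor that VW cannot (and does not) collect it all.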

We then heard Remi Forest from MapR describe a ‘Kubernetes and dataware’ approach to big data. Seemingly, web server/stateless solutions are ‘unsuited’ to intensive big data applications such as IoT, mobile, monitoring and analytics, where scalability (100k users and up) is an issue. Forest’s talk was more hard-core IT, with talk of containers and abstraction layers. At-scale management of data leverages Kubernetes and (potentially) millions of containers, ‘Google does billions!’ If one dies, never mind, there are plenty of others! Kubernetes keeps containers up and running whatever happens. But data management remains a challenge in the world of containers. Data ‘gravity’, whereby bigger data attracts apps, is a problem. Enter ‘dataware’, an abstraction layer that ‘allows data to be managed as a first-class enterprise resource decoupled from other dependencies’.
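To make the ‘dataware’ idea slightly more concrete, here is a purely conceptual Python sketch of an abstraction layer that decouples application code from where its data actually lives. The interface and class names are invented for illustration; this is not MapR’s actual API.

```python
from abc import ABC, abstractmethod

class DataStore(ABC):
    """Abstract 'dataware' layer: apps name datasets, never storage locations."""
    @abstractmethod
    def put(self, name: str, payload: bytes) -> None: ...
    @abstractmethod
    def get(self, name: str) -> bytes: ...

class InMemoryStore(DataStore):
    """Stand-in backend; a real deployment might target a distributed volume
    or object storage, swapped in without touching application code."""
    def __init__(self) -> None:
        self._data: dict = {}
    def put(self, name: str, payload: bytes) -> None:
        self._data[name] = payload
    def get(self, name: str) -> bytes:
        return self._data[name]

def app_logic(store: DataStore) -> bytes:
    # The application sees only the abstraction, so containers can come
    # and go while the data layer persists independently.
    store.put("well-logs/latest", b"GR,DT,RHOB...")
    return store.get("well-logs/latest")

print(app_logic(InMemoryStore()))
```

The design point is that the stateful part (the store) is managed separately from the stateless, disposable compute, which is roughly the pitch made for running data platforms under Kubernetes.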

I have a hard time relating to this talk of containers and Kubernetes. I’m not sure they are terribly relevant to oil and gas IT. They seem to be necessary in a world of massive real-time access to computing resources and data. There may be a requirement for this kind of thing somewhere in the upstream, I’m not sure, but if there is, it would be nice if someone else ‘abstracted’ all this stuff away!

Which is exactly what happened in the following talk from French online betting company Betclic. Camille Reverdy and Christofer Daussion have rebuilt Betclic’s analytics from scratch using a Snowflake data warehouse running on AWS, along with on-site Tableau for analysis. Snowflake is a fully managed service, here holding 1.5 billion events in a single table and growing at some 20 million events/day. But wait a minute, Snowflake is a ‘collection of stateless services that manage virtual warehouses, query optimization and transactions’ and appears to be hugely performant. Just the opposite of what MapR was saying earlier in the day! Oil IT Journal has a dozen or so references to Tableau, but none so far for Snowflake. Having said that, Snowflake’s early growth came from displacing on-premise Netezza and Teradata boxes in various industries, including oil and gas, according to CEO Bob Muglia.
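The quoted figures invite some quick arithmetic, using only the two numbers given in the talk.

```python
# Growth arithmetic from the Betclic figures: 1.5 billion events in the
# table, growing at some 20 million events/day.
existing = 1_500_000_000
per_day = 20_000_000

print(f"{per_day * 365 / 1e9:.1f} billion new events/year")
print(f"table doubles in about {existing // per_day} days")
```

At that rate the table doubles roughly every two and a half months, which makes the ‘totally managed, no development to speak of’ claim all the more striking.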

We have reported on containers before, in particular with our analysis of Docker, the ‘next big thing’ of a couple of years ago. The container paradigm is associated with ‘small pieces of loosely-coupled software’ provided as microservices. But until there are many such software apps, as advocated in the OSDU (see this edition’s lead), discussing containers in a world which remains one of monolithic applications is probably putting the cart before the horse. It’s interesting but probably a bit too techie for most.

And what of ‘generic’ technology as opposed to oil and gas specifics? Is it better to leverage off-the-shelf stuff as the Betclic folks have? In the Q&A, someone asked about maintaining the Betclic stack. The reply was that, in fact, there was not much development involved in the project. That’s rather different from those who advocate running a complete Hadoop stack, managing containers and what have you. If something like Snowflake can run your business intelligence, that neatly avoids all the ugliness of the cloud. However, your business may not map to Betclic’s. In Jim Crompton and Steve Cooper’s new book*, which we review elsewhere in this issue, one reason given for the relative failure of ‘Digital Oilfield 1.0’ was the difficulty of applying generic technology to oil and gas because of its ‘domain complexities’. Which raises the interesting question of how much of today’s novel big data technology stack can usefully accommodate the upstream’s data niceties.

*A Digital Journey: The Transformation of the Oil and Gas Industry. Cooper and Crompton, 2019. ISBN 978-1-7918-9090-2.


© Oil IT Journal - all rights reserved.