BP pioneers the Data Mesh

Accenture podcast describes BP DataWorks’ geocloud-based federated data platform. The interoperable mesh, the ‘hottest topic in data’, underpins BP’s Aims Tracker emissions reporting tool. The mesh is also to provide data for the data science/analytics teams, an area where past data architectures fell short. But what exactly is the data mesh?

In a recent Accenture podcast, Accenture’s Teresa Tung discussed BP’s embryonic data mesh with Abeth Go and Liam Donohoe (both with BP). Tung kicked off the proceedings, describing the data mesh as the ‘hottest topic in data’. BP was introduced as one of the few companies that have so far ‘embarked on the data mesh journey’. Donohoe explained that BP was transitioning from data lakes to data fabrics and standardizing its data ecosystem to support data marketplaces with a transformed data organization, BP DataWorks. BP was inspired by data mesh guru Zhamak Dehghani, who advocates a focus on data products and self-service data access.

Go added that the mesh paradigm aligns with BP’s federated, global business, where there is a need to share data across different units of the company. ‘The old model did not scale’. Implementing the mesh means recognizing the concepts of data management, governance and data product at the asset level.

Tung asked for more on the state of BP’s data landscape and platforms. Donohoe explained that BP operates a global ‘geocloud’ strategy of a data platform distributed across dual cloud environments with multiple instances in each cloud. One challenge is to leverage key cloud technology alongside niche vendor solutions. The data mesh has to accommodate this landscape and BP is currently trying to identify which components can be realized as ‘data products’. In this context, the mesh is ‘essential’ to allow interoperability across the different technologies and platforms.

Tung, ‘So how did you make the business case for the mesh?’ Go responded that, in fact, no business case was presented! The mesh was deemed necessary because of the requirement to ‘increase velocity and reduce resource fragmentation’, particularly in the context of the new BP as an integrated energy company. The mesh is allowing BP to move fast with an architecture that can support and sustain the anticipated growth. For the new businesses, modularity is key. The mesh operating model will abstract underlying complexities and enable security.

Donohoe added that the mesh is already bearing fruit in applications such as BP’s ‘Aims Tracker’, which captures diverse emissions data sources and helps with the trend towards net zero. Likewise, the approach supports digital twins with commoditized data available to other business areas.

But it’s not all plain sailing. Addressing all the governance and identity management issues in a greenfield line of business is hard. Applying the new world of data mesh to legacy is even more challenging. Go agreed: the new data governance processes represent a mindset shift, ‘Now you are the data owner!’ ‘Change is happening, everyone is feeling it as we are scaling ourselves to new ways of working’.

Tung asked for more examples. Go stated that while the primary actors are now data scientists, all data owners and managers benefit from spin-offs from the mesh. BP has ‘data managers coming out of our ears, geos, seismic people, all are getting to be good data stewards’. ‘Good librarians are part of our DNA, and the mesh is formalizing the whole approach to data management’.

Donohoe qualified the mesh as ‘best practices as applied to the data problem’. But it’s not all about technology. ‘We prefer to view the mesh as a suite of data products’. Technology plays a key role but the ‘harsh reality is that the technology is just not there in a number of areas’. The mesh is a journey and it will grow and evolve over time, starting locally with a number of data products.

BP has support from its senior leadership, with top-down understanding and buy-in, particularly in trying to understand why past data architectures fell short. Go added that while the initial emphasis is on providing data for the data science/analytics community, ‘If we solve this the mesh could be how we provision data within and outside of BP, one place for all curated data. If it’s done right, the prize is much bigger’. Tung wound up the podcast by congratulating BP on its pioneering work on the mesh.

Like all good IT memes, the data mesh is loosely defined. In her blog post Tung points to a piece by Zhamak Dehghani (cited by BP as canonical) who explains that ‘Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse’. The next big thing therefore is ‘a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product’. So there you have it. After persuading us that all would be fine once ‘disparate’ data was grouped into a data lake, we now have to move it out again. From disparate to distributed. Also it appears that there is a knowledge graph in the mix, tying the distributed data bits together.
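
For readers who prefer code to slogans, the three ideas in the Dehghani quote (domains first, self-serve infrastructure, data as a product) can be loosely illustrated as follows. This is a minimal sketch of the general pattern only and says nothing about how BP DataWorks is actually built: the `DataProduct` and `Catalog` classes, the domain names, owners and column names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A dataset published by a domain team (all names are hypothetical)."""
    name: str
    domain: str    # owning business domain: 'domains as the first class concern'
    owner: str     # accountable data owner: 'now you are the data owner'
    schema: tuple  # published column names, acting as the product's contract


@dataclass
class Catalog:
    """Self-serve registry: domains publish, consumers discover."""
    _products: dict = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        key = (product.domain, product.name)
        if key in self._products:
            raise ValueError(f"{product.domain}/{product.name} already published")
        self._products[key] = product

    def find(self, column: str) -> list:
        """Return every product, in any domain, that exposes the given column."""
        return sorted(
            (p for p in self._products.values() if column in p.schema),
            key=lambda p: (p.domain, p.name),
        )


# Illustrative data only: three domains each publish one product.
catalog = Catalog()
catalog.publish(DataProduct("flaring", "upstream", "alice", ("site_id", "co2e_tonnes")))
catalog.publish(DataProduct("fleet_fuel", "shipping", "bob", ("vessel_id", "co2e_tonnes")))
catalog.publish(DataProduct("well_logs", "subsurface", "carol", ("well_id", "depth_m")))

# An emissions reporting tool could discover its sources without a central lake.
emitters = catalog.find("co2e_tonnes")
print([f"{p.domain}/{p.name}" for p in emitters])
```

The data stays with the domain that owns it; only the description of each product is centralized, which is roughly where a knowledge graph might fit in.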

© Oil IT Journal - all rights reserved.