On data warehouses, data lakes and tandems

Boston Consulting Group bloggers offer advice on the merits of various data platforms.

A blog post from Boston Consulting Group considers the ‘tough times’ in oil and gas following the 2014 downturn and the popularity of alternative forms of energy. Such challenges are being addressed by ‘implementing digital technologies’. However, ‘success has been limited [because of] an inability to fully leverage data’. BCG enumerates the well-studied issues of multiple legacy software applications, data formats, quality and ‘inflexible architectures’. The answer to these woes is ‘a central platform that includes a data warehouse, a data lake, or both’. Some companies are seeing the benefits of data platforms, with one unnamed international oil company reporting a ‘$7 billion cost saving over three years’ from its platform investments (hardly a ‘limited success’!).

BCG offers a platform taxonomy to help out with the transformation. A central data platform comes in three flavors: a data warehouse (a repository of structured data), a data lake (a repository of both structured and unstructured data) or a combination of a warehouse and a lake. Data warehouses are proven technologies, with many solutions, vendors, and experts readily available. The downside is that data must go through a lengthy structuring process before it can be stored, and a rigid structure may make it hard to incorporate new data sources. Building a data lake is easy: just load the information as-is. The downside is that ‘because the information hasn’t been structured, data lakes require more rigorous governance and management than warehouses’*. Moreover, ‘people with data lake architecture and data engineering skills are far scarcer than data warehouse experts’. On the plus side, data lakes can include large, high-frequency time series production data and are amenable to the adoption of new digital technologies.
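
The warehouse/lake trade-off described above can be illustrated with a toy sketch. This is not taken from the BCG post: the schema, the field names (`well_id`, `date`, `oil_bbl`, `sensor`) and the three helper functions are all hypothetical, and plain Python lists stand in for real storage. The warehouse checks structure when data is written, so a new field is rejected at load time. The lake accepts anything and only discovers problems when the data is read.

```python
import json

# Hypothetical warehouse schema (an assumption for illustration): field -> type.
WAREHOUSE_SCHEMA = {"well_id": str, "date": str, "oil_bbl": float}


def write_to_warehouse(table, record):
    """Structure is enforced at load time: unknown or missing fields are rejected."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        odd = sorted(set(record) ^ set(WAREHOUSE_SCHEMA))
        raise ValueError(f"record does not match schema, offending fields: {odd}")
    # Coerce each value to the declared type before storing it.
    table.append({name: typ(record[name]) for name, typ in WAREHOUSE_SCHEMA.items()})


def write_to_lake(lake, raw_text):
    """Loading is trivial: the payload is stored as-is, with no checks."""
    lake.append(raw_text)


def read_from_lake(lake):
    """Structure is imposed only at read time, so bad records surface here."""
    rows, rejected = [], []
    for raw in lake:
        try:
            doc = json.loads(raw)
            rows.append({"well_id": str(doc["well_id"]),
                         "oil_bbl": float(doc["oil_bbl"])})
        except (ValueError, KeyError, TypeError):
            rejected.append(raw)
    return rows, rejected


warehouse, lake = [], []

# A conforming record loads into the warehouse.
write_to_warehouse(warehouse, {"well_id": "W-1", "date": "2020-01-01", "oil_bbl": "120.5"})

# A record from a new source with an extra field is refused by the rigid schema.
try:
    write_to_warehouse(warehouse, {"well_id": "W-2", "date": "2020-01-01",
                                   "oil_bbl": 98.0, "sensor": "p-17"})
    new_source_accepted = True
except ValueError:
    new_source_accepted = False

# The lake takes everything, including the extra field and outright junk.
write_to_lake(lake, '{"well_id": "W-1", "oil_bbl": 120.5}')
write_to_lake(lake, '{"well_id": "W-2", "oil_bbl": 98.0, "sensor": "p-17"}')
write_to_lake(lake, "not json at all")

rows, rejected = read_from_lake(lake)
```

In this sketch the junk record sits in the lake unnoticed until `read_from_lake` runs, which is one way to read the remark that lakes need more rigorous governance than warehouses.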

Using a data lake in tandem with a data warehouse is now possible with cloud-based data warehouses such as Snowflake or Amazon Redshift. These allow composite queries across structured, semi-structured, and unstructured content. Ready to write the check? BCG suggests either a DIY platform running on AWS, Google, or Azure or, alternatively, a data suite purchased from an industry vendor ‘such as Schlumberger Delfi or Palantir’. For more pros and cons of the different options and advice on technology selection, read the BCG blog.

* Co-authored by Sylvain Santamarta, Peter Forbes, Rash Gandhi and Michael Bechauf.

* This sounds rather like the old ‘schema on write’ vs. ‘schema on read’ issue.

© Oil IT Journal - all rights reserved.