Book review—Big data in oil and gas

Keith Holdaway’s new book, ‘Harness oil and gas big data with analytics,’ offers plethoric advice and opinion on the technology’s potential but falls short on compelling use cases.

Harness oil and gas big data with analytics* (Harness) is a large, well-produced book replete with comments and admonitions from a recognized industry expert. Holdaway has co-authored several SPE papers and is a member of the advisory council of the SPE petroleum data driven analytics technical section. To give you a flavor of where Holdaway is coming from, here are a few sample quotes from the introduction. ‘Traditional deterministic and interpretive studies are no longer viable as monolithic approaches to garnering maximum value from big data across the E&P value chain.’ ‘The digital oilfield [..] generates a plethora of data that, when mined, surfaces hidden patterns that enhance conventional studies.’ ‘It behooves the upstream geoscientist to implement in-memory analytics technology.’ ‘Geophysicists are entrenched in their upstream bailiwicks.’

First, a word as to what Harness is not about. A quick scan of the index reveals that there is no Hadoop, no NoSQL, no MapReduce, nor any other of the trendy technologies that the ‘big data’ movement of recent years has spawned. Harness, as befits its SAS credentials, is about the collection of statistical and fuzzy techniques that make up data mining. To these, Holdaway adds lashings of oil and gas industry focus, use cases and terminology.

Holdaway has it that the big data problem set comprises three prominent issues: data management, uncertainty quantification and risk assessment. He argues for an approach that combines data discovery with ‘first principles.’ It should be possible to explain what is causing an observed correlation.

Holdaway is verbose and repetitive but not without humor. The oil and gas industry is ‘moving toward adoption of data mining at a speed [that would have been] appreciated by Alfred Wegener.’ His explanation of 3D seismic acquisition as being like ‘dropping a bag of ping pong balls into a room’ is funny too, although perhaps not intentionally. Harness covers a lot of ground in its 300 pages. Aside from a short mention of SAS’ Semma approach, the techniques described are generic, and there are a lot of them, more than can be covered in a short review.

An introductory chapter on data management introduces the big data buzzwords of volume, variety and velocity. Holdaway walks through the notions of data quality, governance and master data to introduce a four-tiered data architecture and a production data quality framework.

The seismic use case is clearly dear to Holdaway’s geophysical heart. He has it that ‘soft computing methodologies that map seismic attributes to reservoir properties are incredibly important as a means to define more credible and reliable reservoir characterization definitions for field development.’ Harness offers its own ‘plethora’ of techniques for seismic trace analysis: PCA, various transforms, clustering and more. He warns against ‘overtraining’ of algorithms on sparse data and states intriguingly that ‘the majority of common [seismic] attributes are redundant.’
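To make the redundancy claim concrete, here is a minimal sketch of our own, not taken from the book. It assumes a synthetic decaying oscillation as a stand-in for a seismic trace and three hypothetical attributes, and flags pairs whose Pearson correlation exceeds an arbitrary threshold. The names `pearson`, `redundant_pairs` and the 0.8 cutoff are illustrative assumptions.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def redundant_pairs(attributes, threshold=0.8):
    """Return name pairs whose absolute correlation exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(attributes, 2)
        if abs(pearson(attributes[a], attributes[b])) > threshold
    ]


random.seed(0)  # reproducible synthetic data

# Synthetic decaying oscillation plus noise (an assumption, not real seismic data).
trace = [
    math.exp(-0.01 * i) * math.sin(0.3 * i) + random.gauss(0, 0.05)
    for i in range(200)
]

# Hypothetical attributes: two derived from the same samples, one unrelated.
attributes = {
    "abs_amplitude": [abs(v) for v in trace],
    "energy": [v * v for v in trace],
    "unrelated": [random.gauss(0, 1) for _ in trace],
}

print(redundant_pairs(attributes))
```

Absolute amplitude and energy are both functions of the same samples, so they carry largely the same information and should be flagged as a redundant pair, while the unrelated attribute should not. A pairwise correlation screen is only the simplest possible check; PCA, which the book covers, generalizes the idea to groups of attributes.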

Holdaway refers to the ‘top down’ and surrogate modeling approach of Intelligent Solutions Inc., used to address non-productive time and stuck pipe, and to optimize production. Another section covers unconventional reserves estimation.

All in all there is a lot of material here. Our main criticism is that the use cases fall frustratingly short of compelling. While Holdaway’s advice and commentary should help practitioners practice, Harness fails to provide hard evidence (that could be presented to management) that such techniques really work, or to show which of the many described works best.

* Wiley/SAS Institute, ISBN 978 1 118 77931 6.

This article originally appeared in Oil IT Journal 2014 Issue # 5.
