Book Review - Quantitative Geosciences: Data Analytics, Geostatistics, Reservoir Characterization and Modeling

Zee Ma’s monster (640-page) work sets out to fill the gap between quantitative and descriptive geoscience analysis for reservoir characterization with a systematic ‘integrative’ approach. While recent data-driven approaches are included, QG adopts an essentially mathematical, first-principles approach. The fairly heavy mathematical content is balanced by Ma’s erudition and insights that run through the work. Ma argues that in-depth domain knowledge and first-class statistical know-how will trump the naïve ‘data-science’ approach.

Quantitative Geosciences* (QG) is a 640-page textbook by Zee Ma, a scientific advisor on geosciences and mathematics with Schlumberger. QG sets out to present ‘quantitative methods and applications of integrated descriptive-quantitative geoscience analysis to reservoir characterization and modeling’. From the preface we learn that while geoscience data analysis has increased significantly in the last three decades, coverage in the literature has been ‘uneven’ and there are ‘significant gaps’ between descriptive and quantitative geosciences. QG ‘attempts to fill some of these gaps through a more systematic, integrative treatment of descriptive and quantitative geoscience analyses by unifying and extending them’. Moreover, although descriptive and quantitative methods may appear unconnected, they can be ‘annealed into a coherent, integrative approach’. Ma likes his hyperbole!

‘Earth science, like other scientific disciplines, is increasingly becoming quantitative because of the digital revolution’. Ma places QG in the context of current IT tropes such as the ‘4th industrial revolution’, ‘digitalization’ and ‘artificial intelligence’. This reviewer first encountered what was then termed ‘numerical taxonomy’ (of fossils) in 1968 so ‘quantitative’ in fact has a pretty long history in earth sciences. Ma frequently prefaces technology as having made significant progress ‘in the last few decades’. ‘Geological models, once drawn by hand, have evolved into digital reservoir models that can integrate various geoscience disciplines, [.. with ..] both descriptive and quantitative data’.

However, ‘the potential of big data and quantitative methods is not yet universally recognized in the geoscience community due to a lack of familiarity’. Hence the aim of QG is to familiarize the reader with data analytical methods including probability, statistics, geostatistics, data science, and integrated geosciences. An ambitious goal indeed. Does he succeed?

First an admission. This book is far too big and wide-ranging for this reviewer to be able to read and digest in our allotted time frame, so this review dips in and out and makes frequent use of Acrobat’s word count function. But dipping in and out does give a feel for Ma’s thinking and writing. On the thinking side, Ma has opinions on just about everything and provides insights that go beyond the usual textbook. On the writing side, let’s say that conciseness is not his main concern. That’s OK though, Ma has a lot to talk about!

Ma does not pretend to compete with earlier books on ‘mathematical geosciences’, presenting instead ‘data analytics and descriptive quantitative integration’. He focuses on ‘multidisciplinary applications of geosciences to reservoir characterization and modeling’ with the aim of providing a ‘basic understanding of relevant theories and how to put them into practice’.

QG’s contents occupy some 15 pages. Part 1 (data analytics) covers data analysis, correlations, principal component analysis, regression and machine learning. Part 2 (reservoir characterization) includes geological heterogeneity, petrophysical analysis, lithological characterization, spatial analysis, seismics and geostatistics. Part 3 (reservoir modeling and uncertainty) covers 3D models, kriging, stochastic modeling, more geostatistics, porosity and permeability modeling, water saturation modeling and hydrocarbon volumetrics, upscaling and uncertainty.

QG is heavy on analytical methods and statistics. Thirty pages are devoted to the subject of kriging, another 30 to stochastic modeling of continuous variables. The presentation is at times heavy on mathematics, elsewhere packed with interesting discussion on topics such as whether stochastic realizations are truly ‘equiprobable’ and the degree to which such models are ‘real’.

With the arrival of ‘big data’ and ‘analytics’ what is the role of all this statistical stuff? Ma argues as follows: ‘Before the arrival of big data, statistical methods used in science and engineering were dominantly model-based with an emphasis on estimation unbiasedness. Although many traditional statistical methods work well with small datasets and a proper experimental design, they are less effective in handling some of the problems that have arisen out of big data. Artificial intelligence has led the way to data mining for discovering patterns and regularities from big data and for making predictions for scientific and technical applications. Although the movement was initially led by computer scientists, statisticians and engineers are now all involved, thus strengthening the trend’.

A chapter on machine learning includes an enlightened discussion of AI and of under- and overfitting of models. ML can produce extremely accurate models that include impossible conditions such as ‘the creation of unphysical or unreasonable values, such as negative porosity or permeability, porosity greater than 100%, etc.’ Avoiding such ‘overfitting’ is ‘one of the most challenging problems in using a machine learning algorithm’. Ma also draws attention to ‘one of the biggest problems in big data’, collinearity, a problem that also arises in classical multivariate linear regression. The effect of collinearity can be ‘dramatic and often difficult to interpret’, especially as ‘the concept of big data promotes the use of many related variables, which makes collinearity even more severe’. Ma discusses means of mitigating collinearity issues and their effect on model fit.
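To make the collinearity point concrete, here is a minimal sketch. It is not taken from the book; it uses purely synthetic data and assumes only NumPy. The helper names (`vif`, `coef_spread`, `independent`, `collinear`) are hypothetical, chosen for illustration. It fits an ordinary least-squares regression many times, once with two unrelated predictors and once with two nearly identical ones, and compares how much the fitted coefficients wander from trial to trial. The variance inflation factor (VIF) is a standard diagnostic for the same effect.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(len(target)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


def independent(rng, n):
    """Two unrelated synthetic predictors."""
    return rng.normal(size=(n, 2))


def collinear(rng, n):
    """Two nearly identical synthetic predictors (an assumed extreme case)."""
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(0.0, 0.05, size=n)
    return np.column_stack([x1, x2])


def coef_spread(make_X, n_trials=200, n=100, seed=0):
    """Std. dev. of the fitted slopes over repeated synthetic experiments.

    The true model is always y = x1 + x2 + noise, so any spread in the
    fitted slopes reflects estimation instability only.
    """
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(n_trials):
        X = make_X(rng, n)
        y = X[:, 0] + X[:, 1] + rng.normal(0.0, 0.5, size=n)
        A = np.column_stack([np.ones(n), X])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        slopes.append(beta[1:])  # drop the intercept
    return np.std(slopes, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print("VIF, independent predictors:", vif(independent(rng, 100)))
    print("VIF, collinear predictors:  ", vif(collinear(rng, 100)))
    print("slope spread, independent:  ", coef_spread(independent))
    print("slope spread, collinear:    ", coef_spread(collinear))
```

In the collinear case the individual slopes become unstable even though their sum stays well determined, which is one reading of why the effect can be ‘difficult to interpret’.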

Another ML ‘gotcha’ is the ‘no-free-lunch’ principle, whereby a learning algorithm that performs well in some situations may not do so in others. ‘There is no such a thing as a universal machine learning algorithm that gives the best solution to every problem of applications.’ Ma concludes the ML chapter with a warning, ‘Machine learning methods can be very powerful, but they have many pitfalls waiting for the unwary … such as collinearities, inconsistencies and noise. Data integration and prediction using machine learning can be very useful when combined with subject-matter knowledge and other modeling techniques, such as geostatistical methods’.

More advice from the field comes in the contrast between people who are ‘more tuned to modeling procedures/workflows’ and those who are ‘focused on […] integrated analyses and inference’. Knowledge of software modeling tools along with some quantitative aptitude does not make you a modeler. ‘Using modeling tools without understanding the scientific problems and integrated analytics is like giving someone a hammer to interpret an outcrop. The hammer is only a tool in the process of understanding the rock’.

Addressing the ‘disruption’ of the AI/ML revolution, Ma opines that ‘a modern geoscientist should be catalyst, not a casualty, of digital geosciences. Knowledge of multidisciplinary geosciences is a start, analytics will lead to capability, and experience will foster proficiency’. Ma warms up in his critique of ‘questionable practices’ in big data, warning of the ‘excessive appreciation of model appearance’ and the use of ‘exotic methods’. Such approaches have been deprecated as ‘glory models’.

To return to our question, does QG help realize ‘the potential of big data and quantitative methods’? The answer is yes, and no. Ma is more reticent than the introduction would suggest as to the ‘progress’ made in ‘data science’. Those seeking cookie-cutter TensorFlow code will be disappointed. There is no mention of Python. The main thrust of Ma’s argument is that in-depth domain knowledge and first-class statistical know-how will trump the naïve ‘data-science’ approach. One can only agree, although some of today’s Python script kiddies might recoil at such heresy.

* Quantitative Geosciences: Data Analytics, Geostatistics, Reservoir Characterization and Modeling. Springer, ISBN 978-3-030-17859-8 (print), ISBN 978-3-030-17860-4 (eBook).

This article originally appeared in Oil IT Journal 2020 Issue # 2.
