Book review: Data Analysis for Scientists and Engineers

Edward Robinson’s textbook is aimed principally at the mathematically inclined. But there is enough narrative to intrigue the philosophically inclined, with an intricate discussion of frequentist vs. Bayesian reasoning. The trendy field of ‘data science’, however, is conspicuously absent.

Edward Robinson’s book, ‘Data Analysis for Scientists and Engineers’*, probably caught our attention because, if you rearrange the title slightly, you might imagine that it covers the trendy topic of ‘data science.’ Robinson is professor of astronomy at the University of Texas at Austin. The book does not cover data science. Neither does it include any computer code. As Robinson explained in an email exchange, ‘I refused to include any code in my book, despite heavy arm-twisting from Princeton University Press. There is publicly available R and Python code for all the techniques I discuss. I hope the book will still be useful when R and Python are remembered only by historians.’

Data Analysis is a math-laden textbook covering statistics and, to a lesser extent, time series analysis. We found Robinson’s mathematical treatment hard going and not necessarily the easiest path to enlightenment. A diagram of Dirac’s delta function (the ‘spike’) would be more helpful than Robinson’s integrals! For the less mathematically inclined there is plenty of narrative. The introductory chapter on the laws of probability begins with an image of a die. Turn the page and you are plunged into one of Robinson’s pet topics, the frequentist and Bayesian approaches to statistics. Here we learn that ‘frequency is meaningless for unique events.’ As this presumably includes estimating the size of a ‘single’ oil or gas prospect, we are all Bayesians, like it or not.

There is a ‘deep divide’ between the frequentist and Bayesian approaches, but Robinson is hard put to explain quite what it is, in a discussion that is teasingly spread across several chapters. The main chapter on Bayesian statistics opens with a detailed explanation of the false positive problem in drug trials. Yet here frequentist and Bayesian reasoning give the same result. No ‘deep divide’ there, then. A section on fitting a straight line through noisy data makes the differences between the two approaches a little clearer, although one is left with the impression that ‘Bayesian’ really just equates to common sense**.

Many ‘big data’ techniques (regression, principal component analysis, Markov chains) are covered. Machine learning and artificial intelligence are not, even though both build on conventional statistics. Elsewhere, coverage spans Fourier analysis, convolution and noise analysis, although engineers and geophysicists may have their own favorite resources for these methods.

To sum up, Data Analysis is an impressive compilation of the mathematics behind ‘traditional’ statistical data analysis. The decision not to include computer code is understandable, but it shuts out those whose fingers are hovering over the ‘compute’ button of a big data or analytics application. The book’s coverage underscores the gap between statistics as set out here and the way it is practiced today in artificial intelligence-oriented applications. Practitioners of these newer techniques may find Robinson’s book a useful reference, but they will likely be in a minority. The worlds of data analysis and data ‘science’ are drifting apart, the former concerned with finding underlying physical causes, the latter more with finding something that just works!

* Princeton University Press, ISBN 9780691169927.

** We are not alone in this surmise.

This article originally appeared in Oil IT Journal 2017 Issue # 9.
