Big data, small uncertain conclusions

Back from the SPE Digital Energy conference, editor Neil McNaughton wonders how far artificial intelligence driven by big data can be trusted. Authors rarely follow the SPE’s own guidelines on statistical testing. Perhaps ‘reproducible’ research and publishing is the way ahead.

A common theme at the SPE Digital Energy event held at The Woodlands, Texas earlier this year (report in next month’s Journal) was data-driven analytics. Often this involves using some fancy statistics on a ‘training’ data set, then applying what has been learned to data that was not included in the training to see if it works. The approach combines hi-falutin science in the analysis phase with extreme statistical naiveté in prediction. I call this the ‘suck it and see’ (Sias) method.

In my early days in the North Sea we used Sias to depth convert our seismics. But of course we were using small data—sometimes very small data—maybe a handful of wells with dodgy velocity measurements. This small data was augmented with slightly larger undoubtedly wrong data derived from seismic surveys. We then interpolated or extrapolated to provide a prognosis of the well tops which were usually a fair way off target. I am sure that things have changed since then but I’m not sure that they have got a lot better.

Today, modern computing techniques let us apply statistics to much larger data sets and faster data streams than was previously possible. The assumption is that the more data you have, the more likely you are to come up with something significant. This is of course a specious argument. Think for a minute of as near ‘perfect’ a correlation as you would like to have, one that would have you rush out and bet the house on its ‘predictive analytics’ capability. Well, you would be wrong to do so, because I hid this fabulous correlation in a huge data set where it was arrived at purely by chance. Its predictive value was nil. Any apparent correlation can come about purely by chance, given a large enough set of data. Indeed, the bigger the data, the more likely it is to contain completely spurious results.
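The point can be checked with a few lines of Python. The following is only a sketch under stated assumptions: every variable is pure Gaussian noise, there are 20 samples per variable, and the function names are made up for this illustration. It reports the strongest correlation found between a random ‘target’ and a growing pool of random ‘predictors’.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_chance_correlation(n_samples, n_variables, seed=0):
    """Largest |r| between a random target and n_variables random predictors.

    Hypothetical helper: nothing here is real data, so any correlation
    it finds has come about purely by chance.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(candidate, target)))
    return best


if __name__ == "__main__":
    for n_variables in (10, 100, 1000):
        r = best_chance_correlation(n_samples=20, n_variables=n_variables)
        print(f"{n_variables:>5} random variables: best |r| = {r:.2f}")
```

Because the seed is fixed, each larger pool contains the smaller one, so the best correlation can only stay the same or grow as more noise is searched.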

Of course I am not inventing anything here. The search for significance in statistics follows a well-trodden path. A statistical result should be evaluated against the ‘null hypothesis’ i.e. the possibility that it has come about by chance. Null hypothesis testing is a standard piece of kit especially in the life sciences where ‘evidence-based’ medicine is popular—making one wonder what else medicine might be based on, but I digress.
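To make the null hypothesis concrete, here is a minimal sketch assuming a two-way classification where blind guessing is right half the time. The function name and the example figures are invented for the illustration. It computes the probability that guessing alone does at least as well as an observed score, using the binomial distribution.

```python
from math import comb


def chance_probability(n_trials, n_correct, p_guess=0.5):
    """Probability of at least n_correct hits in n_trials from guessing alone.

    Hypothetical helper: the null hypothesis is that each trial is an
    independent guess that succeeds with probability p_guess.
    """
    return sum(
        comb(n_trials, k) * p_guess**k * (1 - p_guess) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


if __name__ == "__main__":
    # 70% correct on 20 samples, then 70% correct on 200 samples
    print(f"14 of 20:   {chance_probability(20, 14):.3f}")
    print(f"140 of 200: {chance_probability(200, 140):.2e}")
```

A 70% hit rate on 20 samples turns up by pure guessing nearly 6% of the time (0.058), whereas the same hit rate on 200 samples is very much harder to get by luck.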

The usual yardstick for testing against the null hypothesis is the P-value: the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true. A small P-value means the data would be surprising if chance alone were at work. I’ll get back to this in a minute, but first I wanted to share with you a recent newspaper article that questions the widespread but uncritical use of artificial intelligence in the field of neuroscience.

The Le Monde piece was written by Karim Jerbi (U Montreal) who took a pot shot at the use of supervised learning methods, a technique that has quite a following in the digital energy community. The method learns from labeled examples how to sort data into classes. It might for instance be used to distinguish between different rock types based on log cross plots. It trains on a subset of the available data and is then evaluated on its ability to classify the remainder of the data (yes, this is Sias).

Jerbi’s team (which works on monitoring brain activity i.e. time series rather like oilfield monitoring data) showed that extremely ‘good’ classifications could be achieved from what were in reality completely random datasets. He observed that the use of bigger, complex multi-disciplinary data sets makes it hard to evaluate the likelihood of meaningful results and called for better policing by the community (of neuroscientists) of published results. It may be hard to figure out just how to evaluate a P-value from such data to check the null hypothesis.
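One generic way to put a P-value on a hold-out score is a permutation test: shuffle the labels many times, repeat the train-and-test exercise each time, and count how often the shuffled data does at least as well as the real labels did. The sketch below is an assumption-laden illustration and not necessarily the procedure used by Jerbi’s team. The data is random noise, the classifier is a simple nearest-centroid rule, and all names are invented for the example.

```python
import random


def nearest_centroid_accuracy(features, labels, n_train):
    """Train a nearest-centroid classifier on the first n_train rows, score on the rest."""
    train = list(zip(features[:n_train], labels[:n_train]))
    centroids = {}
    for cls in set(labels[:n_train]):
        rows = [f for f, lab in train if lab == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        # Pick the class whose centroid is closest (squared Euclidean distance)
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    test = list(zip(features[n_train:], labels[n_train:]))
    return sum(predict(f) == lab for f, lab in test) / len(test)


def permutation_p_value(features, labels, n_train, n_permutations=200, seed=0):
    """Return (observed accuracy, permutation P-value) for the hold-out score."""
    rng = random.Random(seed)
    observed = nearest_centroid_accuracy(features, labels, n_train)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between features and labels
        if nearest_centroid_accuracy(features, shuffled, n_train) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    labels = [0, 1] * 20                    # 40 samples, two balanced classes
    rng.shuffle(labels)
    features = [[rng.gauss(0, 1) for _ in range(5)] for _ in labels]  # pure noise
    accuracy, p_value = permutation_p_value(features, labels, n_train=30)
    print(f"hold-out accuracy {accuracy:.2f}, permutation P-value {p_value:.3f}")
```

With only ten hold-out samples, the accuracy on noise can stray a long way from 50%, which is exactly why the score alone says little until it is compared with what shuffled labels achieve.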

I am sure that the same could be said for oil and gas use of AI where, although it may be hard to figure out a P-value, at least one ought to try. I thought that I’d do some big data experimenting myself in the form of some full text searches for ‘null hypothesis’ on the spe.org website. This returned a measly five references. A search for ‘P value’ did better, with 237. One of these caught my eye. P-value testing is mentioned in the SPE’s style guide for authors, which deprecates the use of words like ‘very’ and suggests instead that ‘to express how significant results are ... report the P-value.’ It seems as though few SPE authors are reading the style guide because a search for the word ‘very’ comes up with around 250,000 references! While this is not a very scientific investigation, it supports the (perhaps obvious) notion that there is a tendency to put a positive spin on a result rather than engage in a rigorous analysis.

Just to confuse the picture further, Nature recently reported that another publication, Basic and Applied Social Psychology, has banned the publication of P-values, as such statistics were ‘often used to support lower-quality research.’ This is now the subject of a flame war with the statisticians beating up on the psychologist surrender monkeys.

Another item in Nature advocated an attack on ‘sloppy science’ derived from abusive ‘tweaking’ of statistical results and called for a register of experimental design and analytics prior to publication.

Getting back to big data and published research, I think that the most significant development in this space is the ‘reproducible’ approach, as exemplified by the Madagascar open source seismic imaging movement, which advocates the publication of both algorithms and data.

This approach could apply equally to large data sets with complex statistical deductions. Putting the data into the public domain would allow other researchers to check the logic. As AI plays a growing role in operations and automation, the usual argument of data confidentiality may be hard to justify, particularly when results are baked into safety-critical systems.
