Speaking at a recent ‘big data in geosciences’ event held by the Geological Society of London, Paul Cleverley (RGU, Aberdeen) showed how natural language processing (NLP) can be applied to ‘mine’ a large body (a corpus) of geoscience information. Cleverley contends that the approach can challenge cognitive bias* or corporate organizational dogma, stimulate creativity and lead to a learning event and, ultimately, business value.
Cleverley proposes adding automated ‘sentiment analysis’, as practiced in opinion and brand analysis, to text mining. Such techniques need adapting to work in the geoscience field. Cleverley has developed Gazer, a geoscience-aware sentiment analyzer that uses an ensemble machine learning approach. Gazer was developed using the open source Python library TextBlob to ingest public domain petroleum system assessment reports. Geologists then tagged some 1,000 phrases (‘source rock’, ‘reservoir’, ‘trap’ ...). Astonishingly, the geologists agreed on over 90% of the definitions.
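To illustrate why generic sentiment tools need adapting to geoscience, here is a minimal sketch of a domain-aware, majority-vote ensemble. It is not Gazer and does not use TextBlob. The lexicons, the negation words and the function names (`lexicon_vote`, `negation_vote`, `ensemble_sentiment`) are all hypothetical, invented for this example, and simple word lists stand in for whatever machine learning models Gazer actually combines.

```python
import re
from collections import Counter

# Hypothetical lexicons, invented for illustration only (not Gazer's).
GENERIC = {"good": 1, "excellent": 1, "poor": -1, "bad": -1}
# The domain lexicon adds geoscience terms a generic tool would ignore.
DOMAIN = {**GENERIC, "mature": 1, "porous": 1, "permeable": 1,
          "breached": -1, "immature": -1, "tight": -1, "leaky": -1}
NEGATORS = {"no", "not", "lacks", "without"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def label(score):
    """Map a numeric score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def lexicon_vote(text, lexicon):
    """Sum the lexicon values of all tokens and label the total."""
    return label(sum(lexicon.get(tok, 0) for tok in tokenize(text)))


def negation_vote(text, lexicon):
    """Like lexicon_vote, but a negator flips the next scored word."""
    score, flip = 0, False
    for tok in tokenize(text):
        if tok in NEGATORS:
            flip = True
            continue
        value = lexicon.get(tok, 0)
        if value:
            score += -value if flip else value
            flip = False
    return label(score)


def ensemble_sentiment(text):
    """Majority vote of three simple voters; no majority means neutral."""
    votes = [
        lexicon_vote(text, GENERIC),
        lexicon_vote(text, DOMAIN),
        negation_vote(text, DOMAIN),
    ]
    top, count = Counter(votes).most_common(1)[0]
    return top if count >= 2 else "neutral"


print(ensemble_sentiment("The source rock is mature and the reservoir is porous"))
print(ensemble_sentiment("The trap is breached"))
```

In this toy setup the generic voter sees nothing of interest in ‘the trap is breached’, while the two domain-aware voters outvote it. That is the kind of adaptation the geoscience field calls for.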
Gazer ran against a test corpus of 750 labelled documents and returned 84% accuracy, ‘approaching human-like levels.’ Cleverley is to present the results again in the Halliburton iEnergy lecture series. More too from Cleverley’s blog.
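The source does not say which metric lies behind the 84% figure. Assuming it is plain accuracy (the fraction of documents whose predicted label matches the human label), it corresponds to about 630 of the 750 test documents. The sketch below shows that calculation. The `accuracy` helper and the five toy labels are hypothetical and are not taken from Cleverley's data.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one labelled document is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy labels, invented for illustration only.
human = ["positive", "negative", "neutral", "positive", "negative"]
model = ["positive", "negative", "neutral", "negative", "negative"]
print(accuracy(model, human))  # 4 of 5 match

# Documents implied by 84% accuracy on 750 documents (assumed metric).
implied_correct = round(0.84 * 750)
print(implied_correct)
```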
* Provided, of course, that the bias is not baked into the process. Remember Tay, Microsoft’s short-lived racist sex-bot?
© Oil IT Journal - all rights reserved.