Hip hip Hadoop!

Rice University Oil and Gas HPC presentation by Prairie View A&M researcher adds machine learning to seismic interpretation. ML-driven fault extraction results ‘promising.’

Speaking at the 2016 Rice University Oil and Gas High Performance Computing conference in Houston this month, Lei Huang, assistant professor at Prairie View A&M University, showed how an open source software stack has been deployed to add artificial intelligence to seismic interpretation. Huang and co-author Ted Clee (TEC Applications Analysis) have used ‘deep’ machine learning (ML) to identify geological features in seismic data volumes. Huang’s latest work builds on the Prairie View seismic analytics cloud, a Spark/Hadoop-based seismic processing infrastructure. Huang observes that today’s massive 4D seismic datasets are of a different scale and nature from those commonly leveraged in social media-based big data work.

Huang’s SeismicRDD, a derivative of Berkeley’s resilient distributed datasets, brings parallel, in-memory computing and provides a mechanism for exposing multi-petabyte datasets to analysis. The AI/deep learning component is provided by the Deeplearning4j (DL4J) toolset. A constellation of other open source tools adds stream, batch and interactive capabilities, a NoSQL database and routines for loading seismic data and partitioning it across the cluster. An earlier presentation from Huang’s team demonstrated the system’s capability in seismic processing, where tools for data management have been developed along with seismic transpositions and filters. The cloud-based system offers a web-based front end that can be programmed in Java, Python or Scala. Tests on a 288-core cluster show good scalability.
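The core idea behind a seismic RDD, partitioning a volume into independent trace blocks that can each be processed in memory, can be sketched without Spark itself. The following is an illustrative numpy-only sketch (the `partition_volume` and `agc` functions are hypothetical names, not SeismicRDD API), showing a toy per-trace gain applied partition by partition:

```python
import numpy as np

# Illustrative sketch only: we mimic the RDD idea by splitting a seismic
# volume into independent blocks and processing each one separately.

def partition_volume(volume, blocks):
    """Split a 3D (inline, crossline, time) volume along the inline axis."""
    return np.array_split(volume, blocks, axis=0)

def agc(block, window=5):
    """Toy automatic gain control applied independently to one partition."""
    env = np.abs(block)
    kernel = np.ones(window) / window
    # running mean along the time axis as a crude gain estimate
    gain = np.apply_along_axis(lambda t: np.convolve(t, kernel, mode='same'),
                               axis=-1, arr=env)
    return block / (gain + 1e-12)

volume = np.random.randn(8, 4, 100)            # tiny synthetic survey
parts = partition_volume(volume, 4)            # 4 partitions, Spark-style
processed = np.concatenate([agc(p) for p in parts], axis=0)
print(processed.shape)                         # (8, 4, 100)
```

In Spark the list comprehension would become a `map` over partitions, letting the same per-block function run across a cluster.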

The machine learning component has been applied to the identification of geological faults. Here Huang uses Dave Hale’s seismic image processing for faults (IPF) package to calculate fault likelihood, strike and dip attributes. IPF’s image thinning techniques smooth seismic data along reflectors, enhancing discontinuities. IPF is already in use in commercial packages.
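To give a flavor of what a fault-likelihood-style attribute measures, here is a simplified numpy stand-in (not Hale’s IPF, which is a Java library with far more sophisticated smoothing and thinning): semblance between neighboring traces is high along continuous reflectors and drops across a fault, so one minus semblance acts as a crude discontinuity attribute.

```python
import numpy as np

# Simplified stand-in for a fault-likelihood attribute (NOT Hale's IPF):
# low semblance between neighbouring traces suggests a discontinuity.

def discontinuity(section, win=9):
    """2D (trace, time) section -> per-sample discontinuity score."""
    n_tr, n_t = section.shape
    out = np.zeros_like(section)
    half = win // 2
    for t in range(half, n_t - half):
        w = section[:, t - half:t + half + 1]   # (n_tr, win) window
        num = np.sum(w, axis=0) ** 2            # stacked energy
        den = n_tr * np.sum(w ** 2, axis=0) + 1e-12
        semblance = num.sum() / den.sum()
        out[:, t] = 1.0 - semblance             # high = discontinuous
    return out

# continuous reflector vs. the same section with a simulated fault throw
sec = np.tile(np.sin(np.linspace(0, 10, 200)), (6, 1))
faulted = sec.copy()
faulted[3:] = np.roll(faulted[3:], 15, axis=1)  # shift lower fault block
print(discontinuity(faulted).max() > discontinuity(sec).max())  # True
```

The faulted section scores much higher than the continuous one, which is the signal a thinning step would then sharpen into discrete fault surfaces.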

Huang’s approach combines multiple data sets (curvature, amplitude envelope) along with a training data set of ‘known faults.’ The ML-derived fault detection is said to show ‘encouraging results,’ although the ML ‘meat’ of Huang’s exposé was rather glossed over. The techniques used included logistic regression and a support vector machine. Here, Apache Spark was key to speeding processing of the whole dataset, down from days on a sequential machine to hours. Visit the Cloud Lab where Huang is building a scalable big data analytics cloud with sponsorship from the US National Science Foundation. The lab focuses on Apache Spark and Hadoop big data with an emphasis on developer productivity and scalability.
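The supervised setup described above, per-sample attributes plus labeled ‘known faults’, can be sketched with a plain-numpy logistic regression. This is not the authors’ Spark MLlib code; the attribute names and the synthetic data are illustrative assumptions:

```python
import numpy as np

# Hedged sketch: numpy logistic regression on per-sample seismic
# attributes, standing in for the Spark pipeline the article describes.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))                       # [curvature, envelope]
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # synthetic 'known faults'

def train_logreg(X, y, lr=0.1, epochs=300):
    Xb = np.hstack([X, np.ones((len(X), 1))])     # add bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))         # sigmoid probabilities
        w -= lr * Xb.T @ (p - y) / len(y)         # full-batch gradient step
    return w

w = train_logreg(X, y)
Xb = np.hstack([X, np.ones((n, 1))])
pred = (1.0 / (1.0 + np.exp(-Xb @ w))) > 0.5
print((pred == y.astype(bool)).mean())            # accuracy close to 1.0
```

Spark’s contribution in the article is not the algorithm itself but running the full-batch gradient computation across the cluster, which is what cut the runtime from days to hours.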


© Oil IT Journal - all rights reserved.