2021 Rice University Oil and Gas High Performance Computing Conference

Panel session on next five years of HPC. UT Austin: Optimization of field-scale geological carbon sequestration with IBM’s Bayesian Optimization Accelerator. Fraunhofer automates expert geophysicists’ tasks with machine learning. RocketML’s DeepFusion ML for 3D seismic facies classification. Microsoft/Imperial College London: cloud-native 3D full waveform inversion. Rice’s SESDI: from raw seismic to geologic model with natural language processing! Total’s GEOSX CO2 sequestration simulator. AquaNRG: cloud-based HPC for digital rock chemistry/physics.

Speaking on the ‘Next Five Years of HPC’ panel, Katie Antypas (Berkeley Lab) called for more connectable HPC systems to allow complex workflows to span different hardware with specific capabilities. Optimizing end-to-end workflows should be the aim rather than building a massive binary application. TACC’s Dan Stanzione was philosophical, ‘In five years we will still be fighting heterogeneity, and there will not be a single programming language’. There will be more object stores, but ‘POSIX isn’t going anywhere’. One ‘very problematic’ trend will continue, the ‘fork between the exascale crowd (versed in heterogeneous architectures, C/C++, HDF5 and storage hierarchies) and everyone else, who just use Python!’ Andrew Jones (Microsoft) saw the cloud as the ‘future supercomputer’. In five years we will be evaluating a significant new technology (quantum computing?). Shell’s David Baldwin placed HPC in the context of the energy transition. Seismics will continue but will be a smaller part of the HPC pie as new uses and applications emerge with disparate and even conflicting requirements. Data sizes will continue to grow, and data management and movement will be a challenge, requiring careful choices.

Mary Wheeler (UT Austin Center for Subsurface Modeling) presented on Bayesian optimization for field-scale geological carbon sequestration. With help from IBM, a framework has been developed for blending high fidelity geological models with machine learning techniques. Wheeler traced current CO2 sequestration initiatives from ExxonMobil’s $3 billion test, Oxy’s Midwest CO2 ‘Superhighway’ and the Illinois CCS project. CCS modeling presents a range of challenges. To model fluid migration and interactions, cap rock integrity and more, scale-variant, non-linearly connected, space-time dependent grids are required, incorporating geochemistry and geomechanics. Enter IPARS, the integrated parallel, accurate reservoir simulator, the CCS simulation workhorse. IPARS embeds components from CMG and others, notably IBM’s Bayesian Optimization Accelerator (BOA). Here, the true analytical functions are replaced with surrogate Bayesian models whose predictions inform subsequent workflow steps, making for a more computationally efficient system. Parallelizing the approach is both necessary and hard, which is where BOA fits in, running in the IBM cloud. The approach was demonstrated using data from the Cranfield (Mississippi) CCS site with a 60% reduction in compute time.
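For readers unfamiliar with the surrogate idea, the following is a minimal sketch of surrogate-based Bayesian optimization using the open source scikit-optimize package. It is not IBM’s BOA: the run_simulation objective, its parameters and their ranges are hypothetical stand-ins for an expensive IPARS-style simulator run.

```python
# Generic surrogate-based Bayesian optimization sketch (not IBM's BOA).
# 'run_simulation' is a hypothetical stand-in for an expensive reservoir
# simulation that returns a scalar cost (e.g. a history-match misfit).
from skopt import gp_minimize
from skopt.space import Real

def run_simulation(params):
    """Placeholder for an expensive CO2 injection simulation run."""
    injection_rate, perm_multiplier = params
    # ... launch simulator, post-process results ...
    return (injection_rate - 2.5) ** 2 + (perm_multiplier - 1.0) ** 2  # dummy cost

search_space = [
    Real(0.5, 5.0, name="injection_rate"),    # Mt CO2 / year (hypothetical range)
    Real(0.1, 10.0, name="perm_multiplier"),  # permeability scaling (hypothetical range)
]

# The Gaussian-process surrogate is refitted after every simulator call; its
# acquisition function then picks the next point to evaluate, keeping the
# number of expensive simulations small.
result = gp_minimize(run_simulation, search_space, n_calls=30, random_state=0)
print("best parameters:", result.x, "lowest cost:", result.fun)
```

The design point is that the surrogate, not the simulator, does the searching; the expensive code is only called where the surrogate expects the most information, which is what makes reductions in compute time of the kind reported here plausible.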

IBM’s Chris Porter and Martina Naughton showed how the Bayesian Optimization Accelerator (BOA) is used to find the ‘ideal solution at lowest cost’. BOA is delivered as a stand-alone appliance with its own API and GUI, located alongside the HPC environment, and runs on x86, Power or ‘anything else’ architectures. According to IBM, BOA is used for ‘hyperparameter optimization’ and automatic machine learning, ‘two applications among many’; BOA is ‘a machine learning algorithm in itself’. More on BOA from IBM.

Ricard Durall (Fraunhofer Institute) presented a generative model for transfer learning on seismic data. Progress in machine learning has meant that it is now possible to automate tasks that were previously only doable by expert geophysicists*. Applications include fault picking, horizon detection and salt body segmentation. Machine learning methods traditionally fall into two camps: those trained on human-labelled real data and those trained on synthetic models. Fraunhofer’s approach is to use generative adversarial networks (GAN) to enhance and automate labelling of the training data. The system generates artificial faults and diffractions. The approach is said to outperform standard methods that use pure synthetic data (a generic GAN sketch appears after the footnote below).

* This is something of a bold claim as automated interpretation has quite a long history, dating back much earlier than the current AI/ML boom.
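By way of illustration only, here is a minimal, generic GAN training loop in PyTorch. It is not Fraunhofer’s model: the patch size, the fully connected generator/discriminator and the random stand-in data are all assumptions, made purely to show the adversarial training pattern used to synthesize labelled training examples.

```python
# Minimal, generic GAN training loop (PyTorch) on random stand-in "seismic patches".
# Illustrative only, not Fraunhofer's network; sizes and architectures are assumptions.
import torch
import torch.nn as nn

patch, z_dim, batch = 64, 100, 16

G = nn.Sequential(                      # generator: latent vector -> 64x64 patch
    nn.Linear(z_dim, 256), nn.ReLU(),
    nn.Linear(256, patch * patch), nn.Tanh())
D = nn.Sequential(                      # discriminator: patch -> real/fake score (logit)
    nn.Linear(patch * patch, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    real = torch.randn(batch, patch * patch)   # stand-in for real labelled patches
    fake = G(torch.randn(batch, z_dim))

    # discriminator: push real patches towards 1, generated patches towards 0
    loss_d = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # generator: try to fool the discriminator
    loss_g = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

In the Fraunhofer workflow the role of the generator is to produce realistic, already-labelled structures (faults, diffractions) that augment scarce human-labelled data before the interpretation network is trained.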

Sergio Botelho (RocketML), with researchers from Shell and Rice University, presented on 3D seismic facies classification using distributed deep learning. His starting point was the observation that seismic facies classification is a 3D problem and hence is not amenable to 2D approaches. Using a 3D CNN is, however, extremely compute-intensive. Enter distributed deep learning, a.k.a. DeepFusion*, demonstrated on the SEG’s Parihaka 3D dataset. DeepFusion supports massive networks on large-scale CPU/GPU clusters. It has shown ‘excellent strong scaling of 3D seismic classification problems on massive networks’ (a generic distributed-training sketch appears after the footnote below).

* We asked RocketML’s Vinay Rao for more on DeepFusion. He replied ‘DeepFusion is a RocketML-developed distributed deep learning framework that abstracts out the complexities of HPC, purpose-built for solving large scale machine learning problems. It is particularly useful when the data resolution sizes, model sizes, batch sizes are too big to fit into the compute memory footprint. DeepFusion works on a "cluster of computers", be it GPU or CPU, out of the box without any customization, special code or knowledge of HPC and Cloud services. Users can focus on solving a machine learning problem vs. struggling with HPC hardware+software infrastructure. Many applications in the Oil and Gas industry are compute-intensive, including seismic-related, and can benefit from RocketML DeepFusion. This technology is available to customers who subscribe to RocketML SaaS product.’
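DeepFusion itself is proprietary, but the kind of multi-node data-parallel training it abstracts away can be sketched with PyTorch’s DistributedDataParallel. Everything below (network size, patch shape, number of facies classes) is an assumption for illustration; a real 3D facies network would be far larger.

```python
# Generic data-parallel training sketch for a 3D CNN facies classifier using
# PyTorch DistributedDataParallel. This is NOT RocketML's DeepFusion, just an
# illustration of the distribution it abstracts away. Launch with torchrun.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")            # one process per GPU
    rank = dist.get_rank()
    device = rank % torch.cuda.device_count()

    n_classes = 6                              # e.g. Parihaka facies labels (assumption)
    model = nn.Sequential(                     # toy 3D CNN; real networks are far larger
        nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv3d(16, n_classes, 1)).to(device)
    model = DDP(model, device_ids=[device])

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(10):                     # stand-in for a real DataLoader over 3D patches
        x = torch.randn(2, 1, 64, 64, 64, device=device)
        y = torch.randint(0, n_classes, (2, 64, 64, 64), device=device)
        loss = loss_fn(model(x), y)            # gradients are all-reduced across workers by DDP
        opt.zero_grad(); loss.backward(); opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The ‘strong scaling’ claim amounts to saying that adding workers to a fixed problem like this keeps cutting wall-clock time, despite the gradient all-reduce traffic between nodes.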

Qie Zhang (and others from Microsoft and Imperial College London) demonstrated a cloud-native approach for 3D full waveform inversion on Microsoft Azure. The ‘hyperscale’ seismic imaging toolset is built on Docker, Kubernetes and Dask, and programmed in Python using the open source Devito package. Dask is a scheduler for parallel programming in Python. The Devito FWI code runs on multiple CPU/GPU architectures from Intel, AMD and Nvidia and was demonstrated on 256 Azure virtual machines. HDF5-based lossy compression allowed for a 15x reduction in the data footprint. Tests were performed on the open SEG/EAGE Salt and Overthrust dataset.
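The fan-out pattern described, one task per shot scheduled by Dask across the cluster, can be sketched as follows. The compute_shot_gradient function is a hypothetical placeholder for a Devito-generated propagation/adjoint kernel; model size, shot count and the update step are purely illustrative.

```python
# Minimal sketch of the Dask fan-out pattern: one task per shot, per-shot
# gradients gathered and summed on the client. 'compute_shot_gradient' is a
# hypothetical placeholder for a Devito-generated wave-equation/adjoint kernel.
import numpy as np
from dask.distributed import Client

def compute_shot_gradient(shot_id, model):
    """Placeholder: forward model + adjoint for one shot, returning a gradient cube."""
    rng = np.random.default_rng(shot_id)
    return rng.standard_normal(model.shape)    # dummy gradient

if __name__ == "__main__":
    client = Client()                          # or Client("scheduler-address:8786") on a cluster
    velocity_model = np.full((101, 101, 101), 2500.0)   # assumed starting model, m/s

    shot_ids = range(64)                       # one future per shot gather
    futures = [client.submit(compute_shot_gradient, s, velocity_model) for s in shot_ids]
    full_gradient = sum(client.gather(futures))          # stack per-shot gradients

    velocity_model -= 1e-3 * full_gradient     # one (illustrative) gradient-descent step
```

Because each shot is independent, the same script runs unchanged on a laptop or on hundreds of Kubernetes-managed VMs; only the Client address changes.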

Zhaozhuo Xu and Aditya Desai (Rice, working with Shell) presented on ‘Beyond convolutions - a novel deep learning approach for raw seismic data ingestion’. The idea is to shorten the traditional workflow of seismic processing by going straight from raw data to the subsurface model, reducing processing times from months to minutes. Current data-driven research treats seismics as image data. This is a ‘sub optimal’ approach as raw seismic data is ‘at least’ five dimensional. Unsurprisingly, such methods have not been successful. ‘Raw seismic data is not an image and should not be processed as one’. The authors instead propose an approach that is more akin to natural language processing. Enter SESDI (set embedding-based seismic data ingestion). SESDI breaks down large-scale prediction into a small auxiliary task that ‘gracefully’ incorporates data irregularities. SESDI is claimed to be the ‘first successful demonstration of end-to-end machine learning on real seismic data’. Read the SESDI paper here.
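The SESDI architecture itself is not reproduced here, but the underlying ‘set embedding’ idea can be sketched in a few lines of PyTorch in the Deep Sets style: each trace, together with its acquisition coordinates, is embedded independently and the embeddings are pooled with a permutation-invariant sum. Dimensions, the coordinate layout and the pooling choice are assumptions.

```python
# Deep-Sets-style sketch of the general "set embedding" idea: each trace plus its
# acquisition coordinates is embedded independently, then pooled with a
# permutation-invariant sum. Generic illustration, not the SESDI network itself.
import torch
import torch.nn as nn

n_samples, coord_dim, n_traces = 512, 4, 1000   # assumed: samples per trace, (sx, sy, rx, ry)

phi = nn.Sequential(               # per-trace encoder: samples + coordinates -> embedding
    nn.Linear(n_samples + coord_dim, 256), nn.ReLU(),
    nn.Linear(256, 128))
rho = nn.Sequential(               # set-level decoder: pooled embedding -> property estimate
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1))

traces = torch.randn(n_traces, n_samples)        # stand-in for an irregular set of raw traces
coords = torch.randn(n_traces, coord_dim)        # source/receiver positions per trace

embeddings = phi(torch.cat([traces, coords], dim=1))   # (n_traces, 128)
pooled = embeddings.sum(dim=0)                          # trace ordering no longer matters
prediction = rho(pooled)                                # e.g. a local subsurface property
```

Treating the gather as an unordered set of (trace, geometry) pairs is what lets irregular acquisition geometries be ingested directly, rather than being regridded into an image first.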

Comment - One question that arises from an approach that removes geophysical ‘smarts’ from the processing workflow is survey design. Geophysical survey is the first step in a workflow that is tuned to both the geological problem and to subsequent processing steps. If these are replaced with AI, who designs the survey and with what in mind?

Hervé Gross presented GEOSX, the Total-backed open source CO2 sequestration simulator. CCUS is said to be ‘at the limit’ of existing simulators’ capabilities. GEOSX blends fluid flow simulation with geomechanics and HPC R&D, and was developed against use cases supplied by Total. HPC innovations include the RAJA/LvArray performance portability framework and CHAI, a ‘copy hiding application interface’.

We asked Gross what relationship, if any, there was between GEOSX and NETL’s ‘CCSI’. He replied, ‘There is no relationship between GEOSX and the CCSI-toolset released by NETL. The NETL toolset aggregates a number of utilities for carbon capture, whereas we focus on geological carbon storage. Their most dynamic repository (FOQUS) is a Python-based platform for optimization and uncertainty quantification. FOQUS seems to be a very flexible “pegboard” where you can easily create workflows by tying various pieces of software together, but all are related to engineering for carbon capture. Flexibility seems to be their selling point for solving surface engineering optimization problems. We have a different objective: we simulate geological CO2 injection in large formations. Simulating such subsurface phenomena requires multiphysics modeling that is not numerically tractable without a scalable design’.

Comment In any event, CCS modeling is difficult, as we concluded in 2015: ‘Numerical evaluations of CCS projects in saline reservoirs showed that it is very hard to find a target that matches all of the desired parameters. In general, sequestrable volumes shrink as long-term migration risk to aquifers and caprock integrity concerns are considered.’

AquaNRG’s Spatika Iyengar showed how cloud-based HPC is used in digital rock chemistry/physics. AquaNRG’s reactive transport models solve multi-phase flow, solute geochemistry and biogeochemistry. A web application runs multiple pore scale models to obtain relative permeability and capillary pressure saturation relationships in what is described as a ‘digital twin for special core analysis (SCAL)’. The cloud architecture includes Lambda functions, DynamoDB, S3 buckets and Step Functions in a Dockerized implementation of a ‘continuous integration/continuous deployment’ strategy.
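As an illustration of the serverless pattern described (not AquaNRG’s actual code), a Lambda handler might run a pore-scale model, write results to S3 and index the run in DynamoDB. The bucket and table names and the run_pore_scale_model helper below are hypothetical.

```python
# Minimal sketch of the serverless pattern described above: a Lambda handler that
# runs (or triggers) a pore-scale model, stores the result in S3 and indexes it in
# DynamoDB. Bucket/table names and run_pore_scale_model are assumptions.
import json
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("scal-runs")          # hypothetical table

def run_pore_scale_model(params):
    """Placeholder for the pore-scale simulation (relative permeability, Pc-Sw, etc.)."""
    return {"kr_curve": [0.0, 0.2, 0.6, 1.0], "pc_curve": [3.0, 1.5, 0.7, 0.1]}

def handler(event, context):
    run_id = event["run_id"]
    result = run_pore_scale_model(event.get("parameters", {}))

    key = f"results/{run_id}.json"
    s3.put_object(Bucket="aquanrg-scal-results",                # hypothetical bucket
                  Key=key, Body=json.dumps(result))
    table.put_item(Item={"run_id": run_id, "s3_key": key, "status": "complete"})

    return {"statusCode": 200, "body": json.dumps({"run_id": run_id, "s3_key": key})}
```

A Step Functions state machine would then chain many such handlers, one per pore-scale realization, which is how the ‘digital twin for SCAL’ fans out across the cloud.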

Next year, for the 15th edition, Rice is to change the name from the Oil and Gas HPC Conference to the Energy HPC Conference. A sign of the times…

Watch the Rice HPC in Oil and Gas presentations on YouTube.

An aside … In an interview that appeared in the EAGE’s First Break, seismic luminary Oz Yilmaz was asked what he thought of ‘digitalization’ in geoscientific work. He replied that ‘There has been a strong push to apply AI and its variants - ML, DL, CNN - to solve difficult problems in exploration seismology, thanks to highly influential propaganda from the high tech companies of the Silicon Valley. With regards to AI’s applicability to problems in seismic data processing, inversion, interpretation and integration of diverse geoscience data, it is in the latter case, and to some extent in seismic interpretation, that AI methods have been rather successful. Whereas problems in processing and inversion really require natural, not artificial intelligence. I base this on my experience in testing the AI algorithms for these two categories.’


© Oil IT Journal - all rights reserved.