Jess Kozman (Woodside) and Matthew Holsgrove (Wipro) presented Woodside’s project to ‘accelerate seismic data cloud ingestion through machine learning and automation’. Woodside set out to maximize the benefits from its seismic investment with a move to optimized cloud data storage. The authors report that ‘there is currently no available COTS solution that supports the whole of the optimized workflow’ and so a bespoke solution was implemented with help from Wipro, embedding existing vendor cloud-native technologies and services where available. Woodside wanted to ‘liberate’ data from proprietary application formats and deliver workstation-ready seismic data to geoscientists. A ‘cloud-first’ approach means that all Woodside tenders for acquisition surveys and processing projects now specify that data must be delivered to the cloud. Woodside has implemented a machine learning-driven automated workflow for uploading data to the cloud. Bluware’s Open VDS compressed seismic data format and data streaming capabilities are used to reduce the storage footprint and optimize delivery of data to consumer applications. The Open VDS format has been adopted by the Open Subsurface Data Universe (OSDU) industry consortium. Woodside’s PPDM/Oracle master data repository is linked to the system through an application programming interface.
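The presentation did not detail the master data integration, so the following is purely an illustrative sketch of how a newly ingested cloud survey might be registered against a master data store over an API. The endpoint, payload fields and authentication are hypothetical stand-ins, not Woodside’s actual PPDM/Oracle interface.

```python
# Illustrative only: endpoint, payload and auth are hypothetical stand-ins,
# not Woodside's actual PPDM/Oracle integration.
import requests

PPDM_API = "https://example.internal/ppdm/api/v1/seismic-surveys"  # hypothetical endpoint

def register_survey(survey_name, storage_url, crs, trace_count, token):
    """Post metadata for a cloud-resident (e.g. Open VDS) survey to a master data API."""
    payload = {
        "surveyName": survey_name,
        "storageUrl": storage_url,   # location of the compressed cloud object
        "crs": crs,                  # coordinate reference system
        "traceCount": trace_count,
    }
    resp = requests.post(PPDM_API, json=payload,
                         headers={"Authorization": f"Bearer {token}"},
                         timeout=30)
    resp.raise_for_status()
    return resp.json()
```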
A joint presentation from ExxonMobil’s Audrey Reznik and John Archer (RedHat*) covered the automation of analytics and data pipelines. Exxon was looking to make the results of its researchers’ efforts more accessible while preserving their stability. For a data scientist, ‘the ability to rapidly deploy code and quickly obtain feedback from a user is extremely valuable’. Enter RedHat’s OpenShift containerized Kubernetes platform. This has allowed Exxon to manage and deploy Jupyter notebooks, the tool of choice for combining Python code, a GUI and documentation for sharing research tools with geoscientists and engineers. The RedHat/Exxon proof of concept (PoC) sets out to create a reproducible and interactive data science environment with user interaction from a Google Chrome browser. The PoC trialed the use of a cloud-based GPU cluster to read and analyze log data. ML-derived analytical models developed on the cluster are pushed to Azure for deployment in what are described as ‘emerging data science ops workflows’. The presentation slides are a window into a modern data analytics pipeline with more Kafka/Hadoop/Spark et al. brands than you can shake a stick at! Those interested in testing the OpenShift environment should sign up with the OpenShift Commons ML SIG and grab some JupyterHub OpenShift templates. More too from the RedHat OpenDataHub project.
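The slides do not include code, but the final step described (pushing a trained model to Azure for deployment) can be sketched as follows. This is a minimal illustration using the azure-storage-blob and scikit-learn packages; the connection string, container and toy model are assumptions, not ExxonMobil’s pipeline.

```python
# Minimal sketch: push a trained model to Azure blob storage for downstream deployment.
# Connection string and container name are hypothetical; this is not ExxonMobil's pipeline.
import io
import joblib
import numpy as np
from azure.storage.blob import BlobServiceClient
from sklearn.linear_model import LogisticRegression

# A toy model standing in for the log-analysis models developed on the GPU cluster.
X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)
model = LogisticRegression().fit(X, y)

# Serialize in memory and upload to a blob container watched by the deployment side.
buffer = io.BytesIO()
joblib.dump(model, buffer)
buffer.seek(0)

blob_service = BlobServiceClient.from_connection_string("<AZURE_CONNECTION_STRING>")  # placeholder
blob = blob_service.get_blob_client(container="models", blob="log-classifier-v1.joblib")
blob.upload_blob(buffer, overwrite=True)
```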
* A wholly-owned IBM unit.
Another data pipeline specialist is BHP’s Richard Xu, who is developing an AI-based data pipeline that expands traditional data management. Xu is addressing bulk/batch data issues such as legacy data processing, integrating data from new ventures or prepping data for divestment. The approach uses an iterative/agile development process involving proofs of concept on increasingly large datasets. The pipeline involves text analysis with natural language processing, taxonomy creation, word-to-vector conversion and machine learning. The Google/TensorFlow Word2Vec tool is used to parse documents prior to cross-validation with a taxonomy-based classification. Scanned documents are also processed and geotagged for spatial classification. Xu reported a 40% time reduction and a 30% cost reduction. On accuracy, Xu told Oil IT Journal, ‘Because there are thousands of data types and a lot of noise, the text classification target is 70%. Our test result is 90%. Because the training sample is small, the approach combines AI and taxonomy. The worst data types give a 70% success rate, the best, 90%. We perform multiple iterations and sprints, moving low-accuracy results into the next iteration. The project has been in action for over a year. It has gone through vision, PoC and development, and is now in production mode. Current efforts are concentrated on fully parallel AI in a big data platform in order to improve performance.’
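Xu did not show code, but the word-to-vector plus classification idea can be sketched as below. This uses the gensim implementation of Word2Vec and scikit-learn as stand-ins for BHP’s TensorFlow-based tooling, with a toy corpus in place of the real document collection.

```python
# Sketch of the word2vec + classification idea (gensim/scikit-learn stand-ins,
# not BHP's actual pipeline). Documents are reduced to averaged word vectors,
# classified into data types and cross-validated against known labels.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

base_docs = [
    ("final well report for well x-1, includes composite log", "well"),
    ("3d marine seismic acquisition report, streamer and source parameters", "seismic"),
    ("geochemistry source rock analysis, toc and rock-eval pyrolysis", "geochemistry"),
]
docs = base_docs * 3  # repeat the toy corpus so each class has enough samples for 3-fold CV

tokenized = [text.split() for text, _ in docs]
labels = [label for _, label in docs]

# Train word embeddings on the corpus (vector_size kept tiny for the example).
w2v = Word2Vec(tokenized, vector_size=32, min_count=1, epochs=50)

def doc_vector(tokens):
    """Average the word vectors for the tokens present in the vocabulary."""
    return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

X = np.array([doc_vector(t) for t in tokenized])
clf = LogisticRegression(max_iter=1000)
print("accuracy per fold:", cross_val_score(clf, X, labels, cv=3))
```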
Equinor North America’s Justin Frost teamed with Andrew Timm (Triton Data Services) to report on a major seismic-to-the-cloud project. Previously, Equinor North America (ENA) operated a revolving three to seven-year procurement cycle for seismic data storage, often involving migration to a new database. To break the data migration cycle, ENA elected to use a cloud-based seismic archive for its petabyte-scale dataset. The contract was awarded to Triton Data Services, whose TerraStor is now configured to use cloud storage on Amazon AWS. Each trace was scanned, QC’d and its metadata captured to TerraStor. Archive data was encapsulated and transported into archival storage in AWS. Most data was uploaded over the wire but, for some super-large marine datasets, Amazon’s ‘Snowball’ 72 terabyte network-attached storage devices were loaded and shipped physically for ingestion. Another interesting facet of the project was the need to comply with certain jurisdictions’ export controls that preclude the export of seismic data. The solution here is to geolocate data to an in-country cloud environment. Metadata in TerraStor is also replicated to Equinor’s internal seismic entitlement database, providing an abstraction layer over physical storage and enabling future data transfer to other cloud providers.
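Triton did not publish code, but the over-the-wire path can be sketched with the AWS boto3 SDK. The bucket name, key layout and metadata fields below are assumptions for illustration, not TerraStor’s actual configuration.

```python
# Illustrative sketch of the over-the-wire upload path. Bucket name, key layout and
# metadata fields are hypothetical, not Triton/Equinor's actual TerraStor setup.
import boto3

s3 = boto3.client("s3")

def archive_segy(local_path, survey_id, line_name):
    """Upload a QC'd SEG-Y file to archival object storage, tagging it with
    catalogue metadata captured during the scan/QC step."""
    key = f"seismic/{survey_id}/{line_name}.segy"
    s3.upload_file(
        local_path,
        "ena-seismic-archive",              # hypothetical bucket
        key,
        ExtraArgs={
            "StorageClass": "DEEP_ARCHIVE",  # long-term archival storage tier
            "Metadata": {"survey": survey_id, "line": line_name},
        },
    )
    return key
```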
Max Gray presented components of ExxonMobil’s ‘digital transformation’ in the form of three projects that set out to leverage Exxon’s massive legacy data archive. Exxon has some 12 million geoscience and engineering paper documents in storage in the Houston area. Following a pilot, a contract was awarded to L&T Technology Services* for metadata capture and enhancement and document scanning. All documents are georeferenced and depth registered as appropriate. As of Q1 2019, some 1.4 million documents had been processed. The next step is to migrate the documents into the ExxonMobil cloud storage account, with access provided via Katalyst’s iGlass data manager. A parallel workflow addresses other physical media (non-paper), such as CDs, DVDs and thumb drives, stored in the hardcopy record boxes. Gray outlined another Exxon digital transformation project involving the capture of ‘orphaned’ data and interpretation results in legacy applications and systems. A proof of concept trial on some 172 projects found that interpretation data could be extracted at a cost of around $4,000/project. Further price reduction is expected as the work is to be tendered to external service providers. Some 18,000 projects have been identified for data harvesting. The endgame is to develop a ‘managed service provider’ factory migration workflow to ‘harvest data at scale’. Finally, Exxon is also looking to remaster its humongous legacy seismic archive of almost 2 million tapes, currently stored in an underground salt mine. Following a pilot, Katalyst is now processing some 30,000 tapes/month. The project is expected to take 4-5 years. As of May 1, 2019, Katalyst had processed 108,000 tapes, recovered 750 TB of data, placed it online and made it accessible through iGlass. Retrieval times are down from weeks to hours.
* LTTS is a publicly listed subsidiary of Larsen & Toubro Limited, an ‘$18 billion Indian conglomerate’.
Kelli Witte and Gina Lawson presented ‘Pegasus’, Chesapeake’s project, play and prospect inventory management application that stores and manages the consolidated efforts of explorationists, managers and reservoir engineers. Pegasus provides technical guidelines and milestones for the delivery of technical products, setting data integrity standards and enabling collaboration, documentation and knowledge transfer. The solution replaced a suite of spreadsheet tools that had become ‘ineffective and obsolete’, limiting the speed at which the relevant information could be collected and managed, and hampering decision making. Pegasus deploys a ‘centralized database platform’ that allows play-level assessment data to be captured from multiple disciplines. Today some ten internal administrators maintain the application. The tool is said to be robust in the face of changes in other interpretation tools. One component of Pegasus, Origin, is based on 3-Gig’s Appra, a packaged solution combining 3-Gig’s Prospect Director with the Rose & Associates software suite.
Andy Flowers presented ConocoPhillips’ citizen data scientist (CDS) program, one of several ways that the company is benefiting from data science. The CDS sets out to train subject matter experts in data science, with the expectation that this will lead to data-driven solutions to problems that are not amenable to conventional methods. Teaching covers the use of standard workflows for data extraction and conditioning, statistical techniques for describing data features and their significance, and the derivation and use of computational models from data. Interested CDS graduates will be offered positions in the advanced analytics team, applying data science in support of other business units and functions.
Comment: In the 19th Century gold rush, it was said that more money was made by those selling shovels than the miners. Could it be the same for the data science training business today?
Baptiste Joudet, Elie Maze and Florian Bergamasco teamed to present Total’s DataLab, an ‘innovative approach to data management’. Total created its DataLab in 2018 to address a growing number of incoming documents (+20% year on year) with a new approach to data management and data automation. The vision is to provide Total’s data management community with self-service solutions that automate data loading into business applications and corporate databases. Initially the lab is working on early-stage data management of incoming files. This involves sorting and classifying the files received, achieved by the automated extraction of key information from file headers, pictures and reports. A first-pass classifier sorts files into data family (seismic, well data, geochemistry) and data type. This allows files to be routed to the appropriate workflow. Here a machine learning-derived process extracts metadata (company names can be derived from their logos), performs geotagging, identifies data in tables and extracts images. A 95% accuracy level is claimed for the classifier. The solution enables optimized search across the full content of reports, including imagery. The authors conclude that artificial intelligence brings many benefits to the data manager. But AI needs a large volume of clean and organized data. Here, good procedures, business knowledge and quality databases are key to successful AI. For more on the early days of Total’s data lab read our report from Octo’s event.
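The internals of Total’s classifier were not disclosed; the following is a minimal sketch of the first-pass ‘data family’ routing step, using scikit-learn TF-IDF features as a stand-in for whatever the DataLab actually uses, with toy header text in place of real incoming files.

```python
# Minimal sketch of a first-pass classifier routing incoming files to a data family
# (scikit-learn stand-in; not Total's actual DataLab implementation).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Text extracted from file headers / first pages, with known data families.
train_texts = [
    "segy ebcdic header 3d marine survey sample interval 2ms",
    "wireline composite log gamma ray resistivity depth reference kb",
    "source rock analysis rock-eval toc vitrinite reflectance",
]
train_families = ["seismic", "well data", "geochemistry"]

classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_families)

# Route a newly received file to the appropriate downstream workflow.
incoming = "gamma ray and sonic curves, run 2, depth in metres below kb"
family = classifier.predict([incoming])[0]
print(f"route to {family} workflow")
```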
Dirk Adams presented Kadme’s work for YPF on the management of physical and digital data in Argentina in an Azure environment. Kadme has tailored its Whereoil application to crawl YPF’s data sources and fulfill the company’s business and operational requirements. The solution was retooled to run in the Microsoft Azure cloud, benefiting from Microsoft’s presence in Buenos Aires and YPF’s Azure contract. The choice of Azure was also dictated by the fact that no other major cloud provider has an Argentinian presence! The project has involved the migration of some 2 million documents and associated metadata. The system also manages check-outs and returns of physical assets. Whereoil’s ‘Insight Engine’ adds a search function to the Azure data lake and a geotagging function adds location to document metadata. The system is currently in the final test phase before rolling out to ‘over 1,000 users’. Kadme acknowledged assistance from local partner Innovisión.
Adam Serblowski (Shell) explored the potential of robotics as a source of analytics data. Following early use in deep water operations, robots are now moving from specialist applications to become ‘an integral part of daily operations’ in many oil and gas contexts. Robots bring new data management challenges as they generate data volumes that are orders of magnitude greater than those of a human worker. Serblowski presented a 2018 proof of concept that used drone technology to monitor Shell’s US unconventional operations. The PoC involved the surveillance of 60+ wellsites using a beyond visual line of sight (Bvlos)-enabled drone. A single wellsite flyby generates over 2GB, i.e. around 120GB per day for a full inspection round. Data management and the image processing pipeline needed some serious planning. The PoC trialed two types of change detection, RGB and state change. The former was effective at highlighting defects such as debris on a wellsite. State change proved capable of tasks such as estimating the volume of liquid inside a tank. Both approaches are viewed as pre-processors for further analysis by an operator. Serblowski concluded that a viable solution involves trade-offs between cost, data volumes and inspection scope, but that ultimately, the approach will become the de facto way of working.
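Shell’s processing pipeline was not shown; as a minimal sketch of the RGB change detection idea, the snippet below compares a baseline image of a wellsite with the latest flyby using OpenCV. The thresholds and function are illustrative assumptions, not Shell’s implementation.

```python
# Sketch of simple RGB change detection between two wellsite flybys
# (OpenCV 4.x / numpy stand-in; not Shell's actual processing pipeline).
import cv2

def detect_change(baseline_path, latest_path, threshold=40, min_area=500):
    """Flag regions that differ between a baseline image and the latest flyby
    (e.g. debris appearing on a pad) for review by an operator.
    Assumes the two images are co-registered and the same size."""
    baseline = cv2.imread(baseline_path)
    latest = cv2.imread(latest_path)
    diff = cv2.absdiff(baseline, latest)                 # per-pixel RGB difference
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep only changes large enough to matter; return bounding boxes for review.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]
```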
Blue Marble’s Kris Berglund warned of geodetic changes coming in 2022 in a reprise of a talk given by NOAA’s Dan Martin earlier this year. In 2022, the National Geodetic Survey will replace the US horizontal and vertical datums (NAD 83 and NAVD 88) with a new framework based on the International Terrestrial Reference Frame (ITRF) ‘gold standard’. The continent will be divvied up into four plate-tectonic-fixed terrestrial reference frames, each with its own time-zero ‘epoch’ and relative motion. For more on the changes and on the history of US geodetics, read Martin’s original presentation here.
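As an illustrative sketch of the kind of re-referencing data managers will have to script, the snippet below transforms a NAD 83 position with pyproj. Since the 2022 frames are not yet in the EPSG registry, ITRF2014 is used here as a stand-in target, and the size of the shift depends on the transformation and epoch pyproj selects.

```python
# Illustrative sketch only: the 2022 frames are not yet in the EPSG registry,
# so ITRF2014 (EPSG:7912, geographic 3D) is used as a stand-in target. Requires pyproj.
from pyproj import Transformer

# NAD83 geographic (EPSG:4269) to ITRF2014 geographic 3D (EPSG:7912).
transformer = Transformer.from_crs("EPSG:4269", "EPSG:7912", always_xy=True)

lon, lat, h = -95.3698, 29.7604, 0.0    # a Houston location, ellipsoidal height in metres
new_lon, new_lat, new_h = transformer.transform(lon, lat, h)
print(new_lon, new_lat, new_h)           # expect a shift on the order of a metre
# The point for data managers: stored NAD 83 / NAVD 88 coordinates will need
# re-referencing once the new frames arrive.
```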
More on PNEC from the conference home page. Next year’s edition is scheduled for May 19-20, 2020.
© Oil IT Journal - all rights reserved.