ECIM 2019 Haugesund

Shell – ML is happening but too slowly, SPDM launch, BP’s DataWorx organization, Teradata data science needs a scalable data system, Shell on OSDU, Okea and IO Data on big, tough data project, Pandion/Computas and the Kerogen subsurface data platform, Wintershall’s AMIE automated information extraction project, Schlumberger on the UK National Data Repository, Equinor on the ‘inadequate’ LAS well log standard, Petroware JSON Well Log Format (JSON-WLF), IHS Markit/PPDM and taxonomic clarity in the upstream, Sword/Venture - data science unlocks the value in BP’s unstructured data, Schlumberger’s damascene conversion to open source. Short takes: NorskOlje&Gass data exchange. AgileDD work for Equinor. Diskos 2.0. North Sea overlooked pay project. Geodata - ArcGIS front end to Volve dataset. Kadme Whereoil front end to Diskos. Interica on Woodside’s rule-based archiving.

Shell – ML is happening but too slowly

Marianne Oslnes (Norske Shell) cited the World Economic Forum as putting a ‘$1 trillion value*’ on the opportunity for the oil and gas industry over the next decade. But this bonanza is slow to materialize in the upstream. Machine learning in seismic ‘is happening, but too slowly from a business urgency perspective’. E&P is the least productive sector of the industry. Shell is therefore focusing on digital technologies that are reaching an inflection point and have impact, with blockchain at the front of the curve! The upstream value chain is based on ‘analog, sequential work processes spanning decades’ with ‘thousands of fingerprints’. This is not competitive and is unattractive to shareholders. A step change is required, to end-to-end workflows that remove barriers and siloed thinking. ‘Even before, with analog drawings, it was faster than today!’ Well-known data management principles (data as asset, data quality, ownership, metadata and so on) may seem old hat, but ‘you need to do it again; maybe now you will have business leaders who will make the effort to listen and understand the value of the data’. Remember though that others may not understand your perspective, so make what you do pictorial and simple. As ‘culture eats strategy for breakfast’, think about how to undo 20-year-old ways of working. Push the boundaries, understand and communicate the problem in hand. For data and information managers, ‘this is your era. Help us change an entire industry. You finally have our attention.’

* Ken Dunn (BP) quoted the WEF ‘value’ as $2 trillion! (See our SLC report in this issue.)

SPDM launch

Lars Gåseby formally launched the Society for Petroleum Data Managers, that sets out to encourage ‘lifelong learning, advanced KM and career development’ for petroleum data managers through ‘community, conferences and events and the sharing of reference materials and standards’. The SPDM currently has 135 members, most in the EU.

BP’s DataWorx organization

Robbie Watson introduced BP’s new DataWorx organization that is to ‘create $10 billion of cash, not perceived, value’. Data management and data science are ‘no longer just a job’, they are now a career. DataWorx is an environment to ‘thrive, learn and fail in’. Previously, data management was seen as a ‘little back office thing that no one talks about till it goes wrong’. Now BP has created the upstream data management framework, a global approach to drive consistency. Other majors are doing the same thing and there is competition for talent. One early result is improved production at BP’s non-operated Angolan subsidiary where, in under seven weeks, BP developed an automated tool that removed the need for manual data extraction. The ‘machine learning*’ tool advises on potential production opportunities and has led to ‘$25 million in savings’. The tool was developed by Ruairi Dunne, a recent graduate who joined BP’s internship program. Dunne opined on expectations for the industry and on his personal transition from a geoscience to a data/tech focus. ‘Would I enjoy a geoscience role more? Is there less prestige in data?’ So far, his journey has involved training in data science with PowerBI, Kaggle, Python and an intensive Coursera-based data science course. Next came a signal processing and machine learning PoC in reservoir engineering and work with acoustic sensors in sand management. This has been delivered as a real-time operational dashboard (built with Palantir) integrating production data with predicted sand events. The tool runs every hour, detecting sand events and adjusting chokes accordingly.

* Our subsequent investigation suggests that the tool is less dependent on machine learning than on situational awareness. See our editorial.

Teradata data science needs a scalable data system

Teradata’s Niall O’Doherty stated that data management matters more than ever, now to drive data science. The Harvard Business Review Press book ‘Prediction Machines*’ shows the economic impact of AI/ML, which is ‘making prediction cheaper’. The financial services industry is getting excited about the technology. AI has the potential to ‘fundamentally change or eliminate parts of your industry’. One key to data science success is a scalable data system that helps move from PoC into production. Enter the [Teradata] agile data warehouse. Another aspect is people. Today we have ‘the wrong people doing the wrong jobs’. Data architects are not data scientists and vice versa. This has led to the ‘accidental data architecture/ecosystem’ where 80% of the time is still spent moving and preparing data. O’Doherty cited Hadley Wickham, whose ‘tidy data’ structures are easy to work with and free analysts from mundane chores. As data science thought leader Andrew Ng put it in his seminal talk on the ‘nuts and bolts’ of applying deep learning, the unified data warehouse is key. Teradata is also keen on OSDU and ‘will support the initiative as much as possible’.
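
For readers unfamiliar with the ‘tidy data’ idea, here is a minimal pandas sketch of reshaping a ‘wide’ table into tidy form (well names and values are invented for illustration):

import pandas as pd

# A 'wide' table with one column per measurement, as often exported from spreadsheets.
wide = pd.DataFrame({
    "well": ["15/9-F-1", "15/9-F-4"],
    "porosity": [0.21, 0.18],
    "permeability_md": [350.0, 120.0],
})

# Tidy (long) form: one row per observation, one column per variable,
# which downstream analysis and plotting tools handle uniformly.
tidy = wide.melt(id_vars="well", var_name="measurement", value_name="value")
print(tidy)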

* See also the Prediction Machines website.

OSDU and The Open Group

Philip Jong (Shell) provided an introduction and update on OSDU, the Shell-backed Open subsurface data universe, now hosted by The Open Group. Jong agreed that a data platform is essential, but the full benefits will not be realized if it is kept in-house. OSDU has therefore set out to develop an industry-wide data platform to counter the low productivity that comes from multiple, small data ecosystems. A meeting with Total and Equinor in March 2018 led to Shell seeding the initiative with the platform and code of its in-house developed ‘Subsurface Data Universe’. In August 2019 an OSDU demo release was made available on Azure and AWS. The demo used INT’s IVAAP well log data viewer, machine learning from CGG and NLP from Shell, all running against a 5,000 well data set from TNO. Mapping appeared to leverage Bing Maps. The overall plan is to separate data from the apps and to put all data on a single data platform à la unified data warehouse. The initial scope is exploration/wells. Source code will be made available for cloud services. OSDU will assure end-to-end data support, management and information security. In 2020, Schlumberger’s OpenDES is to be ‘merged’ into OSDU following an ‘overwhelming’ vote from the members. ‘This will be a game changer. A public API will allow access for small players and academia.’

Okea and IO Data on big, tough data project

Pål Andresen from Norwegian E&P startup OKEA teamed with Johan Kink (IO Data) to present a data loading case study on the transfer of the Draugen data set from Shell. Draugen was discovered in 1984 and unpacking of the diverse data set is ‘still in progress’. Seismic data presented multiple problems: a defective tar archive in Rode format required a bespoke repair program to decode, SEG-D in Rode format was likewise badly encoded, and some 3592-format nav merge tapes were unreadable. All of which required significant programming ability and seismic tape domain knowledge. Data delivery and completeness also proved problematic. ‘Even with NPD reporting rules, data will be under-reported.’ Shell may not have reported everything and Diskos is not complete either. The NPD could and should take a more active role in data transfer by producing field data repository lists, signing off on data delivery and arbitrating data disputes. This will increasingly be an issue as majors hand over operations to smaller companies.

Pandion/Computas and the Kerogen subsurface data platform

Pandion Energy’s Kine Johanne Årdal, with help from Computas, has developed the Kerogen subsurface data platform, a cloud-based, AI-enabled upstream data platform that ‘heralds the new era of the augmented geoscientist’. Kerogen (rather like OSDU) sets out to solve the problem of technical and organizational silos which hamper collaboration across, for instance, geochemistry and geophysics.

Wintershall’s AMIE automated information extraction project

Dejan Zamurovic presented the results of Wintershall’s AMIE (automated multidisciplinary information extraction) PoC. This leveraged Attivio natural language processing and data cataloging, Tibco data virtualization and AgileDD iQC. Data extracted from the document repository feeds endpoints including Spotfire, OpenWorks, Petrel and ArchiveDB. AgileDD performed better than a DIY Python approach at extracting well metadata from scanned logs. The training dataset is problematic in that different usages have crept in over 20 years. Attivio NLP extracts terms such as ‘gas shows’ or ‘serious injury’. For data pairs like ‘vitrinite reflectance’, a value can also be extracted. Drilling depth progress can be extracted from a graph in a document. More on this in a Tibco blog on well drilling. Tibco data virtualization and a modular plug-and-play data pipeline also ran. Costs are ‘non-negligible’.
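
By way of illustration (our sketch, not Attivio’s or AgileDD’s code), the simplest form of such pair extraction can be done with a regular expression in Python:

import re

# A made-up sentence of the kind found in a geochemistry report.
text = "Cuttings from 3120 m give a mean vitrinite reflectance of 0.85 %Ro."

# Capture a numeric value following the term, tolerating 'of', '=' or ':' as separators.
pattern = re.compile(
    r"vitrinite\s+reflectance\s*(?:of|=|:)?\s*([0-9]+(?:\.[0-9]+)?)",
    re.IGNORECASE,
)

for match in pattern.finditer(text):
    print("vitrinite reflectance =", float(match.group(1)))   # -> 0.85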

Schlumberger on the UK National Data Repository

Michael Smith (Schlumberger) presented on the UK National Data Repository that launched in March 2019. This is ‘not exactly a new idea’, with CDA as a precursor since the mid-1990s. In 2018, CDA, Schlumberger and the Oil & Gas Authority decided to launch the NDR and decommission the CDA site. The NDR embeds Schlumberger ProSource with a web app, online/nearline storage, a GIS server and a secure FTP download manager. The system is hosted by Schlumberger outside Aberdeen in a ‘private cloud’. The NDR holds 12,000 wellbores online and 600,000 disclosed data items plus seismics, and sees some 200 users per day out of 4,700 registered users. The NDR represents a huge change for public access with ‘thousands more wells available than before’. An API is coming real soon now. You can get all UK well data for £20.

Equinor on the ‘inadequate’ LAS well log standard

Bjarne Bøklepp (Equinor) ironically wished the ‘inadequate’ LAS well log ASCII standard a happy 30th anniversary (LAS was first published in The Log Analyst in 1989). In 2019, DLIS and LAS 2.0 remain the main exchange formats for well logs. (Actually, the Canadian Well Log Society issued an LAS 3.0 specification in 2000.)

Petroware JSON Well Log Format (JSON-WLF)

In fact, Bøklepp’s presentation was a lead-in to Jacob Dreyer, who unveiled the Petroware JSON Well Log Format (JSON-WLF). Well log formats (LIS, DLIS, LAS, WITS, BIT, XTF …) are outdated, complex and based on 1980s tape technology. They lack documentation, expertise is withering, and they are costly to maintain and use. Petroware’s business involves reading and writing logs in many legacy formats. Its LogStudio flagship uses an in-memory intermediate format offering maintainable, lossless conversion. Petroware is now proposing a persistent storage format derived from this internal representation. JavaScript Object Notation (JSON) is a non-proprietary standard with support for UTF-8 (Unicode), a built-in ‘no value’ (null), good type support, Energistics UoM and ISO 8601 dates and times. Visit the web page for sample data from Volve, all converted and republished to JSON in 50 lines of code. See also the GitHub repository. JSON-WLF data can be loaded into Petrel, Geolog and Matlab. The format ‘has huge potential, the impact will be massive when we get it rolling. We need your help – some homework for you. Pressure your DMs and others to accommodate this format. Standards orgs should embrace this technology.’
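
To give a flavor of the format (a hand-rolled sketch loosely modeled on the published JSON Well Log Format examples, not output from Petroware’s converter), here is a minimal two-curve log built and serialized in Python:

import json

# Minimal, illustrative log: a header, curve definitions and a data matrix. All values invented.
log = {
    "header": {
        "name": "Demo log",
        "well": "15/9-19 A",
        "startIndex": 2907.79,
        "endIndex": 2908.09,
        "step": 0.15,
    },
    "curves": [
        {"name": "MD", "quantity": "length", "unit": "m", "valueType": "float", "dimensions": 1},
        {"name": "GR", "quantity": "gamma ray", "unit": "gAPI", "valueType": "float", "dimensions": 1},
    ],
    "data": [
        [2907.79, 29.96],
        [2907.94, None],      # 'no value' maps to JSON null, no magic numbers needed
        [2908.09, 31.51],
    ],
}

print(json.dumps([log], indent=2))    # the format wraps one or more logs in a top-level array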

Petroware’s talk sparked off some debate. Both Schlumberger and Halliburton stated that they already use JSON internally for log data. Bøklepp was queried about the role of the standards bodies and in particular of Energistics WITSML, surely a candidate for log data persistence? Energistics is waiting on a final version of JSON that accommodates multi-dimensional arrays. Meanwhile, some are waiting for OSDU to specify a standard. For Norway at least, it appears that JSON-WLF is a strong contender.

IHS Markit/PPDM and taxonomic clarity in the upstream

Elizabeth Patock (IHS Markit) addressed the issue of taxonomic clarity in the upstream with reference to PPDM’s ‘What is a completion?’ (WIAC) work. There are many possible interpretations of what is important to a ‘completion’ and regulatory authorities differ on what is required, with ‘consequences’. To alleviate such semantic confusion, PPDM advocates faceted taxonomies, hierarchical structures where instances of each facet can be unambiguously described to support interoperability. The PPDM WIAC taxonomy has branches for ‘business’ and ‘physical’ usage, with child facets going down to reservoir, activity, geologic and wellbore contact interval. There remain complex issues with deviated wells and awkward WBCIs*, and the mechanical interface facet (equipment) presents more possibilities for confusion. PPDM is working to tie the facets to its eponymous data model.

* Well bore contact interval.
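
A toy sketch (ours, not PPDM’s) of what a faceted description of a completion might look like in code, using the facet names mentioned above with invented values:

from dataclasses import dataclass

@dataclass
class CompletionFacets:
    """Each facet is recorded independently, so a 'completion' can be described unambiguously."""
    usage: str                        # 'business' or 'physical' branch
    reservoir: str
    activity: str
    geologic_unit: str
    wellbore_contact_interval: str    # WBCI

example = CompletionFacets(
    usage="physical",
    reservoir="Brent",
    activity="perforation",
    geologic_unit="Ness Formation",
    wellbore_contact_interval="2431-2458 m MD",
)
print(example)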

Sword/Venture - data science unlocks the value in BP’s unstructured data

Attila Balazs (Sword/Venture) presented work performed for BP on ‘unlocking the value in unstructured data with data science’. As data volumes explode, companies either have to organize everything upfront (which is hard to do) or just ‘accumulate stuff’. Data science offers a middle way using exploratory data analysis, data wrangling, model building, prediction and action. R used to be the preferred tool but Python is winning the competition. Pandas exploratory data analysis and scikit-learn are ‘essential for any ML project’. The BP reference solution has subsurface documents on the server, content parsed and OCR’d, and metadata (well names, fields, companies) extracted. Documents are classified with ML and stored appropriately with rich metadata. Apache Tika is used for data scraping. Pandas has displaced SQL. The spaCy and NLTK natural language processors both got a mention. The solution underpins BP’s ‘Julien’ automated document management system that processes hundreds of documents and emails from Outlook/Aspera and populates NT shares. A data harvester framework PoC was developed for the Azerbaijan unit, extracting static reservoir attributes from Office and PDF documents with Camelot and Tabula and feeding a PowerBI dashboard. Tempering the current enthusiasm for data science, Balazs observed that ‘a good data set beats algorithms; simple regression may be enough’.
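
For illustration (a minimal sketch, not BP’s production pipeline), a document classifier of the kind described can be put together in a few lines of scikit-learn; the categories and training texts below are invented:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, invented training set: extracted document text -> document class.
texts = [
    "final well report completion and perforation summary",
    "routine core analysis porosity permeability plug samples",
    "seismic processing report migration velocity analysis",
    "daily drilling report mud weight bit run casing point",
]
labels = ["well_report", "core_analysis", "seismic_processing", "drilling_report"]

# TF-IDF features feeding a simple linear classifier.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
classifier.fit(texts, labels)

print(classifier.predict(["core analysis of plug samples from the 30/6-11 well"]))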

Schlumberger’s damascene conversion to open source

Jamie Cruise wound up the event with an enthusiastic presentation on the ‘tipping point’ in the upstream and on Schlumberger’s damascene conversion to an open source software company with its Open Data Ecosystem (OpenDES, or more accurately the Delfi data ecosystem), first announced by then CEO Paal Kibsgaard in July 2019. OpenDES is touted as the data environment that underpins Schlumberger’s Delfi ‘cognitive’ E&P environment, with roadmaps for corporate stores, NDRs, and extant and next-generation products. ‘All of the things that we have been building into our silos over the years will migrate into the data ecosystem.’ Schlumberger noted the work that Philip Jong was doing in OSDU, realized that there was not really room for two data platforms and decided to contribute OpenDES to OSDU to accelerate both programs. For the skeptics, Cruise insisted that ‘our conversion to and understanding of open source is very authentic’, running under Linux Foundation rules. ‘What is being shared is not skeleton code but the real thing as used by us, with core services, data flows, optimized storage and domain data management services. This is the same Delfi code as demoed to our clients. OpenDES is configured for DevOps and we made the first commit in Git at a ribbon-cutting ceremony in Monaco. Now we are going to learn how to make this open source stuff work as a community. The future of data is open!’

Short takes: NorskOlje&Gass data exchange. AgileDD work for Equinor. Diskos 2.0. North Sea overlooked pay project. Geodata - ArcGIS front end to Volve dataset. Kadme Whereoil front end to Diskos. Interica on Woodside’s rule-based archiving.

NorskOlje&Gass (NOG) is a regrouping of three Norwegian quangos (GeoTrade, EPIM and legacy NOG) with a history (some would say form) of developing standards for upstream data sharing. NOG is now working with ConocoPhillips, Equinor, Shell and Total on a minimum best-practice data set for daily information exchange. The new platform and APIs are about to be released on Azure. The NOG data solution now uses the GraphQL query language.
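
For readers unfamiliar with GraphQL, the client names exactly the fields it wants in a single request. A sketch follows, with an entirely hypothetical endpoint and schema (we have no sight of the NOG API):

import requests

# Entirely hypothetical endpoint and schema, for illustration of the GraphQL style only.
query = """
query DailyExchange($field: String!, $date: String!) {
  dailyReport(field: $field, date: $date) {
    field
    date
    oilProductionSm3
    gasProductionSm3
  }
}
"""

response = requests.post(
    "https://example.com/graphql",    # placeholder URL, not the NOG endpoint
    json={"query": query, "variables": {"field": "Ekofisk", "date": "2019-09-16"}},
    timeout=30,
)
print(response.json())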

Henri Blondelle (AgileDD) presented work done for Equinor on extracting the rich (and often unindexed and unused) content present in composite logs. Successive training on lithological descriptors, depth pixels, shows and geological descriptions was performed with the YOLO convolutional neural net computer vision software and AgileDD’s own iQC. In fact, the trials were carried out with the entry-level TinyYOLO variant and a ‘light’ IT environment of five GTX 1060 GPUs. Training data was generated by ‘human and heuristic tagging’ of composites. Lithology proved a hard task; show symbols were easier to detect. TinyYOLO was OK for the trial, but better results are expected using a larger GPU farm and the full YOLO implementation.
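
As a sketch of what such symbol detection looks like in practice (using today’s ultralytics YOLO package rather than the Darknet TinyYOLO of the trial; the weights file and image name are hypothetical):

from ultralytics import YOLO   # modern YOLO package; the AgileDD trial used TinyYOLO under Darknet

# Hypothetical weights, custom-trained on tagged composite-log symbols.
model = YOLO("composite_log_symbols.pt")

# Run detection on a scanned composite log page and list the hits.
results = model.predict("composite_log_page_0042.png", conf=0.25)
for box in results[0].boxes:
    name = model.names[int(box.cls)]
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"{name}: ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f}), confidence {float(box.conf):.2f}")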

Diskos 2.0. The current Diskos contract expires at year-end 2020. A tender is out, but the NPD has the option of extending the current contract for an extra three years. As part of its Released Wells Initiative, the NPD, via Diskos, is seeking to revitalize old data, digitize cuttings and make the data shareable through Diskos. Stratum/Rockwash have been selected as vendors for the project.

Gillian White from the Aberdeen-based Oil & Gas Technology Centre floated the Northern North Sea ‘machine learning in exploration’ overlooked pay project. The idea is to use well data (not, currently, seismics) from some 1,200 exploration wells and some 6,000 development wells. Logs, core data and reports are available for the study. The project was first announced in 2018.

Erlend Kvinnesland (Geodata) showed how the heterogeneous released Volve data set has been mastered with Esri’s ArcGIS. The Geocap Petrel plug-in for ArcGIS allows for 3D seismic data to be viewed and manipulated in ArcGIS while the production data can be visualized with the ArcGIS operations dashboard. A compelling alternative to unpacking the Volve data with the original software used to create it.

Jesse Lord showed how Kadme’s Whereoil API was used, in conjunction with RoQC Tools, to combine data from multiple sources and enable real-time data validation, using the most recent NPD and Diskos data, directly from within Petrel. Troika’s Marlin seismic data trawler was used to scan, discover, QC and, if necessary, repair data. ‘Actual and correct’ metadata was extracted directly from the data and stored in the Whereoil index. All the data is now searchable with Whereoil and mappable with the Whereoil Map*.

* We were curious to know more about Whereoil’s mapping technology. Kadme kindly provided the following: ‘We store the spatial data in Elastic, and then we use the GeoTools libraries to manipulate the spatial features. The map interface itself has been built using OpenLayers. This is Kadme’s GIS system with a few man-years of work in it. The end user does not need any third-party licenses to run it. Everything comes with Whereoil.’

Chris Bearce presented Interica’s work for Woodside on automated rule-based archiving across multiple data sources, a component of Woodside’s subsurface data transformation program (SDTP). Interica uses Microsoft Azure AI to geotag datasets, leveraging substantial training data derived from its existing connectors. The geolocation catalogue includes confidence indicators derived from file path, file name, content, well name and other indicators. Woodside’s SDTP aims to declutter and prioritize its online projects and reduce the disk storage taken up by old Petrel projects. The solution leveraged Interica’s PARS and ARBA, a Petrel connector and open APIs for integration with the global GIS system. The solution is integrated with AWS S3 for long-term storage.
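
A toy sketch (ours, not Interica’s) of the kind of rule-based confidence scoring described, combining file name, file path and content matches against a known well list (all names and weights invented):

import re

KNOWN_WELLS = ["15/9-19 A", "15/9-19 B"]   # hypothetical master well list

def geotag_confidence(path: str, content: str) -> dict:
    """Score each known well by where its name is found: file name, path and content hits add up."""
    scores = {}
    for well in KNOWN_WELLS:
        # Let any non-alphanumeric separator in the well name match '/', '_', '-' or space.
        pattern = re.compile(re.sub(r"[^0-9A-Za-z]+", "[^0-9A-Za-z]?", well), re.IGNORECASE)
        filename = path.rsplit("/", 1)[-1]
        score = (0.5 * bool(pattern.search(filename))
                 + 0.2 * bool(pattern.search(path))
                 + 0.3 * bool(pattern.search(content)))
        if score:
            scores[well] = round(score, 2)
    return scores

print(geotag_confidence("/projects/15_9-19_A/final_report.pdf",
                        "Well 15/9-19 A weekly status report"))   # -> {'15/9-19 A': 0.5}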

The 25th ECIM is scheduled for September 14-16 2020.

© Oil IT Journal - all rights reserved.