Highlight of the 2018 ECIM data management conference was Equinor’s announcement of a Digital Subsurface Laboratory (DSL), which is to drive digitalization in Equinor’s subsurface domain by ‘applying AI and data science technology to create value from key subsurface datasets.’ At the heart of the DSL is Omnia, Equinor’s (formerly Statoil’s) cloud-based data platform that is to run as a ‘subsurface data lake’ in the Microsoft Azure cloud.
Along with the push to the cloud, Equinor, Teradata and others signaled a renewed focus on data quality as a prerequisite for AI/ML. Another apparent trend is the consecration of open source software for the corporate environment, with a keynote from Agile Scientific’s Matt Hall and endorsement of open source tools by Teradata (a technology partner in the DSL) and others. Shell’s Lars Gaseby announced the creation of a new ‘Society for Petroleum Data Managers’ (SPDM), based in Stavanger. The SPDM has support from ECIM, CDA and EU major oils. Finally, ECIM is a dense, information-packed venue with several parallel sessions. We will be reporting more from the 2018 event in our next issue.
Tina Todnem (Equinor) and Duncan Irving (Teradata) presented Equinor’s subsurface digital transformation. Digitalization is said to have a $2 billion potential contribution to Equinor through drilling automation, subsurface illumination, production optimization and more. Usually, such a transformation takes ‘5 to 10 years’ to show value; Equinor plans to speed this up by running its Omnia data platform as a Microsoft Azure subsurface data lake*. Microsoft’s machine learning and analytical toolset will be deployed across the E&P data spectrum. To date, prospects and fields have been studied with ‘large, complex grid-based models’. Equinor wants to add more empirical data to the mix, ‘not losing the model but adding data-driven, computer assisted optimization’, a ‘hugely multi-disciplinary task’. The idea is for an integrated decision-making framework that embeds the carbon footprint of oil and gas development.
The tricky part in building the ‘digital subsurface superhighway’ is data prep and obtaining good training data sets. This relies on combining open source tools, data standards, APIs and tacit information. The initiative involves collaboration between data managers, data engineers and data scientists. Equinor is a ‘huge fan’ of open source software, as witnessed by the ‘OPM’ reservoir simulator.
Teradata’s Duncan Irving provided some more details on the Digital Subsurface Lab, a ‘virtual’ joint venture between Teradata, Equinor and others. The DSL is to apply data science to optimization. But what to optimize? Current methods focus on an individual well or field by optimizing a few KPIs. But ideally, we should look at a bigger picture that embraces personnel incentives, safety and maximizing production. Enter the DSL’s ‘system of systems’ spanning subsurface, HSE, facilities and portfolio management.
One current issue is that data management in the upstream does not yet have the right skill sets and fails to provide data in a usable format and of suitable quality. Irving advocates once-and-for-all ‘proper’ management of data and metadata that will support ‘analytics on data we can trust’. Areas in need of attention include standardized vocabulary, file-level data validation and log edits and corrections, all of which must be done before machine learning can work. Quality comes when data management meets domain expertise. The DSL operates in 3-6 month sprints to minimum viable products. The DSL is also working with Energistics, Halliburton’s OpenEarth and the recently announced Open Group Open Subsurface Data Universe.
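By way of illustration, here is a minimal sketch (in Python) of the kind of file-level validation Irving describes, checking log curve mnemonics against an agreed vocabulary before data reaches an ML workflow. The vocabulary and the example curve names are hypothetical.
```
# Minimal sketch: validate log curve mnemonics against an agreed vocabulary.
# The vocabulary and the incoming curve names are hypothetical.
STANDARD_MNEMONICS = {"DEPT", "GR", "RHOB", "NPHI", "DT", "CALI"}

def validate_curve_names(curve_names):
    """Return the curve names that are not in the controlled vocabulary."""
    return [name for name in curve_names if name.upper() not in STANDARD_MNEMONICS]

# Curves as they might arrive in an incoming file
incoming = ["DEPT", "GR", "RhoB", "NEUT", "DTCO"]
unknown = validate_curve_names(incoming)
if unknown:
    print(f"Non-standard mnemonics needing mapping or correction: {unknown}")
```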
* But as we observed in our last issue, data residency issues have meant that the data lake is to be constructed in Norway.
Troika’s Jill Lewis picked up on the data quality issue with reference to the SEG’s quality-targeted standards for seismic data. Automation, machine learning and data science may be sexy, but ‘nothing can be achieved without standards’. For too long, seismic workflows have involved editing and futzing with seismic data. It is time to change and standardize. Now all are on board in principle. However, contractors still lay down the law with their proprietary formats, which hamper the application of groundbreaking developments in data science. At the workstation, similar proprietary environments (Petrel, OpenWorks, Kingdom) make the cost of integration huge. The latest SEG formats are machine readable. Users should be knocking on the SEG’s door but, in fact, SEG-Y R1 has seen ‘almost zero implementation’. Lewis and others (notably Stanford’s Stuart Levin) have provided machine readable SEG-D and an upgraded SEG-Y R2 to accommodate output from SEG-D. But yet again, uptake was poor, making Lewis ‘very upset’. Even users of SEG-Y R2 tweak the header’s metadata in idiosyncratic fashion. ‘So now I get very angry!’ Seismics is at the heart of most oil and gas big data and machine learning activity, from finding patterns in data to automating analysis of well logs, production data and life-of-field passive seismics. Lewis wound up citing a recent IEEE article on ‘Why the future of data storage is still magnetic tape’. Mag tape, by the way, is used extensively in the cloud.
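The ‘machine readable’ point can be illustrated with the open source segyio package (an Equinor project, not one Lewis named). A minimal sketch, with a hypothetical file name, reading binary and trace header values from a SEG-Y file:
```
# Minimal sketch: read SEG-Y headers programmatically with the open source
# segyio package. The file name is hypothetical.
import segyio

with segyio.open("survey.sgy", "r", ignore_geometry=True) as f:
    fmt = f.bin[segyio.BinField.Format]          # sample format code from the binary header
    interval = f.bin[segyio.BinField.Interval]   # sample interval in microseconds
    print(f"traces: {f.tracecount}, samples/trace: {len(f.samples)}")
    print(f"format code: {fmt}, sample interval: {interval} us")
    # A per-trace header word, e.g. CDP X coordinates for the whole file
    cdp_x = f.attributes(segyio.TraceField.CDP_X)[:]
```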
Henk Tijhof reviewed Shell’s 10 principles of data management, first published in 2003. Today, only the ‘legal compliance’ principle is showing green. This led to a change of direction in Shell’s technical data management and a renewed attempt to ensure that next generation systems work as advertised. Here, standardization is the ‘next innovation’. Meanwhile Shell has disbanded its central data team and combined data management with operations.
In the big data arena, Tijhof reported that Shell’s experience to date shows that the focus needs to be on data streams not on the data lake. Today, the aims of Shell’s data managers are to ‘eliminate, simplify, standardize, automate’ (ESSA). This is done through data-driven workflows, eliminating silos and using workflow consultants to improve business processes. Shell’s LEAN-inspired ESSA was presented back in 2011 at APQC.
Laura Frolich and Larissa Glass (Teradata) outlined some of the projects they have been working on in the Equinor Digital Subsurface Laboratory (DSL). One direction is to eliminate or at least minimize data prep. Typically, this involves leveraging data from real time MWD/LWD systems alongside applications such as Oilfield Manager (OFM) and data from the Norwegian Petroleum Directorate (NPD). Other generic data sources also ran … Excel, ASCII and more proprietary formats. The data science tenet here is to consider raw data as ‘immutable’, to be recorded as-is and never touched. It can be stored along with the processing workflows that generate reproducible results. Problems still arise with data quality – missing/wrong values, naming conventions and poor documentation.
Current ‘smart’ workflows in Excel may be popular, but quickly become unmanageable: one error is fixed and now there are two spreadsheets. There is a big difference between what can be done in Excel and what will be done in Excel. Data science can be used to figure out what’s happening in Excel and replicate it in Python. Ten lines of Pandas dataframe Python code can combine OFM data with NPD metadata for ingestion into a predictive model (a hypothetical sketch follows the footnote below). The LAS-IO (lasio) utility also ran. The workflow adds standard column mapping and naming in a data cleansing pipeline prior to output in a ‘configured ADS*’. For those still keen on Excel, data export in CSV remains an option. The DSL aims to stop reinventing wheels through agile techniques, parsing, templates, QC, profiling, analytics and visualization. The aim is for reproducible results and a reusable, trusted code base. Unit tests, documented APIs and inline comments also ran. Moreover, agile supports remote work, there is ‘no need to co-locate’; the DSL team is spread around the EU. Finally, yet another take on the data lake metaphor: the authors see the DSL as ‘a water conduit as opposed to a leaky bucket.’
* An analytical dataset, a Python data frame for analytics, a half-way-house between raw CSV files and the repository.
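The ‘ten lines of Pandas’ were not shown; the following is a hypothetical sketch of the sort of thing described, merging an OFM production export with NPD wellbore metadata. File names, column names and the rename mapping are assumptions, not the DSL’s code.
```
# Hypothetical sketch of the 'ten lines of Pandas' workflow: merge an OFM
# production export with NPD wellbore metadata. File names, column names and
# the rename mapping are assumptions.
import pandas as pd

COLUMN_MAP = {"WELLBORE": "wellbore_name", "OIL_VOL": "oil_sm3", "DATE": "prod_date"}

ofm = pd.read_csv("ofm_production_export.csv").rename(columns=COLUMN_MAP)
npd = pd.read_csv("npd_wellbore_metadata.csv")   # e.g. a download of NPD fact pages

ads = (ofm
       .merge(npd, left_on="wellbore_name", right_on="wlbWellboreName", how="left")
       .dropna(subset=["oil_sm3"]))              # drop rows with missing production

ads.to_csv("analytical_dataset.csv", index=False)  # CSV export for the Excel diehards
```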
Comment – It seems as though data science is a return to the scripting environment of the late 20th Century, with Python stepping in for the Unix shell.
Paddy Dineen provided more details on Schlumberger’s new data ecosystem for its Delfi cognitive environment. Dineen observed that, contrary to the received view, management never bought into the ‘data has value’ story. This is (hopefully) set to change with greatly improved access to relevant data, workflow automation and a ‘unified understanding of risk including HSE’. Currently, key decisions are based on and/or captured in email and PowerPoint rather than in a geoscience application. This is changing with ‘digitalization’, machine learning and cloud-based workflows. Dineen stated confidently that ‘the cloud(s) are secure’, particularly with added security from specialist providers such as Carbon Black and Palo Alto Networks. Dineen cited ThoughtWorks’ position paper on ‘polycloud’ solutions that pick and mix services and solutions across Amazon and Google.
Delfi, Schlumberger’s ‘cognitive’ E&P environment, will leverage Google for its data ecosystem while the ‘cognitive’ environment itself spans Google and Microsoft Azure. The Apigee API for polycloud data flow got a mention. Dineen further opined that ‘inter-cloud replication might be a way to optimize’ data use. The data lake mandates immutability. This is hard to achieve on premises, which is where the extensibility of the cloud comes in. So far, little big data/AI is done in industry because the environments are not adapted. Dineen showed Delfi slideware combining just about every imaginable data type.
From a Petrel endpoint, access is possible to various processing services running in the background. Data relationships leverage the ‘well known entity’ concept to blend corporate data with other (IHS) datasets. A WKE widget pulls up well and related information. ‘This is not rocket science and should have been automated 20 years ago’. Delfi is currently at the stage of agile development of minimum viable products. These are already in proof of concept trials with several oils.
In the Q&A, Dineen was asked if SeisDB, ProSource and so on run inside Delfi. They do not, but they can be used to ingest data. ‘In a few years such services will likely be embedded in Delfi’. Dineen also clarified that ‘polycloud’ does not mean cloud agnostic. ‘We currently consider Google as best for analytics and Amazon for data storage’.
Simon Francis (Arundo Analytics) reported that some data scientists have left the industry because ‘you haven’t got anything for me to do’. But as compute and storage costs fall, doing analytics on hundreds of thousands of pumps worldwide is becoming feasible. Deep learning is taking over from Excel. Rather than starting out with a data lake, Francis recommends starting small and focused. Many data science projects end at the PowerPoint stage as they prove hard to scale to production. One example is temperature and vibration data analysis of rotating equipment.
Current AI systems tend to flood operators with zillions of reports that ‘something is breaking’. Static alarms do not capture failures adequately. It is better to roll in multiple data sources – P&ID, sensor, maintenance – and to do this quickly! ‘No 18 month projects!’ Arundo deploys a Dell Edge gateway and OPC-UA communications. One non oil and gas use case has been using ML on video footage of a weaving loom to spot when the thread gets to the end of the reel. Similarly, a video cam can be trained while pointing at a flowmeter dial! Another use case involved predicting compressor failure. This was successful in that it predicted a breakdown three weeks ahead of time but, unfortunately, ‘no one was looking at the screen!’
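As a generic illustration of the multi-sensor approach Francis advocates (this is not Arundo’s method, and the data is synthetic), a minimal anomaly-detection sketch using scikit-learn:
```
# Generic sketch: multivariate anomaly detection on pump temperature/vibration
# readings with scikit-learn's IsolationForest. Synthetic data, not Arundo's method.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=[60.0, 2.0], scale=[2.0, 0.2], size=(5000, 2))   # temp C, vibration mm/s
failing = rng.normal(loc=[75.0, 5.0], scale=[3.0, 0.5], size=(20, 2))    # a developing failure
readings = np.vstack([normal, failing])

model = IsolationForest(contamination=0.01, random_state=0).fit(readings)
flags = model.predict(readings)            # -1 = anomalous, 1 = normal
print(f"{(flags == -1).sum()} readings flagged for inspection")
```
Unlike a static alarm on a single threshold, the model flags unusual combinations of readings across sensors.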
David Holmes (Dell EMC) began by tempering the general enthusiasm for AI as a panacea for data managers’ problems. Currently, as reported in Forbes, ‘80% of data science has little to do with data science’. This recalls Shell’s year 2000 finding that geoscientists spent ‘75% of their time wrestling with data’. In today’s hackathons, getting up and running with datasets is the hardest part. For example, at the Dell/Agile AAPG hackathon, it took participants a couple of hours of futzing with data in Excel before AI could be tried. Even when such issues have been dealt with, there are multiple, more or less incompatible AI/ML frameworks.
All of which has not stopped the growing underground community inside corporates using home brew Linux workstations with GPUs. This is great, but raises the question of how such work can evolve into a scalable enterprise system. For Dell EMC the answer is the cloud, possibly Dell’s own Virtustream Enterprise cloud, where Jupyter notebooks can be managed and models tracked and controlled. IT also has a role providing reviews from formally-trained computer scientists. But combining the whole open source based toolset is not something that adds value to the business. Enter the recently-launched Dell EMC AI platform, a bundle of Domino Data, Cognizant, H2O.ai and more.
The new/citizen data scientist represents an important new community in our industry. These folks can solve problems in ways that were previously unimaginable. In any case, ‘folks no longer want to sit in front of Petrel all day!’ Better to examine and test data in new and interesting ways. Holmes called out Enthought, Earth Science Analytics and (again) the Agile hackathons (seven this year), for which there is huge demand. Senior execs show up with a laptop and hack Python code. Dell EMC’s Ready Solutions for Big Data add data provenance, algorithm management and governance to the underground movement.
Grahame Blakey and Nick Fosbery’s Loxodrome startup is to offer ‘insight through integration’, adding open source technology to corporate clients’ stacks. Loxodrome is an Esri partner and also uses Logi Info, Neo4J, Apache Solr and lots more stuff. The idea is to bring the combined power of geospatial and IM to the business. This could, for instance, address a new ventures problem when a farm-in opportunity arises that requires accessing data across Petrel, IHS and more. Loxodrome has used the community/developer edition of Neo4J and its Cypher query language. This has led to an easy-to-deploy, rich and flexible data model of seismic surveys, claimed to be better than an Oracle RDBMS. Loxodrome is now building an E&P data portal with Geocortex and Logi Info, running atop Solr, SAFE FME and Esri. For more on Neo4J see our report from the 2017 GraphConnect event.
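By way of illustration only (Loxodrome’s survey data model was not published), a minimal sketch of creating and querying seismic survey nodes via the official Neo4J Python driver and Cypher; the labels, properties and connection details are assumptions:
```
# Minimal sketch: a toy seismic-survey graph via the official Neo4j Python driver.
# Labels, properties and connection details are assumptions, not Loxodrome's model.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Link a survey to the licence it covers
    session.run(
        "MERGE (s:SeismicSurvey {name: $survey}) "
        "MERGE (l:Licence {name: $licence}) "
        "MERGE (s)-[:COVERS]->(l)",
        survey="NW-2018-3D", licence="PL999",
    )
    # Which surveys cover a given licence?
    result = session.run(
        "MATCH (s:SeismicSurvey)-[:COVERS]->(l:Licence {name: $licence}) "
        "RETURN s.name AS survey", licence="PL999",
    )
    print([record["survey"] for record in result])

driver.close()
```
The appeal over a relational schema is that new relationship types (survey to well tie, survey to vintage reprocessing) can be added without migrating tables.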
Khalid Abbas, of Lean methodology consultants Kzetta.com, sees blockchain as the panacea for, well, just about everything to do with the digital transformation. Blockchain is to ‘transform the information asset’ with decentralized, anonymous, time stamped data and information. Use cases are seismic reports, cyber intrusion detection and more. One blockchain flagship is the Ponton Enerchain*, a putative EU energy trading platform trialing with Total and (many) others. Abbas ventured that seismic data, including prestack, could be ‘put on blockchain’. Questioned on the energy sustainability of blockchain, Abbas opined that he was ‘against bitcoin’ and that there was no answer to this question. In oil and gas there will be a private blockchain, ‘not Ethereum’.
* We tried contacting the Ponton site through the online form and received a warning about an insecure, badly configured website and then a 404.
Jane McConnell observed that current initiatives (like Omnia) are moving ‘back to the future’ with one big database blending subsurface and topside data from engineering documents. This mandates joining-up different parts of the business. But here, the current data management organization does not help, as folks still work in different silos, perhaps each with its own IT. Buy (instead of build) caused the problem and you ‘can’t put a sticky plaster over the top!’
So, back to building your own stuff. As an example, McConnell cited the use of open source parsers such as the DLIS parser from the Agile-maintained SubSurf Wiki. McConnell also opined that doing data quality, fixing CRS/UoM issues up front, is a good idea! And, as we have heard before (from Teradata in fact), you (still) do need a data model or, at least, some agreement on terminology. It can be hard for subsurface and topside people to talk to each other. What is an ‘asset’ anyway! Operators need to plan for turnover and change – and ‘don’t outsource’.
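A minimal sketch of the ‘fix CRS issues up front’ point, using the open source pyproj package to move hypothetical well coordinates from ED50 / UTM zone 31N to WGS 84; the EPSG codes and coordinates are illustrative:
```
# Minimal sketch: fix a CRS issue up front by transforming well surface
# coordinates from ED50 / UTM zone 31N (EPSG:23031) to WGS 84 (EPSG:4326).
# The coordinates are illustrative only.
from pyproj import Transformer

transformer = Transformer.from_crs("EPSG:23031", "EPSG:4326", always_xy=True)

easting, northing = 435000.0, 6478000.0          # hypothetical well surface location
lon, lat = transformer.transform(easting, northing)
print(f"lon: {lon:.5f}, lat: {lat:.5f}")
```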
Lapi Dixit, Halliburton’s Chief Digital Officer, is a believer in Industry 4.0 and cyber physical systems. Automation and AI will form the bedrock of open, interoperable systems. Dixit reprised Halliburton’s OpenEarth initiative as a ‘new business model for co-innovation’. One example is in drilling automation, where competing metrics and objectives can be reconciled with predictive models. The trend is for more sensors, edge computing, machine learning and an E&P digital twin running on a cloud agnostic open data platform that is ‘not limited to a single vendor.’
Steve Freeman, Schlumberger director of AI, believes that the disruptive technology will ‘underpin everything in this industry’. ‘You all need to be inside this space’. The traditional view of the upstream involves taking lots of data, doing complex stuff and giving oneself a pat on the back, so surely none of this is amenable to AI? Wrong! AI applies closely here. Enablers (all are free) are classification and deep learning, but still with the human in the loop. Seismic interpretation currently takes too long. So, train a convolutional neural net to pick faults. The data specialist does the bulk of the work and hands over to the interpreter for the final detailed pick. Top salt pick time in the Gulf of Mexico is down from 4 weeks to 4 hours for 95% of the area of interest.
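As a toy sketch of the approach (synthetic data, nothing like Schlumberger’s production model), a small convolutional network can be trained to classify seismic image patches as faulted or not:
```
# Toy sketch: a small CNN that classifies seismic amplitude patches as
# fault / no-fault. Synthetic random data stands in for real labelled patches;
# this illustrates the approach, not Schlumberger's model.
import numpy as np
from tensorflow.keras import layers, models

patches = np.random.rand(1000, 64, 64, 1).astype("float32")   # stand-in seismic patches
labels = np.random.randint(0, 2, size=(1000,))                # 1 = fault present

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                     # probability of a fault
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(patches, labels, epochs=3, batch_size=32, validation_split=0.2)
```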
Petrophysics is also an obvious target, except that data is ‘over constrained’. Freeman cited Woodside’s 2018 SPWLA presentation that sped log interpretation ‘from weeks to hours’ with 90% accuracy. Better still, the machine ‘knew’ when it was not confident of its work. First pass interpretation is handed over to a petrotech for final QC.
NLP on reports is another promising area. Freeman cited the recent work on the UK Oil and Gas Authority’s scanned data set, now searchable from Petrel and capable of showing a heat map of lost circulation as detected from well reports. Again, the Apigee API got a plug. Schlumberger is reporting ‘proven value from AI’, with a 3X speedup on simulation times and 90% off petrophysical interpretation time. ‘If you don’t believe, think about a new occupation.’
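As a much-simplified illustration of the text-mining idea (not Schlumberger’s NLP pipeline), a sketch that counts lost-circulation mentions per well across report text, the raw material for a heat map; the report contents are invented:
```
# Much-simplified sketch of the text-mining idea: count lost-circulation
# mentions per well across report text. Report contents are invented;
# this is not Schlumberger's NLP pipeline.
import re
from collections import Counter

reports = {
    "15/9-19 A": "Drilled 12 1/4 in section. Losses observed at 2,310 m, partial lost circulation.",
    "15/9-19 B": "No losses reported. Section drilled to plan.",
    "15/9-19 C": "Severe lost circulation while drilling; LCM pill pumped, losses cured.",
}

pattern = re.compile(r"lost circulation|losses", re.IGNORECASE)
mentions = Counter({well: len(pattern.findall(text)) for well, text in reports.items()})

for well, count in mentions.most_common():
    print(f"{well}: {count} mention(s)")
```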
Eirik Larsen presented Earth Science Analytics’ machine learning offering for geoscientists. Large industry datasets (such as the NPD’s) are underused because expert interpretation is costly and slow. Machine learning can help with interpretation and can perform inversion and rock physics in one step. ESA’s user-friendly ML software offers an ‘Alexa’-style assistant capable of answering questions like ‘show me the porosity distribution of wells in quad A.’
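A minimal sketch of the log-to-property prediction idea (synthetic data and a toy relationship, not ESA’s software): train a regressor to estimate porosity from a handful of wireline curves.
```
# Minimal sketch of log-to-property prediction: a regressor estimating porosity
# from a few wireline curves. Synthetic data only; this is not ESA's software.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
gr = rng.uniform(10, 150, n)          # gamma ray, gAPI
rhob = rng.uniform(1.9, 2.7, n)       # bulk density, g/cc
dt = rng.uniform(50, 140, n)          # sonic, us/ft
porosity = 0.6 - 0.2 * rhob + 0.001 * dt + rng.normal(0, 0.01, n)  # toy relationship

X = np.column_stack([gr, rhob, dt])
X_train, X_test, y_train, y_test = train_test_split(X, porosity, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"R^2 on held-out samples: {model.score(X_test, y_test):.2f}")
```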
The next ECIM will be held 16-18 September 2019. © Oil IT Journal - all rights reserved.