It would be great to be able to report that SMi’s Big Data & Analytics for E&P conference, held earlier this year in London, offered in-depth presentations on the application of big data/analytics (BDA) in oil and gas, but this was not really the case. The conference included some familiar faces and presentations that were somewhat shoehorned into the big data theme. This makes us ask whether a) big data is already part of the upstream’s way of doing business, b) the big data movement has little to offer the sector, or c) the technology is so promising that nobody wants to talk about it in a public forum.
John Greenhough (University Of Edinburgh) set the scene with a presentation on the potential of big data in E&P. Big data is classified as structured (i.e. in a database), unstructured (documents), real-time (sensor/log data) and ‘open,’ i.e. freely available data from sites such as the US’ data.gov and Statoil’s publicly released data from the North Sea Gullfaks field. Making sense of these large data sets has been enabled by developments in multi-core and cloud computing and new technologies such as Hadoop (a framework for distributed processing of large data sets on clusters), Cassandra (a key-value data store à la BigTable) and MongoDB (another ‘NoSQL’ table database).
For Greenhough, E&P data opportunities abound and span the field lifecycle from exploration through asset integrity. One concrete example of ‘big data’ is the shift from daily or monthly production reporting to high frequency real time data from smart fields. The university has been using such data sources to improve recovery from oil and gas fields and now holds several patents for its oilfield analytics. These provide ‘data-driven insights’ into reservoir connectivity, water-flood management and production forecasting. One application, the statistical reservoir model, uses Bayesian statistics to forecast production. Another uses ‘pairwise correlations’ of production between wells to estimate connectivity and map pressure changes in the reservoir. Following field trials the university is now planning a spinout, ‘Recovery Analytics’ to deliver software and analytics solutions.
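The pairwise-correlation idea can be sketched in a few lines. This is not the university’s patented method, just a minimal illustration of the principle: if the production histories of two wells move together, that co-movement can be read as a hint of reservoir connectivity. The well data below is synthetic.

```python
import numpy as np

def pairwise_connectivity(rates):
    """Estimate inter-well connectivity from production-rate correlations.

    rates: 2-D array of shape (n_samples, n_wells), one column of
    production measurements per well. Returns an (n_wells, n_wells)
    matrix of Pearson correlations; high off-diagonal values suggest
    the wells share a pressure-connected volume.
    """
    return np.corrcoef(rates, rowvar=False)

# Synthetic example: wells A and B share a common signal, well C is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
rates = np.column_stack([
    base + 0.1 * rng.normal(size=500),  # well A
    base + 0.1 * rng.normal(size=500),  # well B, correlated with A
    rng.normal(size=500),               # well C, independent
])
conn = pairwise_connectivity(rates)
print(conn.round(2))
```

In practice, lagged correlations and compensation for shut-ins and workovers would be needed before reading anything physical into the numbers.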
John Edwards (Aston Business School) doubts that BDA is very new. Business intelligence was first used in the early 1970s. The term data mining was first used in 1983 and the data warehouse arrived in the 1990s. All of the above are really just the current buzzwords for management science. However, the big data movement is new in that it brings less structured data into the analytics orbit. This includes messy stuff like social media, text, speech, images and sensor data, all of which may be more or less amenable to physics-based modeling, statistics, machine learning and data mining.
Published work on big data in oil and gas is limited to the downstream where it has been used in smart metering. Video analytics has been applied to flare monitoring and oil spill/pollution monitoring. Clementine (now IBM SPSS Modeler) has been used to help control refinery waste products. Analytics has been used in E&P for decades. Edwards advises would-be BDA practitioners to focus on the question, ‘What questions would you like answered better that you can’t answer well enough now?’ Next you need to decide whether your own domain specialists can hack BDA or if you need external data scientists. The jury is out on this. On the one hand, it has been found that ‘Data scientists need domain knowledge’ and on the other hand, ‘Often someone coming from outside an industry can spot a better way to use big data than an insider, because so many new, unexpected sources of data are available.’
Nick Dyer (Front End Data) and David Bond (Panoramic Data) offered some lessons learned using real time analytics to track the action in the UK’s Premiership football (soccer). First with some pragmatic but insightful definitions viz. ‘Real-time’ translates as ‘faster than we’re comfortable with,’ ‘big data’ as ‘bigger than we’re comfortable with.’ So real-time BDA is about ‘making informed decisions quickly with vast amounts of input data.’ Traditional big data solutions store everything forever and are oriented to batch processing; analysis is done days or weeks after the event. Real-time systems can’t fail, must have the required performance and embed pre-defined queries. In Premiership football, real-time data provides tactical analytics for team managers, performance analytics for journalists and trend analysis for research. This is enabled by video tracking of players on the field and ad-hoc analytics providing heat maps of where and how the action is progressing. A sophisticated IT architecture blends metadata on players and teams with 3D models of stadia. Twenty-four HD cameras record some 27 terabytes of data per match which is analyzed by the real-time object tracking system to provide 4.7 million data records. Data is analyzed by ‘eventers’ which can be human or machine. A combination of a SQL database and OLAP cube are used along with physics and math-based algorithms to generate KPIs for player speed, event successes (passes, shots) and so on. The IT involves various loops with the auto eventers working fast and the slower human trackers providing validation and data checking. One application that runs off the system is Venatrack, used by managers to review fitness, passing and shooting success and make substitutions. Others use the longer term data generated by such systems to evaluate players for transfer and team building.
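To make the KPI pipeline concrete, here is a minimal sketch (our own, not Venatrack’s code) of how a player-speed KPI might be derived from raw tracking samples. The tuple layout and sample rate are assumptions for illustration.

```python
import math

def speeds_from_track(samples):
    """Derive a speed KPI from raw (t, x, y) tracking samples for one player.

    samples: list of (t_seconds, x_metres, y_metres) tuples, time-ordered.
    Returns a list of speeds in m/s between consecutive samples.
    """
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # guard against duplicate or out-of-order frames
        out.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return out

# Three frames at an assumed 25 fps: the player covers ~0.22 m per frame.
track = [(0.00, 0.0, 0.0), (0.04, 0.2, 0.1), (0.08, 0.4, 0.2)]
print(speeds_from_track(track))
```

A production system would additionally smooth the raw positions, since per-frame differencing amplifies tracking jitter.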
Another telling use case is big data use in mobile telephony. Here operators are engaged in a constant battle with bandwidth and leverage strategies such as data compression and data parsimony (don’t use 8 bytes when one will do). In both systems, data significance determines the information path. A fast track categorizes and discards data at source while a parallel, slow data pipeline keeps everything for later ‘traditional’ analytics. Timeliness vs. completeness is another consideration, as when late-arriving data changes the information. Here, tactical decisions will use what’s available while strategic decisions will be made when the full picture is clear. Bond also says that you should ‘take data modeling seriously,’ and use multiple storage forms (memory, file, SQL, OLAP, custom) and technologies (memory, SSD, HDD & SAN). While this presentation was off topic it was very à propos for potential deployers of similar technology in a digital oilfield context.
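The fast-track/slow-track split can be sketched as a simple router. This is our own illustration of the pattern, with an invented ‘significance’ threshold: every raw event is archived for later batch analytics, while only significant events are summarized onto the fast path.

```python
def route(event, fast_sink, archive, threshold=0.5):
    """Two-path pipeline: the slow path keeps everything for later
    'traditional' analytics; the fast path categorizes at source and
    forwards only a compact summary of significant events."""
    archive.append(event)  # slow path: raw event, analyzed days later
    if event.get("significance", 0.0) >= threshold:
        fast_sink.append({            # fast path: summary only
            "kind": event["kind"],
            "significance": event["significance"],
        })

fast, slow = [], []
events = [
    {"kind": "pressure_spike", "significance": 0.9, "payload": "..."},
    {"kind": "heartbeat", "significance": 0.1, "payload": "..."},
]
for e in events:
    route(e, fast, slow)
print(len(fast), len(slow))
```

The design choice mirrors the talk’s point: tactical decisions run off the (incomplete) fast path, strategic ones wait for the archive.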
The inimitable Jess Kozman (Mubadala Petroleum) asked ‘should E&P companies have a chief data scientist?’ The results of a straw poll held during a recent data management conference revealed that the title of ‘data manager’ was perceived as conveying experience and knowledge (no surprises there considering the audience) but that data ‘engineer’ came out ahead, with ‘data scientist’ a close third. According to the Harvard Business Review, data scientist is the ‘sexiest job of the 21st century!’ What distinguishes data ‘science’ is the ability to go beyond the reporting function and provide ‘predictive insights and innovations’ from the data.
According to Kozman, a single oilfield in Qatar generates three times the data used in finding the Higgs boson (around 30 petabytes). With a little shoehorning we can recast the whole of seismic imaging (very big data there) into the new paradigm. Digital rocks likewise provide very big data volumes from very small samples. Pretty well all oilfield data now ticks one or more of the big data boxes of ‘variety, velocity and volume,’ especially the newly popular fiber optic distributed sensors which generate ‘tens of terabytes per day.’ Much current data loading and preparation is so slow that considerable time and money is lost before actionable information can be extracted.
Compounding this problem is the fact that oil and gas data is unique in its scope and variety. Typical workflows blend data from a large number of sources. Kozman sees potential work for the data scientist in leveraging Apache PIG data pipelines, using the Hadoop/HIVE combination to ‘tame’ sensor data and Apache Storm to process ‘billions of events.’
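To show the kind of reduction Kozman has in mind, here is a toy windowed aggregation of a sensor stream. At fiber-optic scale this job would run in Hive (batch) or Storm (streaming); plain Python is used here, with an invented reading format, just to show the shape of the computation.

```python
from collections import defaultdict

def window_summaries(readings, window_s=60):
    """Reduce a high-rate sensor stream to per-window summaries.

    readings: iterable of (t_seconds, value) pairs.
    Returns {window_start: (count, min, max, mean)} -- a compact,
    queryable view of a stream too big to inspect raw.
    """
    acc = defaultdict(list)
    for t, v in readings:
        acc[int(t // window_s) * window_s].append(v)
    return {w: (len(vs), min(vs), max(vs), sum(vs) / len(vs))
            for w, vs in sorted(acc.items())}

# Four readings spanning two one-minute windows.
stream = [(0, 1.0), (10, 3.0), (65, 2.0), (70, 4.0)]
print(window_summaries(stream))
```

The same group-by-window/aggregate logic is what a Pig or Hive script would express declaratively over a Hadoop cluster.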
Matthew Harrison gave a wide ranging presentation on big data at the British Geological Survey (BGS) which actively manages a growing 250 terabyte dataset along with several hundred kilometres of core data and 17 shelf kilometres of paper records, maps and reports. BGS is a contributor to the open data movement notably via OpenGeoscience, where users can view and download geology data as web map services, access over a million borehole scans and search and download photos from the GeoScenic geological photo archive. BGS also offers more specialist data such as temperature and strain gauges for monitoring earth movements and geophones for earthquakes. BGS manages its data leveraging standards for geographic data discovery and is a big-time data modeller. The OpenGeoscience portal is underpinned by the BGS Rock Classification Scheme, a ‘practical, logical and robust system’ for classifying and naming geological materials as they appear at the scale of an exposure, hand specimen, or thin section. The BGS standards map shows around 20 different standards of varying scope with the venerable Codata as a global scientific standard. Harrison wound up observing that geoscience data is truly ‘big’ and growing rapidly, especially at the complex end of the spectrum. Ways need to be found to capture programmatically the data and geological knowledge held in thousands of scanned images. Perhaps linked data approaches and ontologies are the way forward—although their creation can be problematic. He also sees potential in using social media and system logs to derive knowledge and business intelligence.
Another retrofit came from Jill Lewis (Troika) who argued plausibly that for seismic, big data is a done deal and that what is required is a continued focus on standards and data quality. On data acquisition, ‘Do it once and do it right,’ manage data pro-actively, use the right formats, populate mandatory fields and get ready for BDA.
Aapo Markkanen (ABI Research) described the meeting of big data and the internet of things (IoT) as the ‘great crossover.’ For ABI, the leading use cases include predictive maintenance, operational analysis and contextual awareness.
Dan Cornford (Aston University and Integrated Geochemical Interpretation) believes that quality, trust and quantified uncertainty are key to data-driven decisions. Big data sets can’t be used without standards compliance and copious metadata on quality, provenance and preferably user feedback, expert review and citation information. All this is rolled up into the concept of ‘meta-quality’ i.e. metadata of quality information. This can be used to validate judgements made on third party data. Unfortunately, it is very rarely provided, ‘even statisticians often don’t validate things properly!’ Cornford used the global earth observation system of satellites (Geoss) as an example of a big data set. Less than 20% of Geoss records contain quality information and only 1.3% track provenance completely. One issue is that the ISO quality standard (ISO 19157) is too complex for real use. End-user tools also can’t, in general, exploit quality information. Cornford asked if big data was a challenge in the upstream. While some data, like seismic, is ‘big,’ ‘you don’t want to mine this, you want to manage it.’ Cornford does not believe that today’s generic BDA has much to offer. Its focus is largely on business intelligence and data mining. In the upstream, the focus is models, insight and judgement. Here, the key to decision making is access to the right data, quality information and the right tools. ‘Make sure users can then find and access their data easily from their applications whether it is ‘big’ or not, otherwise, don’t expect good decisions!’
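Cornford’s ‘meta-quality’ idea can be made concrete with a small record type. The fields and the screening rule below are our own assumptions, not an ISO 19157 implementation; the point is simply that quality metadata can be checked programmatically before data is trusted for a decision.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MetaQuality:
    """Metadata *about* quality information attached to a data set record."""
    accuracy: Optional[float] = None                      # quantified uncertainty
    provenance: List[str] = field(default_factory=list)   # processing history
    expert_reviews: List[str] = field(default_factory=list)
    citations: List[str] = field(default_factory=list)

    def is_decision_ready(self) -> bool:
        # Illustrative screening rule (an assumption, not a standard):
        # require quantified accuracy and a non-empty provenance chain.
        return self.accuracy is not None and bool(self.provenance)

mq = MetaQuality(accuracy=0.05, provenance=["acquired", "calibrated", "QC'd"])
print(mq.is_decision_ready())
```

Applied to something like Geoss, a rule of this kind is exactly what would flag the ~80% of records that arrive without quality information.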
Duncan Shaw, (Nottingham University Business School) brought some BDA experience from the commercial world to the debate. Shaw has advised companies like Ikea and Marks & Spencer on BDA. In general, corporate data is under-utilized and companies are confronted with ‘confusing new toys and techniques.’ But the potential is definitely there as ‘All sectors are just scratching the surface of what can be done.’
Summing up, we suggest that the Scottish courts’ verdict of ‘not proven’ could be the judgment of SMi’s inaugural big data in E&P conference. The killer use case or start-up has yet to manifest itself. But that will not stop folks (including Oil IT Journal) from tracking the trend and unpicking the buzzwords. In fact you will be reading a lot more about big data in E&P in next month’s Oil IT Journal when we report back from sunny Haugesund, Norway and the ECIM E&P data management conference.
More from SMi Conferences.
© Oil IT Journal - all rights reserved.