The Hadoop historian

Pointcross completes Phase I of drilling data server and repository proof of concept for Chevron. Open source Hadoop/HBase stack provides flexible access to multi-terabyte drilling data.

Pointcross completed its drilling data server and repository (DDSR) proof of concept for Chevron earlier this year. The PoC demonstrated the use of a Hadoop distributed file system and HBase ‘big table’ data repository for storing and analyzing a multi-terabyte test data set from Chevron’s drillers.

Currently, such data is spread across multiple data stores, and search and retrieval are problematic. There is a ‘knowledge gap’ between data scientists and domain specialists. The former’s algorithms may be running on a limited data subset, while the latter tend to develop spreadsheet-based point solutions in isolation.

A big data solution, however, is not magic. Pointcross’ specialists have developed a classification schema for drilling documents, a drilling taxonomy and an Energistics-derived standard for units of measure. These have enabled documents and data sources such as LAS well logs, spreadsheets, text files and Access databases to be harmonized and captured to a single repository.
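To illustrate what a units of measure standard buys during loading, here is a minimal Python sketch that converts incoming values to one canonical unit per quantity. The unit symbols, the choice of canonical units and the function name are assumptions made for illustration; they are not taken from the Energistics standard or from Pointcross' implementation.

```python
# Hypothetical lookup table: unit symbol -> (canonical unit, multiplier).
# Symbols and canonical choices are illustrative assumptions only.
TO_CANONICAL = {
    "m": ("m", 1.0),
    "ft": ("m", 0.3048),      # international foot, exact by definition
    "in": ("m", 0.0254),      # inch, exact by definition
    "kPa": ("kPa", 1.0),
    "psi": ("kPa", 6.894757),  # rounded conversion factor
}


def normalize(value, unit):
    """Return (value in canonical unit, canonical unit symbol)."""
    try:
        canonical, factor = TO_CANONICAL[unit]
    except KeyError:
        # Unknown units are rejected rather than guessed at.
        raise ValueError(f"unknown unit: {unit!r}") from None
    return value * factor, canonical
```

For example, `normalize(1000.0, "ft")` returns a depth in metres, so that curves recorded in feet and in metres can sit side by side in one repository.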

One facet of the PoC was the ability to scan volumes of well log curve data to detect ‘patterns of interest,’ an artificial intelligence-type approach to the identification of log signatures. These can be used by drillers to pinpoint issues such as mud losses or stuck pipe. The technique can also automate the identification of formation tops.
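The article does not say how the PoC detects its patterns of interest. As one simple way to picture the idea, the following Python sketch slides a short template over a log curve and reports windows whose shape correlates strongly with it (Pearson correlation). The function names, the threshold and the choice of technique are assumptions, not Pointcross' method.

```python
import math


def _zscore(values):
    """Standardize a window; return None if it is constant."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return None
    return [(v - mean) / std for v in values]


def find_pattern(curve, template, threshold=0.9):
    """Return (start_index, correlation) for each window of `curve`
    whose Pearson correlation with `template` is >= threshold."""
    t = _zscore(template)
    if t is None:
        raise ValueError("template must not be constant")
    m = len(template)
    hits = []
    for start in range(len(curve) - m + 1):
        w = _zscore(curve[start:start + m])
        if w is None:
            continue  # flat window: correlation is undefined
        # Pearson correlation is the mean product of the z-scores.
        score = sum(a * b for a, b in zip(w, t)) / m
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Because each window is standardized, the match depends on the shape of the signature rather than its absolute amplitude. A production system scanning billions of records would need a distributed and far more efficient approach than this brute-force loop.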

The sample data set comprised some 6,000 wells with over three billion records and half a million ‘other’ documents. All in all, around nine terabytes of data were loaded to the Hadoop cluster set up in Chevron’s test facility in San Ramon.

A key component of the Pointcross IT stack is a ‘semantic data exchanger’ that maps and connects disparate data sources to the DDSR. Significant effort went into a mnemonic harmonization program to compensate for the profusion of terminology and abbreviations that plague upstream data sets, causing ‘misperceptions and increased complexity’ for data scientists.
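At its simplest, mnemonic harmonization can be pictured as an alias table that maps the many vendor spellings of a curve name to one canonical name, with unknown mnemonics set aside for review. The Python sketch below shows that idea only; the alias entries, canonical names and function names are hypothetical and do not reflect Pointcross' actual rules.

```python
# Hypothetical alias table: entries are illustrative assumptions,
# not Pointcross' harmonization rules.
ALIASES = {
    "GR": "GAMMA_RAY",
    "GRC": "GAMMA_RAY",
    "GAMMA": "GAMMA_RAY",
    "ROP": "RATE_OF_PENETRATION",
    "ROP5": "RATE_OF_PENETRATION",
    "WOB": "WEIGHT_ON_BIT",
    "SWOB": "WEIGHT_ON_BIT",
}


def harmonize(mnemonic):
    """Return the canonical curve name, or None if the mnemonic is unknown."""
    return ALIASES.get(mnemonic.strip().upper())


def harmonize_curves(mnemonics):
    """Split raw mnemonics into a mapped dict and a list needing review."""
    mapped, unmapped = {}, []
    for raw in mnemonics:
        canonical = harmonize(raw)
        if canonical is None:
            unmapped.append(raw)  # leave for a domain specialist
        else:
            mapped[raw] = canonical
    return mapped, unmapped
```

Keeping the unmapped list explicit matters: silently guessing at an unknown abbreviation is exactly the kind of misperception that harmonization is meant to remove.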

Pointcross is now pitching a second phase of the PoC with enhanced functionality, more data and more use cases. These include further taxonomic harmonization, data curation and event extraction from PDF documents, daily reports, trip sheets and more. Phase two will investigate how other causes of nonproductive time can be attributed to lithology, bottom hole assemblies and crew schedules.

Pointcross is also extending its big data offering, under the Omnia brand, into a ‘total business solution for exploration and production.’ Omnia includes solution sets for geophysical, shale, production asset management, big data and enterprise search. More from Pointcross.

© Oil IT Journal - all rights reserved.