A data management 101

Oil IT Journal editor Neil McNaughton, back from the 2008 PNEC Data Integration Conference, decided that the world needs a data management backgrounder. He offers definitions of sample, trace, meta and master data, and notes the impact of the data warehouse community on the upstream and the duality of master data management and data quality, both hot topics at PNEC.

One speaker at PNEC intimated that those who define meta data as ‘data about data’ merited a ‘punch on the nose.’ Since I have always considered this to be a rather elegant definition, and since the speaker failed to offer an alternative (to the definition, not the punishment), I thought that I would devote this editorial to a review of data, data management and the state of the industry.


It struck me that this might be a good thing because I noted a degree of obfuscation in some presentations, and I will try to explain why this has come about too. Our credentials for this are, I believe, reasonably good. Oil IT Journal started life back in 1996 as Petroleum Data Manager and we have covered very many data conferences on both sides of the pond since then. So here we go with Data Management 101. I would politely ask those of a pugilistic disposition to read to the end before donning their gloves.


Much oil country data is a record of something happening against time: the amplitude of a seismic recording, say, or the volume of oil produced. Other records, like a well log, are made as a function of depth. All of these can be plotted out on a graph, or traced. Hence they are called ‘trace’ or ‘raw’ data. Digitizing such data involves sampling the continuous trace at regular intervals (of time or depth). Hence a trace is made up of data samples.
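For readers who prefer code to prose, here is a minimal Python sketch of what sampling a continuous trace at regular intervals means. The 25 Hz sine wave, the 4 ms interval and the function names are all assumptions made up for illustration; they do not come from any particular recording system.

```python
import math


def sample_trace(signal, start, stop, interval):
    """Sample a continuous function at regular intervals, start and stop included."""
    count = int(round((stop - start) / interval)) + 1
    return [signal(start + i * interval) for i in range(count)]


def continuous_trace(t):
    """Hypothetical continuous trace: a 25 Hz sine wave against time in seconds."""
    return math.sin(2 * math.pi * 25 * t)


# Sample the first 20 ms of the trace every 4 ms.
samples = sample_trace(continuous_trace, start=0.0, stop=0.02, interval=0.004)
print(len(samples))
```

The list of numbers that comes back is the digitized trace. Note that the list on its own says nothing about how far apart the samples are.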

Meta Data

Traces and samples are recorded in a wide range of more or less standard formats like Log ASCII Standard (LAS) for wells and SEG-Y for seismic. These share a common approach to how the data is recorded, just as digital camera images are stored as JPEGs or whatever. The record begins with a header that contains both master data and meta data. Wait a second for the master bit. Meta data is data about data (so punch me!). Meta data may be the sample rate (the number of feet, meters or milliseconds between samples) or the scale factor of the individual samples.
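To see why the samples are of little use without their meta data, consider the following Python sketch. The TraceRecord class and its field names are hypothetical and do not reproduce the actual LAS or SEG-Y header layouts; the point is only that the sample interval and scale factor are needed to turn raw numbers into positioned, scaled values.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    # Meta data (data about the data); all field names are hypothetical.
    start: float            # position of the first sample
    sample_interval: float  # spacing between samples
    interval_unit: str      # e.g. "ft", "m" or "ms"
    scale_factor: float     # multiplier applied to each raw sample
    # The data itself.
    raw_samples: list

    def scaled_values(self):
        """Return (position, value) pairs computed from the meta data."""
        return [
            (self.start + i * self.sample_interval, raw * self.scale_factor)
            for i, raw in enumerate(self.raw_samples)
        ]


record = TraceRecord(
    start=1000.0,
    sample_interval=0.5,
    interval_unit="ft",
    scale_factor=0.5,
    raw_samples=[120, 124, 131],
)
values = record.scaled_values()
print(values)
```

Change the sample interval or the scale factor in the header and the same three raw numbers describe a different log.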

Master Data

Like I said, also contained in the header record is ‘master data.’ This is not data about data, but rather data that identifies the record as a whole. Master data could be a well name, a seismic shot number or a survey name. It is easy to see how a hierarchy of master data can be built up. From survey to line to shot to sample. Or from well to wellbore to log to sample. There is no reason that some data elements in the middle couldn’t be considered as both master and meta depending on your viewpoint. This is not an exact mathematical science. What is important about master data is that it has ‘currency’ outside of its immediate data object. Master data is what ties disparate data stores together. So the well name in a production test can be tied to the same well in the log database.
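The ‘tying together’ role of master data can be sketched in a few lines of Python. The two stores, their field names and the well names below are invented for illustration; real stores would be databases rather than lists, but the principle of joining on a shared well name is the same.

```python
# Two hypothetical data stores that share one piece of master data: the well name.
production_tests = [
    {"well_name": "WELL-A1", "oil_rate_bopd": 1200},
    {"well_name": "WELL-B2", "oil_rate_bopd": 800},
]
log_database = [
    {"well_name": "WELL-A1", "log_type": "gamma ray"},
    {"well_name": "WELL-A1", "log_type": "resistivity"},
    {"well_name": "WELL-C3", "log_type": "sonic"},
]


def join_on_well_name(tests, logs):
    """Attach to each production test the log types recorded for the same well."""
    logs_by_well = {}
    for log in logs:
        logs_by_well.setdefault(log["well_name"], []).append(log["log_type"])
    return [
        {**test, "logs": logs_by_well.get(test["well_name"], [])}
        for test in tests
    ]


joined = join_on_well_name(production_tests, log_database)
for row in joined:
    print(row)
```

The join only works because both stores spell the well name identically. WELL-B2 has a production test but no logs under that exact name, so it comes back with an empty list.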

A caveat

It is worth observing that folks recorded ‘header’ records before they were called meta or master data. And also that they were calling the whole header ‘meta data’ before the term ‘master data’ was even invented. The master data terminology is a retrofit to digital recording that goes back to the 1970s. I think it is a very good retrofit, but it does cause some to want to punch people because it requires some adjustments to prior usage. But this is the nature of progress.

Data mining

Part of the trouble comes from the fact that the whole ‘master data’ concept came not from E&P but from the data warehouse community. Banks and large retailers have transaction systems that are unsuited to analysis and so build separate ‘warehouses’ that replicate information in the transactional systems. This has introduced concepts (data mart and master data) that upset the terminological applecart.


The data warehouse folks have also brought new service providers who are touting their wares as a panacea for the upstream’s woes. At one level they are very welcome because E&P has grown up with a large number of data stores for well data, seismics, production and so on.


As I said above, master data is what ties different data stores together. If you have different well names in different systems, or if individual systems contain different names for the same well, then you are in trouble. You have a data quality problem. I state this obvious fact because master data management and data quality are actually two sides of the same coin. This was made clear at PNEC when the master data managers started jumping up and down during the data quality presentations and vice versa.
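A crude Python sketch shows how a well naming problem of this kind might be detected. The normalization rule used here (upper-case the name and strip everything except letters and digits) is an assumption chosen for illustration, as are the store and well names; reconciling real well names generally takes a good deal more than this.

```python
import re
from collections import defaultdict


def normalize(name):
    """Hypothetical rule: upper-case and drop everything but letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def find_name_variants(stores):
    """Group well names that normalize to the same key but are spelled differently."""
    variants = defaultdict(set)
    for well_names in stores.values():
        for name in well_names:
            variants[normalize(name)].add(name)
    return {key: sorted(names) for key, names in variants.items() if len(names) > 1}


# Invented example stores holding inconsistent spellings of the same wells.
stores = {
    "logs": ["Well A-1", "WELL-B2"],
    "production": ["WELL-A1", "Well B2", "WELL-C3"],
}
conflicts = find_name_variants(stores)
print(conflicts)
```

Each entry in the result is at once a master data problem and a data quality problem, which is the two-sides-of-the-coin point made above.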


But the big question is, does the upstream, which has a considerable history of analyzing and managing its specialist data types, have much to learn, beyond some useful terminology, from the master data community? One answer to this is that it may have no choice. Your finance department may be driving the MDM project and need access to some well or production master data.


As for MDM in a more geotechnical context, you may be interested in our piece on page 9 of this issue, a synopsis of a significant PNEC presentation by BP and SAIC on the deployment of Kalido’s master data management solution to tie together Finder and BP’s drilling, completion and production databases.


I mentioned ‘obfuscation’ earlier on. To a consultant from outside of the business trying to peddle a horizontal solution into a technical vertical, it may seem neat to make things seem harder than they really are. In E&P, obfuscating the simple stuff (above) is a really bad idea. If you don’t believe me, check out some of the hard stuff like RP66, SEG-D and WITSML.

© Oil IT Journal - all rights reserved.