The DAMA Dictionary, master/metadata and the meaning of it all...

Before his vacation, editor Neil McNaughton returns to the meta/master data controversy of a previous editorial armed with the DAMA Data Dictionary, only to discover not-so-subtle differences between technical and financial services data management, and some ‘framework’ déjà vu!

In my editorial of April 2008 I offered a Data Management 101. Not exactly a ‘body of knowledge’ but, I believe, a good start for those not blessed with the hardened nose that comes from years of managing upstream data. This sparked off a flurry of correspondence (well, two letters actually) and a slightly contrarian view of some of the terms used. David Lecore’s thoughtful contribution made me sit up and think. But the nature of the editorialist’s life is such that I was not ready to respond to his well-made points. I was waiting for inspiration and clarification.

Last month we reviewed the DAMA Data Management Body of Knowledge (DM-BOK) and now (page 9), the companion DAMA Data Dictionary. I hoped that these authoritative oeuvres would put to rest the meta/master data controversy. Well of course they didn’t. Input from DAMA, which I suppose is largely the financial services community, rather than providing insights, has clouded the issue. It is hard enough trying to analyze one’s own data into a logical structure. It is even harder trying to shoehorn it into a ‘one size fits all’ analysis.

Lecore pointed out that my data/meta data/master data breakdown did not recognize that category of data that is placed in a ‘master data set,’ a cleaned-up version of say, well headers or perhaps a definitive list of formation tops rather than the plethoric provisional, idiosyncratic or just plain wrong versions that may exist in project data stores. What does DAMA say here?

The situation, as I understand it, which exists in the financial services community, is that the ‘granularity’ of the ‘authoritative’ master data is pretty much the same as the granularity of the subordinate data sets. Cleaned-up ‘master’ data touches each record. Matching Fred Bloggs with F. Bloggs and Frederick E. Bloggs as they appear in his bank’s database, the Social Security and wherever else.
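The record-level matching described above can be sketched in a few lines. This is a minimal illustration, not anyone’s production matching logic: the normalization rule (first initial plus surname) and the name variants are assumptions for the example; real master data management uses far more robust fuzzy matching.

```python
# Sketch: reconciling name variants against a 'master' record using a
# deliberately simple normalization rule (first initial + surname).
# Real MDM tooling would use probabilistic or fuzzy matching.
def normalize(name: str) -> str:
    """Reduce a name to 'f bloggs' form: first initial plus surname."""
    parts = name.lower().replace(".", "").split()
    return f"{parts[0][0]} {parts[-1]}"

variants = ["Fred Bloggs", "F. Bloggs", "Frederick E. Bloggs"]
master_key = normalize("Fred Bloggs")

# All three variants collapse to the same key, so each source record
# can be linked back to the single 'master' entry.
matches = [v for v in variants if normalize(v) == master_key]
print(matches)
```

The point is the granularity: every individual record in every subordinate store gets touched and linked back to the master entry.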

In E&P, the situation is different. Managing voluminous, complex data types involves compromise and ‘creaming off.’ You are not going to build a ‘master’ data set including every seismic sample. A project will contain such data, but it will likely live in a file, not a database. As you go up the data hierarchy to a ‘master’ set, believe me, if this contains a definitive list of line names and endpoints you will have done very well indeed, better than most majors I believe. E&P data management is about compromise and the art of the possible.

One of the funny things (to a technical data manager) that one comes across when rubbing shoulders with the data managers from financial services is the use of ‘data marts,’ ‘data warehousing’ and the like. What is happening here is that if you are processing a very large number of transactions (folks using their credit cards at ATMs all around the country), then the last thing you want is to have some smart-ass ‘analyst’ hit the transactional database with a show-stopping ‘query from hell’ (that one did come from the DAMA dictionary!). To avoid such disasters, data from the transactional system is replicated into another system where queries can be performed without breaking anything. This arrangement has spawned a lot of technology and terminology that is occasionally seen in the upstream—as witnessed by our story from Composite Software and Netezza on page 3 of this issue.
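The replication arrangement can be made concrete with a toy sketch. Everything here is invented for illustration (the table, the column names, the in-memory databases); a real warehouse would use log shipping or ETL tooling rather than a copy loop, but the separation of concerns is the same.

```python
# Sketch: keeping analytical queries off the transactional store by
# replicating into a separate reporting database. Schema and data
# are invented; real setups use dedicated ETL/replication tooling.
import sqlite3

oltp = sqlite3.connect(":memory:")   # stand-in for the transactional system
mart = sqlite3.connect(":memory:")   # stand-in for the data mart/warehouse

oltp.execute("CREATE TABLE txns (card_id TEXT, amount REAL)")
oltp.executemany("INSERT INTO txns VALUES (?, ?)",
                 [("c1", 40.0), ("c2", 25.0), ("c1", 60.0)])

# Periodic replication step: copy a snapshot into the mart.
mart.execute("CREATE TABLE txns (card_id TEXT, amount REAL)")
mart.executemany("INSERT INTO txns VALUES (?, ?)",
                 oltp.execute("SELECT card_id, amount FROM txns"))

# The 'query from hell' now runs against the replica, not production.
total_by_card = mart.execute(
    "SELECT card_id, SUM(amount) FROM txns "
    "GROUP BY card_id ORDER BY card_id").fetchall()
print(total_by_card)  # [('c1', 100.0), ('c2', 25.0)]
```

However heavy the analyst’s aggregation, the transactional side carries on serving ATM withdrawals untroubled.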

In the earliest days of data management it was noted that the same terms—notably a well identifier—cropped up, often in slightly different versions in different databases around the enterprise. Early workers—especially the smaller outfits like iStore, Petris and OpenSpirit, who were obliged to live with the major vendors’ databases, soon learned how to map between different formats and provide more or less seamless access to data in different data stores. Of course ‘seamless’ does not necessarily mean all your worries are over. There are issues of data quality and cleansing which I’ll deal with in a future article. But what is key is that sooner or later, as you cross the domains and silos you are into data reconciliation and ‘mapping.’
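At its simplest, the mapping the early workers built amounts to a cross-reference table between store-specific identifiers and a canonical one. The store names, local identifiers and UWI format below are all invented for the example; the reconciliation problem is precisely populating and maintaining such a table at scale.

```python
# Sketch: resolving a well identifier across vendor stores via a
# lookup table. Store names, local ids and the UWI are illustrative.
WELL_ID_MAP = {
    # (store, local_id) -> canonical well identifier
    ("vendor_a", "W-001"):    "05-123-45678",
    ("vendor_b", "BLOGGS 1"): "05-123-45678",
}

def canonical_uwi(store: str, local_id: str) -> str:
    """Resolve a store-specific well id to the canonical identifier."""
    try:
        return WELL_ID_MAP[(store, local_id)]
    except KeyError:
        # Unmapped ids are exactly where reconciliation work begins.
        raise KeyError(f"No mapping for {local_id!r} in {store!r}")

print(canonical_uwi("vendor_a", "W-001"))  # 05-123-45678
```

The lookup itself is trivial; the hard, unglamorous work is in the data quality and cleansing needed to fill the table correctly.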

Now this is hard work; it’s not something that is easily shrink-wrapped or ‘sold’ to management. Which is probably the reason why the very successful, proven technology of mapping across different vendor stores has gotten relatively little attention—and why this wheel is constantly being re-invented. Today, there is a wide choice of solution providers and offerings from the major outsourcing/consulting houses who are capable of performing the mapping and building a more or less bespoke ‘framework’ for the upstream.

‘So what?’ I hear you say. Well the ‘what’ in this context is that this, the data mapping ‘problem,’ is what IBM’s Integrated Information Framework and Microsoft’s Upstream Reference Architecture pretend to address. While the former’s history has more visibility (funded by Statoil and targeting operations and maintenance), MURA’s background in the upstream is less clear—that is until you see the white papers from Wipro1 and Pointcross2. Hidden behind the marketing jargon and kowtowing to Microsoft’s architectural ‘promised land’ are enterprise-scale data mapping projects performed for clients.

In conclusion I have to admit that the cause of straightforward definitions has not been advanced very much. There is a parallel here between data management technology and terminology. The biggest problem in managing data is handling legacy data. And the biggest problem with trying to pin down the terminology is the ‘legacy’ of overlapping techniques and usage. On the other hand, this is what makes things so interesting. As I trust this issue of Oil IT Journal shows—with reports on Petrel data management from Apache, on data ‘virtualization’ from Composite Software, interviews with thought leaders from OpenSpirit and Roxar and developments on the ‘semantic’ front from the POSC/Caesar Association and our review of a rather good book on the subject.



This article originally appeared in Oil IT Journal 2010 Issue # 7.
