Data modeling special issue

Editor Neil McNaughton comments on similarities and differences in data modeling approaches from BP, Shell and others as observed at IRM-UK’s data conference and SPAR Europe/PlantTech, and from his reading of Matthew West’s new book ‘Developing high quality data models.’

Sometimes I have to pinch myself to see if I am not dreaming, but attending a dozen or more conferences in a year really does give one some insights—not to mention some pretty interesting cross-checking possibilities. This month’s insights began at the excellent IRM-UK Data Management1 conference held last November, but that is not where I am going to start my account.

Matthew West’s new book ‘Developing High Quality Data Models2’ pinpoints an issue that every data modeler has come across—the way that objects evolve over time. This can lead to modeling ‘gotchas’ that make for constant tweaks to a data model. West’s answer is ‘4 dimensional’ modeling using an ‘ontology’ that seeks out the meaning of things at a deep, near-philosophical level. More on page 3.

West’s examples come from the downstream—but one can dream up some more from other subject areas. Consider an exploration permit. We will not delve too deeply into the fact that one person’s ‘permit’ is another person’s ‘license’ or indeed ‘licence,’ although such subtleties are often stumbling blocks. Let us suppose that the permit has a certain shape described by a set of points or vertices of a polygon. Again we will move on quickly, even though most will spot another boat-load of pitfalls here, as a ‘polygon’ has no meaning without mucho metadata as to projection system, datum and what scheme (great circle etc.) is implied in ‘joining up the dots’—something that some jurisdictions do not even bother to define. Leaving that aside, and to show just how hard things get even so, let’s keep it simple.
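To make the point concrete, here is a minimal sketch of what a ‘polygon’ actually has to carry before it means anything on the ground. The class and field names are my own illustration, not taken from any standard data model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PermitBoundary:
    """A permit outline is only meaningful together with its spatial metadata."""
    vertices: tuple     # ordered (longitude, latitude) pairs
    datum: str          # geodetic datum, e.g. "WGS84" or "ED50"
    projection: str     # map projection, or "none" for geographic coordinates
    edge_scheme: str    # how the dots are joined: "great_circle", "rhumb_line", ...

# Two permits with identical vertex lists but different metadata
# describe different patches of the Earth:
a = PermitBoundary(((2.0, 51.0), (3.0, 51.0), (3.0, 52.0)),
                   "WGS84", "none", "great_circle")
b = PermitBoundary(((2.0, 51.0), (3.0, 51.0), (3.0, 52.0)),
                   "ED50", "none", "rhumb_line")
assert a != b  # same dots, different ground
```

The equality check is the whole point: a model that stores only the vertex list would silently treat these two boundaries as the same object.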

This permit has been awarded to a joint venture comprising three partners. In a couple of years’ time, a farmout results in another company coming in and financing a well for the joint venture. The well is successful, a field is discovered. This is the sort of thing that catches the eye of larger companies with lots of cash and a portfolio to manage—so a new predator comes in and acquires its share of the field from one of the original joint venture partners—who, being of a crafty nature, sells all its data in a separate deal. Some of this may include speculative, non-exclusive data whose ownership is the subject of debate—again, mileage will vary in different jurisdictions.

Meanwhile the original exploration license may or may not (again depending on jurisdiction) change shape as it is renewed or recast as a development license. It changes its ‘spatio-temporal extent,’ as West would say. Meanwhile too, more money is spent—by the original joint venture partners, by the farmee and by the acquirer. Meanwhile too (I forgot to tell you this), the discovery was made in a country belonging to a currency area whose member economies, after pulling this way and that—some working hard and retiring late in life while others spent all their time in the café—have created a new currency, devaluing in the process. The company that acquired a share of the field is co-located in countries both inside and outside the area whose currency has changed its ‘spatio-temporal extent.’
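One way to read West’s ‘4 dimensional’ prescription—my own illustrative sketch, not his notation—is that the permit is not a row that gets overwritten but an append-only series of spatio-temporal states. Renewals, farm-ins and acquisitions add states rather than force schema changes:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PermitState:
    """One spatio-temporal 'slice' of the permit's life."""
    kind: str                 # "exploration" or "development"
    shape_id: str             # reference to a versioned boundary
    shares: dict              # partner name -> fractional interest
    valid_from: date
    valid_to: Optional[date]  # None = current state

# A hypothetical history: the permit evolves, the model does not.
history = [
    PermitState("exploration", "shape-v1", {"A": 0.5, "B": 0.3, "C": 0.2},
                date(2005, 1, 1), date(2007, 6, 30)),
    PermitState("exploration", "shape-v1", {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1},
                date(2007, 7, 1), date(2010, 12, 31)),  # farm-in by D
    PermitState("development", "shape-v2", {"A": 0.4, "E": 0.3, "C": 0.2, "D": 0.1},
                date(2011, 1, 1), None),                # recast; B sells out to E
]

def state_on(history, when):
    """Answer 'who held what on date X?' without ever changing the schema."""
    for s in history:
        if s.valid_from <= when and (s.valid_to is None or when <= s.valid_to):
            return s
    return None

assert state_on(history, date(2008, 3, 1)).shares["D"] == 0.1
```

Currency changes and data-ownership transfers could be handled the same way—as further state series with their own validity intervals—though the book’s actual treatment is considerably more rigorous than this toy.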

Now the boss comes in and asks questions like ‘what is the book value of our stake in the XYZ field?’ or ‘how much is our data resource worth now?’ If, like me, you are a kind of entry-level SQL practitioner, you are probably thinking, ‘Yikes! I have just had to change my data model 15 times and I’m still not sure it is any good.’

If you are Matthew West you have probably figured out what the ontological sense of ‘currency’ is, how the interactions of shareholdings and license extents coalesce and diverge, and you are ready for anything. If you work for SAP or Kalido you probably have an answer too.

Shell’s data modeling approach, as described in this month’s lead, appears to eschew the ontological approach—instead striving for taxonomic homogeneity across the enterprise—and reserving domain specifics for ‘deep dives.’ I’m not sure why, but this makes me think of the common entreaty not to ‘hold your breath’ while diving is in progress...

My take is that it is unrealistic to expect any data model to cater for all eventualities. Even fundamental questions—like ‘what is the book value?’—get to be hard, if not impossible, to answer without much more context. This is not an academic notion. The book value of an asset is a challenging topic that itself has ‘spatio-temporal extent,’ as different jurisdictions debate ‘fair value’ accounting and the like.

Even production data is likely to originate as a hotchpotch of more or less accurate information, gleaned from meter readers with their minds on something else, from a poorly calibrated multi-phase meter, or from a ‘fiscal’ meter with a stuck needle.

The underlying ‘meaning’ may be rather hard to get at, which is probably why the top-down approach appears so attractive. Complex models that cater for local niceties home in on a local version of the truth—but in a way that may not be amenable to upscaling to the enterprise level. Yet starting at the top and working down to an understanding of everything is indeed a herculean task. At the end of the day, management, and the taxman, want you to ‘just give me the number.’

Such issues are echoed in another IRM-UK presentation. Frances Dean described BP’s earlier attempts to ‘boil the ocean’ with mega data modeling projects. BP’s Trade Completion back office data modeling appears to be more of an attempt to align computing systems with the business. More bottom up than top down perhaps, but no mention here of ontologies or grand abstractions—although they may be under the hood!

I almost forgot, but there is even more on data standards in our report, on page 8, from the 2010 SPAR Europe/PlantTech event we attended last month.

Well, I hope that this editorial has whetted your appetite for more from this data-modeling packed issue. Happy reading.


2 Morgan Kaufmann, ISBN 9780123751065, 2011.

This article originally appeared in Oil IT Journal 2011 Issue # 1.
