What is a well—or for that matter, a turnip?

Oil IT Journal Editor Neil McNaughton weaves some threads from PPDM’s What is a Well initiative, a new book on RDF and BP’s data modeling into a ‘semantic web’ all of his own.

I was surprised when I read, in Steve Ballmer’s address to the Microsoft Global Energy Forum (more on page 7), his explanation of Microsoft’s latest marketing concept of ‘three screens and the cloud.’ Previously I understood the ‘three screens’ in Microsoft’s new paradigm for world rule to be a cell phone, a laptop and a workstation. Not at all! Screen three is a TV! While the TV as a ‘third screen’ makes a lot of sense for Microsoft, which has been working for years to get a bigger slice of the media gateau, the TV in the enterprise is harder to swallow. Maybe the ‘one size fits all’ approach to marketing has its limits!

~

One size doesn’t fit all for upstream data modeling either, as our report in this issue from the PPDM ‘What is a Well’ (WIAW) workgroup underlines. As PPDM CEO Trudy Curtis observed, ‘As an industry, we don’t agree on these fundamental objects.’

Of course, not agreeing on terminology is not limited to oil and gas. When I moved from London to Aberdeen, I was surprised to find that what I called a ‘turnip’ was called a ‘neep’ in Scotland. But that’s not all. A turnip can also be a rutabaga, a swede, or vice-versa depending on where you are and who you are talking to (1). The point is, whatever we decide to call a ‘turnip’ we are still likely to come unstuck in some localities and with some usages.

Listening to PPDM’s effort to cage the ineffable reminded me of an earlier attempt, POSC’s (now Energistics) Epicentre E&P data model. While I do not claim personal knowledge of Epicentre’s entrails, I understand that its modeling of wells and wellbores was authoritative at the time and may constitute ‘prior art’ of some use to today’s practitioners. Unfortunately, Energistics’ website refresh has all but written Epicentre out of its history, along with inward-pointing links to seminal facets of the project. This (hopefully temporary) glitch is a shame for such a major modeling effort. Epicentre, by the way, was defined in the Express (3) data modeling language, generally considered to be as rich a tool as it was hard to master! It also embedded a strong E&P modeling body of knowledge going back to the American Petroleum Institute’s RP 66, itself a standardization of Schlumberger’s DLIS log data format.

~

Tim Berners-Lee’s (TBL) idea of ‘linked data’ has seen some encouraging take-up from the US and UK governments. The US Department of Energy has launched openEI.org, an open-source, ‘linked data’-based website of energy data. In the UK, data.gov.uk seeks to provide a ‘programmable entry point’ into government data. I say ‘encouraging’ rather than, for instance, ‘world-shaking,’ because both sites have something of the skunk works about them. If you poke around a bit looking for something specific, you may be in for a runaround.

The important thing about both initiatives is that governments appear to be pushing more data into the public domain, revisiting restrictive copyright issues. A subsidiary issue is the actual format in which the data is supplied.

You might like to reflect on the relative merits of different ways of publishing data online. If you are in a government department and have just received a Word document with a table of the current month’s oil and gas production data for your country, you could scan the document and include an image in a PDF file on the website. This is about the worst you could do. Better would be to put the Word document online, but this is far from optimal, as getting at the data in a table inside Word is awkward. Flash (as used on wiaw.org) is probably not the greatest thing either, especially for a standard! There are other options. Comma-separated values aren’t bad. An HTML table is OK. But these still require frustrating editorial gymnastics to get at the data.
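To make the comparison concrete, here is a minimal Python sketch of what ‘getting at the data’ looks like when the table is published as comma-separated values. The field names, column headings and volumes are invented for illustration; nothing here comes from a real government site. Note that the reader still has to know (or guess) what each column means and which units apply, which is where the gymnastics come in.

```python
import csv
import io

# Hypothetical monthly production table as it might be published in CSV.
# Column names and figures are invented for illustration only.
CSV_TEXT = """field,month,oil_bbl,gas_mcf
Alpha,2010-01,120000,450000
Bravo,2010-01,80000,300000
"""

def read_production(text):
    """Parse the CSV text into a list of dicts with numeric volumes."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "field": record["field"],
            "month": record["month"],
            # Everything in a CSV file is a string; units and types are the
            # reader's problem, so convert explicitly.
            "oil_bbl": int(record["oil_bbl"]),
            "gas_mcf": int(record["gas_mcf"]),
        })
    return rows

production = read_production(CSV_TEXT)
total_oil = sum(row["oil_bbl"] for row in production)
print(total_oil)
```

The same table locked inside a scanned image would need character recognition before any of this could start, which is roughly why the PDF-of-a-scan option sits at the bottom of the list.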

TBL’s aim for linked data goes beyond just being able to ‘get at the data.’ The key idea is to be able to get at it programmatically. Data providers should provide an application programming interface (API) so that programmers can write programs, grab data across multiple sources, and ‘mash-up’ the data in interesting new ways. Now the question is, which API is right for the web?
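A ‘mash-up’ in this sense is little more than a join across sources that share an identifier. The sketch below is an assumption-laden illustration: the two ‘sources’ are hard-coded JSON strings standing in for whatever two providers’ APIs might return, and the countries, field names and figures are all fictional.

```python
import json

# Two hypothetical sources, held as JSON strings so the sketch runs offline.
# In practice each would be fetched from a different provider's API.
PRODUCTION_JSON = (
    '[{"country": "Ruritania", "oil_kbd": 100},'
    ' {"country": "Freedonia", "oil_kbd": 250}]'
)
POPULATION_JSON = (
    '[{"country": "Ruritania", "population_m": 5},'
    ' {"country": "Freedonia", "population_m": 10}]'
)

def mash_up(production_json, population_json):
    """Join the two sources on country name and derive a per-capita figure."""
    population = {
        rec["country"]: rec["population_m"]
        for rec in json.loads(population_json)
    }
    result = {}
    for rec in json.loads(production_json):
        people_m = population.get(rec["country"])
        if people_m:  # skip countries missing from the second source
            # thousand bbl/day over million people = bbl/day per 1,000 people
            result[rec["country"]] = rec["oil_kbd"] / people_m
    return result

per_capita = mash_up(PRODUCTION_JSON, POPULATION_JSON)
print(per_capita)
```

The join only works because both sources happen to spell the country the same way. As soon as one says ‘turnip’ and the other ‘neep,’ the mash-up silently drops rows.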

Enter the W3C’s Resource Description Framework (RDF). We have written extensively about RDF in previous issues of Oil IT Journal, in reports from ‘Semantic Days’ conferences and from the W3C’s 2008 Chevron-hosted ‘Semantic Technology in Oil & Gas’ event. But I have a confession to make. The promise of machine readability, ‘reasoning’ and ‘open data’ led me, like many others, to ascribe almost magical properties to RDF. In particular, it has been represented as (somehow) holding the key to solving the ‘what is a turnip (or well)’ problem.

A new book (4) from O’Reilly, which we will review in a future issue, puts the record on RDF straight. RDF is not about ‘reasoning’; it is not even really about ‘semantics.’ RDF is just a very minimalist data modeling language, at the opposite end of the modeling spectrum from, say, Express. RDF offers no more help in ‘disambiguating’ turnip or well terminology than previous modeling languages.
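How minimalist? An RDF statement is a subject, predicate, object triple, and a dataset is a collection of them. The following sketch mimics that structure with plain Python tuples rather than a real RDF library. All of the example.org identifiers, including the ‘label’ and ‘sameThingAs’ predicates, are hypothetical placeholders of my own, not terms from any published vocabulary.

```python
# Each statement is a (subject, predicate, object) triple.
# Every example.org identifier below is a hypothetical placeholder.
EX = "http://example.org/"
LABEL = EX + "label"
SAME_AS = EX + "sameThingAs"

triples = [
    (EX + "london/turnip", LABEL, "turnip"),
    (EX + "aberdeen/neep", LABEL, "neep"),
]

def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def subjects_labelled(graph, label):
    """Subjects carrying exactly this label."""
    return {s for s, _, _ in match(graph, p=LABEL, o=label)}

# Nothing in the triples themselves says a neep is a turnip.
before = subjects_labelled(triples, "turnip")

# A person has to decide on the equivalence; the format merely records it.
triples.append((EX + "aberdeen/neep", SAME_AS, EX + "london/turnip"))

def with_equivalents(graph, subjects):
    """Add subjects that someone has declared equivalent to those given."""
    extra = {s for s, p, o in graph if p == SAME_AS and o in subjects}
    return subjects | extra

after = with_equivalents(triples, before)
print(len(before), len(after))
```

The query finds the Aberdeen neep only after somebody has asserted the mapping. The triple format stores that decision happily, but it does not make it for you.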

~

Another relevant new book is ‘Data Modeling for the Business (5),’ to be reviewed in next month’s Oil IT Journal. Of particular interest is a sub-chapter by Mona Pomraning on data modeling in BP. The approach here is light-years away from the hard core modeling of Express or even PPDM. Hoberman and co-authors advocate what is almost a ‘touchy feely’ approach, with stakeholder involvement and multiple (four) levels of graphical modeling. ‘Logical’ models such as PPDM, MIMOSA and PRODML are second to bottom (above the physical model) and there are ‘high level’ and ‘very high level’ ‘C-Suite’ models above these. It seems as though the whole modeling focus has shifted from data to disambiguation.

1 links/1002_6.

2 Unbelievably, a googlewhack!

3 links/1002_7.

4 Programming the Semantic Web. Segaran et al. O’Reilly 2009.

5 Data Modeling for the Business. Hoberman et al. Technics Publications 2009.

© Oil IT Journal - all rights reserved.