From Cretaceous to namespace—really useful RDF?

Oil IT Journal editor Neil McNaughton hears tales of database ‘blues.’ The answer could be ‘master data management,’ ‘taxonomies,’ ‘ontologies’ or maybe ‘all of the above.’

At the end of an entertaining talk on geological ‘database blues’ (see our report from ECIM 2007 on pages 6-7), ExxonMobil’s Stephan Setterdahl polled the audience to see what datatypes gave them the database ‘blues.’ Perforation data, faults, seismics and well bores all got a mention, but what came out ahead was ‘naming conventions’ in general. There was a consensus that in the real world, especially when exchanging data with partners, it is necessary to work with multiple naming conventions. And that a database that ‘understands synonyms’ would be a big help.

Data uniqueness

This issue crops up in different contexts and granularities and across pretty well all industries. In one sense, the issue is one of data uniqueness and cleanliness. The issue of calling the top Cretaceous pick ‘Cret,’ ‘Cret.,’ ‘Kret’ or whatever is the same problem that I share with my bank and other figures of authority. Living in France, I find that the simple question ‘What’s your name?’ is a source of great confusion. ‘McNaughton’ I reply. My interlocutor asks me to spell that. ‘MC...’ Invariably this is written down as ‘MAC’ and thus I can be ‘Mac Naughton,’ ‘Mcnaughton,’ ‘Mc Naughton’ and so on.

Data cleansing

I mention this more general ‘naming’ problem because it brings us closer to the generic solutions that the data management community outside of E&P is more familiar with. To a bank, the data issues above can be approached from two directions. One is to treat them as a data cleansing problem, using scripts to look for Mac Naughtons who, say, live at my address and turn them into McNaughtons. The other is to manage synonyms: the McNaughton in our address database is the same person as the Mac Naughton in our ‘people who owe us a lot of money’ database. I’m sorry if my understanding of banking is sketchy, but you get the picture.
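To make the two directions concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the record layout, the made-up address, the helper names and the crude rule that folds a leading ‘mac’ into ‘mc’ (which would mangle names that merely happen to start with those letters). It is not a description of how any bank actually does this.

```python
import re

def name_key(raw):
    """Comparison key: lowercase, letters only, leading 'mac' folded to 'mc'.
    A deliberately crude, hypothetical rule for illustration."""
    letters = re.sub(r"[^a-z]", "", raw.lower())
    if letters.startswith("mac"):
        letters = "mc" + letters[3:]
    return letters

def cleanse(records, canonical, address):
    """Cleansing approach: return copies of the records in which any variant
    of `canonical` living at `address` is rewritten to the canonical spelling."""
    target = name_key(canonical)
    cleaned = []
    for rec in records:
        if rec["address"] == address and name_key(rec["name"]) == target:
            rec = {**rec, "name": canonical}
        cleaned.append(rec)
    return cleaned

def build_synonyms(records):
    """Synonym approach: leave the records untouched and group the spellings
    that share the same (name key, address) pair."""
    groups = {}
    for rec in records:
        key = (name_key(rec["name"]), rec["address"])
        groups.setdefault(key, set()).add(rec["name"])
    return groups

# Hypothetical records; the address is invented.
records = [
    {"name": "Mac Naughton", "address": "1 Rue Exemple"},
    {"name": "Mcnaughton", "address": "1 Rue Exemple"},
    {"name": "Mc Naughton", "address": "1 Rue Exemple"},
    {"name": "Macdonald", "address": "2 Rue Exemple"},
]

cleaned = cleanse(records, "McNaughton", "1 Rue Exemple")
synonyms = build_synonyms(records)
```

The first function changes the data, the second only records which spellings belong together. That second option is what matters when the ‘wrong’ spelling lives in a partner’s database that you cannot edit.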

Master data

The issue of giving different names to the same thing has been widely studied by the banking and financial services sector. It comes under the broad heading of ‘master data management,’ and is key to cross-database reporting, business intelligence and other high-level applications. These approaches work essentially by identifying and managing synonyms. Are they applicable to the upstream? I’m not sure. They are likely burdened by a lot of financial services domain knowledge, and there should be an easier way...

Taxonomies

Another ECIM presentation gave me some more food for thought. Contesto’s Anne-Lill Holme advocates ‘proper’ document classification, using a modified Dublin Core scheme for metadata. Here we are in the world of controlled vocabularies and taxonomies, all lists of ‘master data.’ Moreover, managing different document libraries following, say, a company merger will present the same ‘synonym’ issues as above.

Lists

I submit that all of the above issues are facets of the same problem and that is how to represent lists of information in a really useful way. Surely this completely generic issue has been addressed by academia, computer science or someone?

Kadme

Enter my next witness, Kadme’s Vasily Borisov, with whom I chatted at the ECIM gala dinner. He explained how Kadme had combined information from multiple public data sources using the W3C’s Resource Description Framework (RDF). The different ‘synonyms’ can then be mapped with RDF-aware tools such as Stanford University’s Protégé.

Oilfield Ontology

Well, this tickled me because, as some of you may remember, back in 2004 I sat in on the W3C’s Semantic Web special interest group and editorialized about the technology (OITJ March 2004). We also reported recently (OITJ June 2007) on the Chevron/Schlumberger-backed Open Oilfield Ontology* effort, another RDF/Protégé project.

And the answer is ...

RDF, if you will, is academia’s best shot at fixing the ‘how to make a list’ problem. It goes further than simple lists. In fact the semantic web has a lot to offer ‘master data management.’ It is all about understanding and sharing information across disparate resources, whether they are documents, web pages or data stores. It works by a) assigning a unique identifier to the list (the namespace) and b) storing the list items in a rather turgid, but machine-readable, form. If you want to know how it’s done, check out, for instance, the W3C’s ‘vcard**’ namespace below.
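For the curious, the namespace-plus-list idea can be sketched in a few lines of plain Python, with no RDF library involved. The namespace `http://example.org/strat#` and the `TopCretaceous` identifier are invented for illustration. The use of the W3C’s SKOS `prefLabel` and `altLabel` properties to carry the synonyms is my own assumption, not something taken from the Kadme or Open Oilfield Ontology work.

```python
# Hypothetical namespace for a list of stratigraphic picks.
NS = "http://example.org/strat#"
# W3C SKOS vocabulary, used here (by assumption) to label preferred and alternative names.
SKOS = "http://www.w3.org/2004/02/skos/core#"

def uri(namespace, local_name):
    """A list item's unique identifier is the namespace plus a local name."""
    return namespace + local_name

TOP_CRET = uri(NS, "TopCretaceous")

# Each statement is a (subject, predicate, object) triple.
triples = [
    (TOP_CRET, uri(SKOS, "prefLabel"), "Top Cretaceous"),
    (TOP_CRET, uri(SKOS, "altLabel"), "Cret"),
    (TOP_CRET, uri(SKOS, "altLabel"), "Cret."),
    (TOP_CRET, uri(SKOS, "altLabel"), "Kret"),
]

def resolve(label, statements):
    """Return the identifiers that carry `label` as a preferred or alternative name."""
    return {s for s, _, o in statements if o == label}

def to_ntriples(statements):
    """Serialize to N-Triples, one statement per line.
    Assumes the labels contain no quotes or backslashes needing escapes."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in statements)

match = resolve("Kret", triples)
text = to_ntriples(triples)
```

Whatever a partner chooses to call the pick, `resolve` lands on the same identifier, and the N-Triples output gives a fair taste of the ‘turgid but machine-readable’ form.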

* www.o4oil.org

** www.w3.org/2001/vcard-rdf/3.0#

© Oil IT Journal - all rights reserved.