The semantic web: no, not again!

Oil IT Journal editor Neil McNaughton notes an uptick of interest in things semantic, with the establishment of the World Wide Web Consortium’s Oil, Gas and Chemicals Business Group. An opportunity to take stock of the developments in oil and gas and offer his 2¢ of semantic wisdom.

As regular readers will know, I am more of a skeptic than a proselytizer. There are, after all, enough people writing uncritically about the industry if that is what you want. I have always felt that my role is to question and to try to pick things apart. But I have a confession to make. I have, in my own sweet way, proselytized for some time about the semantic web, in particular as it manifests itself in the oil and gas software space. This proselytizing has come in two forms: a couple of papers presented at conferences and quite extensive coverage of the technology in Oil IT Journal. This issue of Oil IT Journal has no fewer than three semantic-ish articles: our report from the EU Fiatech meeting, another from Norway’s EQHub and a reprise of last month’s review of the ISO 15926 Primer.

With the slow-burn liftoff of the World Wide Web Consortium’s (W3C) Oil, Gas and Chemicals Business Group (www.oilit.com/links/1111_30), I think it is time to revisit the semantic web and give you a very personal take on where it stands in oil and gas and in the world at large.

The semantic web was dreamed up over a decade ago by Tim Berners-Lee as an attempt to add structure to data on the web. One simple usage might be to note that in, say, a PowerPoint presentation, mention of a particular scientific paper could link seamlessly through to the conference where it was first given and, why not, from there to the data set that supported the research. The possibilities are, or should have been by now, endless.

To offer folks a universal way of representing data items requires a general purpose data modeling language. The W3C, in its wisdom, opted for the Resource Description Framework (RDF). This bare-bones language sees everything as a ‘triple’ of subject, predicate and object, with each of the three uniquely nailed down as a web resource or ‘URI’ (although the object may also be a plain literal value such as a number or a string). More, if you are interested, on Wikipedia (www.oilit.com/links/1111_31).
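For readers who prefer to see the idea in code, here is a minimal sketch of the triple model in plain Python. It is not a real RDF library, and every example.org URI, the well, the operator and the vocabulary terms are hypothetical placeholders invented for illustration.

```python
# A toy illustration of RDF-style triples using only the standard library.
# All URIs below are hypothetical placeholders, not real identifiers.
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"

triples = [
    Triple(EX + "well/W-001", EX + "vocab/operatedBy", EX + "company/AcmeOil"),
    Triple(EX + "well/W-001", EX + "vocab/locatedIn", EX + "field/NorthField"),
    Triple(EX + "field/NorthField", EX + "vocab/locatedIn", EX + "country/Norway"),
]


def objects_of(subject: str, predicate: str, data: List[Triple]) -> List[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


print(objects_of(EX + "well/W-001", EX + "vocab/operatedBy", triples))
```

Because every statement has the same three-part shape, statements from different sources can simply be concatenated into one list and queried together, which is much of the appeal.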

As you can imagine, the promise of a web of intermeshed, usable data was extremely exciting to many. This led to great expectations, to much hype and almost as much disappointment and criticism.

First, the hype. In the Fiatech Introduction to ISO 15926, the familiar claim for machine-to-machine interoperability via semantic technology is made. An almost magical quality is evoked that would have us believe that somehow, without changing our systems, they can be rendered interoperable. This is of course pure hype. Systems need either re-writing to conform to an RDF view of the world, or they need to expose a semantic interface.

In the case of a web page containing snippets of potentially reusable information this is not too hard to imagine. In fact, the W3C has produced a flavor of RDF, ‘RDFa,’ that inserts triples into a page of HTML. The page can be viewed as normal in a browser, but those who want more structured information can obtain it by parsing the document for its hidden RDF goodies. This might be a way, for instance, of adding reference metadata to a web page, say a well’s UID.
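To make the RDFa idea concrete, the sketch below pulls triples out of a small HTML fragment using only Python's standard library. It handles a deliberately tiny subset of RDFa (the `vocab`, `about`, `property` and `content` attributes, with no nesting rules), so treat it as an illustration rather than a conforming parser. The page, the well and the example.org vocabulary are all hypothetical.

```python
# Extract (subject, property, value) statements from a simplified RDFa page.
# Only a small subset of RDFa is handled; the markup and URIs are hypothetical.
from html.parser import HTMLParser

PAGE = """
<div vocab="http://example.org/vocab/" about="http://example.org/well/W-001">
  <h1 property="name">Well W-001</h1>
  <span property="uid" content="W-001-UID"></span>
  <p>Drilled in the <span property="field">North Field</span>.</p>
</div>
"""


class RdfaLite(HTMLParser):
    def __init__(self):
        super().__init__()
        self.subject = None   # current 'about' URI
        self.vocab = ""       # prefix applied to property names
        self.pending = None   # property still waiting for its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "vocab" in a:
            self.vocab = a["vocab"]
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            prop = self.vocab + a["property"]
            if "content" in a:
                # The value is carried in the attribute, invisible to readers.
                self.triples.append((self.subject, prop, a["content"]))
            else:
                # The value is the visible text inside the element.
                self.pending = prop

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None


parser = RdfaLite()
parser.feed(PAGE)
for triple in parser.triples:
    print(triple)
```

A browser shows the heading and the sentence as usual, while a parser of this kind recovers the well's name, UID and field as structured statements.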

True believers may however want to go the whole hog and expose all their data in RDF. This leads to a duality of resources. For example, Wikipedia is available as regular HTML or as machine-readable RDF on DBpedia (www.oilit.com/links/1111_32). Those in the know can query the data in DBpedia as if it were a database using the RDF query language SPARQL.
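As a rough sketch of what such a query looks like, the Python below builds a SPARQL request for DBpedia's public endpoint at https://dbpedia.org/sparql. The choice of the `Petroleum` resource and the helper names `build_url` and `run_query` are my own for illustration. The request is sent over HTTP as described in the SPARQL protocol, and `run_query` needs network access, so it is defined here but not called.

```python
# Build (and optionally send) a SPARQL query to DBpedia's public endpoint.
# The queried resource and helper names are illustrative assumptions.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"

# List ten predicate/object pairs attached to one DBpedia resource.
QUERY = """
SELECT ?predicate ?object WHERE {
  <http://dbpedia.org/resource/Petroleum> ?predicate ?object .
}
LIMIT 10
"""


def build_url(query: str) -> str:
    """Encode the query as the 'query' parameter of a GET request."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query})


def run_query(query: str) -> list:
    """Send the query and return the result bindings (requires network access)."""
    request = urllib.request.Request(
        build_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


print(build_url(QUERY))
```

Each binding returned by `run_query` would be a small dictionary keyed by the variable names in the SELECT clause, here `predicate` and `object`.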

Turning Wikipedia into a database was helped by its relatively simple and consistent structure. Things get harder if you are modeling more complex stuff. Because RDF is such a simple construct, modeling even relatively simple objects (like PCA’s now famous pressure transducer) leads to pretty horrendous models. There is a feeling that this is problematic because no two modelers see the world the same way. Even simple tasks, like adding units of measure, involve convoluted modeling.
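The units of measure point is easy to demonstrate. In the toy sketch below, saying that a transducer reads 250 bar cannot be done in one statement, because a triple has no slot for the unit. An intermediate ‘reading’ node is needed, giving three triples and a two-step lookup. This is one possible pattern among several, not the ISO 15926 or PCA model, and all the example.org names are hypothetical.

```python
# Attaching a unit to a value in a triple model needs an intermediate node.
# One possible pattern only; URIs and vocabulary terms are hypothetical.
EX = "http://example.org/"

triples = [
    (EX + "transducer/PT-101", EX + "vocab/hasReading", EX + "reading/1"),
    (EX + "reading/1", EX + "vocab/numericValue", "250"),
    (EX + "reading/1", EX + "vocab/unit", EX + "unit/bar"),
]


def first_object(subject, predicate, data):
    """Return the first object for (subject, predicate), or None if absent."""
    for s, p, o in data:
        if s == subject and p == predicate:
            return o
    return None


def reading_of(device, data):
    """Follow device -> reading node -> (value, unit)."""
    node = first_object(device, EX + "vocab/hasReading", data)
    if node is None:
        return None
    value = first_object(node, EX + "vocab/numericValue", data)
    unit = first_object(node, EX + "vocab/unit", data)
    return value, unit


print(reading_of(EX + "transducer/PT-101", triples))
```

Another modeler might just as reasonably hang the unit off the property rather than the reading, which is exactly how two models of the same transducer end up incompatible.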

The problem is that RDF does not encourage an object approach to modeling like XML does. The world is flat and made up of triples. Fiatech has got around this by promoting the ‘Facade,’ hiding the complexity of the underlying ISO 15926 model. This is OK in that it enables interoperability. But the Facade is an obstacle to a ‘pure’ semweb approach. Users need to understand and unpick the Facade instead of letting their semantic tools loose on a raw RDF dataset.

The early hype surrounding the semantic web also gave birth to a different kind of ‘modeling’ altogether. After all, ‘semantic’ has to do with language, doesn’t it? So we have another community that is fussing over knowledge representation, meaning and other high-falutin’ concepts. Again, boatloads of hype here, although the simple knowledge organization system, SKOS (www.oilit.com/links/1111_33), looks interesting.

Finally, another gotcha. Tim Berners-Lee’s take on his own brainchild has evolved from the original semantic web to one of freely available ‘linked data.’ In the greater scheme of things, it is probably more important to decide whether you should be sharing your data freely than to worry about what format it needs to be in. Such considerations are very much à propos with regard to ISO 15926. Owner operators would benefit from freely shared plant information. Other stakeholders may be less enthusiastic. Equipment manufacturers may have IP to protect. Software vendors may amass market share in part through obscure formats. And there are already companies that make a living from sorting out the mess, with or without RDF. A decision on how much data will be freely available has never really been taken. The issue goes way beyond IT and standards. But things are moving here. The POSC/Caesar Association has just opened up a gateway (or, in the jargon, an endpoint) to the current ISO 15926 equipment catalog. You can test drive it at www.oilit.com/links/1111_34 if you are a person, or perform semantic queries on the machine-readable endpoint at www.oilit.com/links/1111_35.

My personal feeling is that oil and gas should be using RDF more—especially where it is easy to deploy. If you are writing a seismic spec or the next version of Witsml then it might be a good idea to include some of your metadata in RDF. It won’t cost anything and folks living in a future semantic world may thank you for it.
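To show how little effort this might take, here is a sketch that writes a few metadata items in N-Triples, the simplest line-based RDF serialization. The survey identifier, the vocabulary and the property names are hypothetical and are not taken from any seismic or Witsml specification.

```python
# Serialize a handful of metadata statements as N-Triples lines.
# Subject, predicates and values are hypothetical examples.
def literal(value: str) -> str:
    """Quote a plain string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(subject: str, properties: list) -> str:
    """properties is a list of (predicate_uri, value, value_is_uri) tuples."""
    lines = []
    for predicate, value, value_is_uri in properties:
        obj = "<" + value + ">" if value_is_uri else literal(value)
        lines.append("<" + subject + "> <" + predicate + "> " + obj + " .")
    return "\n".join(lines)


EX = "http://example.org/"

metadata = [
    (EX + "vocab/surveyName", 'North Field "3D" survey', False),
    (EX + "vocab/acquiredBy", EX + "company/AcmeGeo", True),
]

print(to_ntriples(EX + "survey/S-42", metadata))
```

A few lines like these can sit in a file header or a sidecar file without disturbing the existing format.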

© Oil IT Journal - all rights reserved.