So that’s the ‘semantic’ web! Now I understand!

Oil IT Journal editor Neil McNaughton takes note of a groundswell of interest in semantics in the upstream—just as Tim Berners-Lee seems to be redefining the nature of the beast!

I am loath to bug you about the semantic web again, but since in the current issue of Oil IT Journal we have no fewer than five independent mentions of the technology, it looks like we may have a groundswell on our hands if not yet a tsunami. We hear from Norwegian Kadme on the use of semantic web technology in the new ArcticWeb portal. In our report from the Microsoft Global Energy Forum we hear Dan Ranta (ConocoPhillips) describe semantics as ‘very useful’ and Peter Breunig mentions Chevron’s use of the technology. In a piece on the upcoming SPE [1] Real Time Optimization Technical Interest Group’s Joint Venture Reporting workshop, to be held at next month’s Digital Energy Conference in Houston, mention is also made of the use of a ‘semantic web (ontology) to capture work flow use cases.’ And in our report from the Houston chapter of the POSC Caesar association we hear again from the mother of all oil industry semantic efforts, the ISO 15926 reference data library.

But what really sparked off this editorial was a presentation by Tim Berners-Lee to the 2009 Technology, Entertainment, Design (TED) conference in Long Beach, California last month. No, I wasn’t there, but the magic of the webcast [2] allows me to summarize what TBL told the entertainers.

TBL (‘I invented the World Wide Web’) made a passionate plea, asking for help to ‘reframe’ the web. TBL described how, when working at CERN [3], he had some difficulty selling the project to management. The idea was hard to explain before the web came into existence. TBL had to get folks to imagine what it could be like. Some did, and the rest is history. Now nobody thinks twice about putting their documents online.

Today, TBL is on another mission. He wants people to put their data on the web. The idea is that once there is lots of data out there, it can be mashed-up to provide insights and innovation. TBL invited the TED audience to reflect on ‘a world of linked data.’ The idea is very simple. Just as a web document has an ‘http name,’ the concept can be extended to data. Thus people, things and events all have ‘http names.’ Users will then be able to retrieve standard formatted data from a ‘thing’ and will be able to derive useful relationships. Linked data allows, for instance, a person born in Berlin to be linked to data about the city, which is just another ‘thing’ with more http relationships. And that is all there is to it! TBL noted en passant that President Obama has said that US government data will be on the internet. He hopes that it will be deployed as ‘linked data.’ Data from business, from scientists, and about events and talks is all amenable to linking. TBL wants to ‘make the world run better by making data available and by avoiding database hugging.’ OK, you can craft a beautiful website, but first, ‘give us your unadulterated raw data.’ TBL had the TED crowd chanting ‘raw data now!’ enthusiastically.
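For readers who like to see such things written down, here is a minimal sketch of the Berlin example in Python. It is an illustration only, not anything TBL showed: the example.org ‘http names,’ the person ‘alice’ and the bornIn, name and country properties are all hypothetical, and real linked data would be fetched over http rather than held in a list.

```python
# Hypothetical 'http names'; example.org is a reserved placeholder domain.
EX = "http://example.org/"

# Each fact is a (subject, property, value) triple. A value may itself be
# the http name of another thing, which is what makes the data 'linked'.
triples = [
    (EX + "person/alice", EX + "prop/name", "Alice"),
    (EX + "person/alice", EX + "prop/bornIn", EX + "city/berlin"),
    (EX + "city/berlin", EX + "prop/name", "Berlin"),
    (EX + "city/berlin", EX + "prop/country", EX + "country/germany"),
]


def describe(thing):
    """Return every (property, value) pair recorded for one http name."""
    return [(p, o) for s, p, o in triples if s == thing]


def follow(thing, prop):
    """Follow one kind of link from a thing and return what it points to."""
    return [o for s, p, o in triples if s == thing and p == prop]


# Start at the person, follow the 'bornIn' link, then look up the city.
birthplace = follow(EX + "person/alice", EX + "prop/bornIn")[0]
print(birthplace)
print(describe(birthplace))
```

The point of the sketch is that the city is reached purely by following a name found in the person’s data. No ontology or reasoning is involved.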

I have to say that I was impressed by the clarity of TBL’s presentation. But I was equally surprised that it included no mention of the semantic web! Could it be that TBL is backtracking from the ‘semantic’ positioning of his new web? I mean, ‘put your data online’ is quite a different, and infinitely more understandable, suggestion than the ‘ontologies,’ ‘reasoning’ and ‘meaning’ of the semantic web. The very word ‘semantic’ implied that there was to be some kind of machine-based ‘understanding’ of text. To quote TBL from a much earlier presentation [4], ‘[semantic] search engines will start indexes of assertions that might be useful for answering questions or finding justifications.’ Quite a different proposition from ‘raw data now!’

Perhaps the ‘semantic’ side of the new web was a red herring. After all, if the intent was just to get as much data available as possible, the W3C might have encouraged data owners to offer an easier way of exposing their databases with a simple http version of SQL or something along the lines of the one-line ‘APIs’ that retrieve Google or Amazon data. Instead the W3C elected not just to encourage data to be made available, but also to specify the RDF [5] data model, which was to make it more amenable to analysis, semantic or otherwise.

This was probably a mistake. If you are encouraging third parties to put their data in the public domain, then imposing, or even asking for, a particular format is a burden too far, and an excuse for not doing anything. The situation reminds me of an upstream data project whose objective was not dissimilar to TBL’s. An EU government decided to open up exploration and told the incumbent National Oil Company to put bags of its data in the public domain. The NOC then proceeded to acquire hardware (the StorageTek tape robotics were particularly impressive) and develop software, using a comprehensive and incomprehensible data model. After several years, public data delivery was minimal, governments changed and it was business as usual. The lesson? It’s an IT classic: too much focus on the technology and not enough on the ‘business.’

How does the ‘business case’ of linked data apply to oil and gas? Where are all the massive public data sets of interest to oil and gas? There aren’t any in RDF although there are many potential candidates from institutions like the MMS, CDA and Diskos.

So if it’s not about linked public data, why the interest in ‘semantics?’ The reason is that with web access, enterprise IT is now a microcosm of the world wide web. Data that folks inside the firewall do want to link to is held in multiple, incompatible sources and applications. In this context, the semantic web’s RDF is experiencing modest take-up as a lingua franca for master data management. Disparate data sources can be remapped to RDF and the ‘semantic’ tools are used to bring it all together. Is this the best way of achieving what should be quite a simple task? I really don’t know!
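To give a flavor of what ‘remapping to RDF’ amounts to, here is a rough Python sketch. It is not how any particular vendor tool works: the two sources, their field names, the well identifier and the example.org names are all made up for illustration, and plain tuples stand in for real RDF.

```python
EX = "http://example.org/"  # hypothetical namespace

# Two hypothetical in-house sources that describe the same well
# using different field names.
drilling_db = [{"WELL_ID": "W-001", "SPUD_DT": "2008-05-01"}]
production_app = [{"well": "W-001", "oil_bbl_per_day": 1200}]


def to_triples(rows, key, mapping):
    """Remap rows to (subject, property, value) triples on a shared well name."""
    out = set()
    for row in rows:
        subject = EX + "well/" + row[key]
        for field, prop in mapping.items():
            out.add((subject, EX + "prop/" + prop, row[field]))
    return out


# Each source gets its own small mapping; the results land in one graph.
graph = to_triples(drilling_db, "WELL_ID", {"SPUD_DT": "spudDate"})
graph |= to_triples(production_app, "well", {"oil_bbl_per_day": "oilRate"})


def properties_of(subject):
    """Collect everything known about one thing, whatever source it came from."""
    return {p.rsplit("/", 1)[-1]: o for s, p, o in graph if s == subject}


merged = properties_of(EX + "well/W-001")
print(merged)
```

Once both sources use the same name for the well, bringing the data together is just a union of triples. Agreeing on those names and mappings is where the real master data management effort lies.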

[1] Society of Petroleum Engineers.

[2] www.oilit.com/links/0903_1.

[3] European Organization for Nuclear Research, the particle physics laboratory near Geneva.

[4] In TBL’s foreword to Spinning the Semantic Web, MIT Press 2003, but referring to a talk given in 1997!

[5] Resource Description Framework.

© Oil IT Journal - all rights reserved.