At last, a book that tells what the semantic web can do, rather than explaining how the world might be if, overnight, everything on the web suddenly became ‘semantic.’ As stated in the introduction to Programming the Semantic Web1 (PtSW), significant research funds have been sunk into the semantic field and sometimes ‘the noise from the R&D community obscures the fact that this is not rocket science.’ You do not need to be into philosophy or artificial intelligence to use the semantic web.
Despite the ‘semantics’ name, the semweb is not about natural language. It is rather about sharing data between communities and machines. Thus the starting point is data modeling. In fact, perhaps our main takeaway from PtSW is that RDF/OWL is a data modeling language, with no magical powers beyond SQL or Express.
PtSW kicks off with a limpid explanation of the path from tabular data through relational databases, and how hard these are to maintain and evolve. While database schema complexity is manageable in well-understood industries, it gets somewhat intractable when there are ‘hundreds or thousands of datasets all talking about different things.’ It would be hard and labor-intensive to put all the world’s data into a single schema; in other words, the RDBMS suffers from poor ‘scaling to complexity.’
Enter the ‘key/value’ schema, a more flexible data model that handily maps into the Resource Description Framework’s ‘triple.’ This is the ultimate stripped-down data model where relationships are described by the data itself. Before you know it (page 23) you are building a simple triple store in Python2.
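To give a flavor of how little machinery a triple store needs, here is a minimal sketch. It is not the book's code: the `TripleStore` class, its method names and the sample well data are all our own assumptions, made up for illustration.

```python
# Minimal in-memory triple store (illustrative sketch, not the book's code).
class TripleStore:
    def __init__(self):
        # Each fact is a (subject, predicate, object) tuple.
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical sample data: the "schema" lives in the predicates themselves.
store = TripleStore()
store.add("well_42", "type", "Well")
store.add("well_42", "operator", "Acme Oil")
store.add("well_42", "depth_m", "3150")

well_facts = store.query(subject="well_42")
```

Adding a new kind of fact here means adding a triple with a new predicate, with no schema migration. A real store would also index the triples rather than scan the whole set on each query.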
Having built a few semantic silos, the next big thing is tying them together with shared keys and overlapping graphs. But nota bene: there is no magic here. The problems of inconsistent naming and misspelt data have not gone away. PtSW even has a new term for disambiguating multiple records of the same thing, ‘smushing.’
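The joining step can be sketched as a set union over triples, with an alias table to fold duplicate keys together. Everything below is hypothetical: the two datasets, the `same_as` mapping and the function names are our own assumptions, and in practice building that mapping is where the real work lies.

```python
# Two hypothetical datasets expressed as sets of (subject, predicate, object).
production = {
    ("well:42", "produced_bbl", "1200"),
    ("well:42", "operator", "Acme Oil"),
}
drilling = {
    ("well:42", "depth_m", "3150"),
    ("WELL-0042", "rig", "Rig 7"),  # same well recorded under a different key
}

# Assumed alias table mapping variant keys to one canonical key.
same_as = {"WELL-0042": "well:42"}


def canonical(term):
    """Replace a known alias with its canonical key; leave other terms alone."""
    return same_as.get(term, term)


def merge(*graphs):
    """Union the graphs, rewriting aliased subjects and objects."""
    return {(canonical(s), p, canonical(o)) for g in graphs for (s, p, o) in g}


merged = merge(production, drilling)
about_well = sorted((p, o) for (s, p, o) in merged if s == "well:42")
```

Without the alias table, the rig record would stay stranded under its own key, which is exactly the naming problem that has not gone away.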
The section on ‘Just enough OWL’ probably would not satisfy the purist, but the idea is that OWL brings back the stuff we threw away when we abandoned the RDBMS: relations and data modeling. But modeling anything of moderate complexity in RDF/OWL quickly becomes ugly. Enter Protégé, an industry-standard tool for managing such complexity.
If you are interested in understanding the technology that underpins the ISO 15926 standard (see page 3) this is a great starting point. But there is a caveat. The stripped-down data modeling functionality of the semantic web only moves the complexity in your data into the Protégé model. Even simple objects translate into ‘graphs’ of mind-numbing complexity. Determining even simple stuff like a unit of measure involves a trip around a graph. RDF is not really conducive to encapsulating data into ‘building bricks.’ Those involved in process modeling might like to check out OntoCAPE. More from www.semprog.com.
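The ‘trip around a graph’ can be illustrated with a toy example. The triples and predicate names below are invented for illustration and do not reflect the actual ISO 15926 or OntoCAPE models; the point is only that the unit is several hops away from the reading.

```python
# Hypothetical triples: the reading does not carry its unit directly.
triples = {
    ("reading_1", "has_value", "87.5"),
    ("reading_1", "measures", "pressure_property"),
    ("pressure_property", "has_scale", "bar_scale"),
    ("bar_scale", "has_unit", "bar"),
}


def follow(subject, path):
    """Walk a chain of predicates from subject; return the end node or None."""
    node = subject
    for predicate in path:
        matches = [o for (s, p, o) in triples if s == node and p == predicate]
        if not matches:
            return None  # the chain is broken at this hop
        node = matches[0]
    return node


# Three hops are needed to answer "what unit is this reading in?"
unit = follow("reading_1", ["measures", "has_scale", "has_unit"])
```

In a relational table the unit would typically sit in a column next to the value; here the caller has to know the path through the graph.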
1 Programming the Semantic Web, Segaran et al. O’Reilly. ISBN 978 0 596 15381 6.
This article originally appeared in Oil IT Journal 2010 Issue # 7.