Book Review—Developing high quality data models

Matthew West’s new book summarizes a lifetime’s work on data modeling in oil and other verticals.

The introduction to Matthew West’s new book, ‘Developing high quality data models¹’ (DHQDM), sketches out his career, from early work computerizing Shell’s refineries to his later role as data architect for Shell’s 1990 ‘Key Thrust’ on data management. In 1993 West joined the Process Industries STEP consortium, Pistep, to work on a standard for engineering design data handover. This led to an EU ‘consortium of consortia’, Epistle, with West as chair.

There remained ‘niggling’ problems with the Epistle data model. Epistle, like Shell’s earlier models, was a static snapshot of part of the enterprise and neglected the fourth dimension, time. Another lesson was that West was developing not a data model but an ‘ontology.’ The 4D/ontological approach underpinned Shell’s spin-out Kalido and also a 2004 Shell global program to develop IT systems for its downstream operations. Epistle itself eventually forked into the ISO 15926 standard for lifecycle integration of process and oil and gas production facility data.

DHQDM builds on West’s vast experience, but a warning is necessary. If you are looking for a book that walks you through what West calls the ‘traditional normalization approach’, you will be disappointed. West’s starting point is the Express data modeling language² and its graphical manifestation, Express-G. A lot of this is rather hard going. Despite his emphasis on the importance of data definitions, West doesn’t do a great job of defining terms like entity, entity type, entity data type, instance, class and so on. If you have no idea what these are, this book is not for you. If you do, let’s hope your understanding is the same as West’s. DHQDM also suffers from overly granular and intrusive paragraphs: 4.5.5, ‘Data quality standards’, is, for instance, a minute tautology.

West’s twofold thesis is that the 4D and ontological approaches give more robust models than conventional data modeling. The 4D issue is easy to grasp. West uses examples such as feedstocks coming into a chemical plant and emerging as batches of product, which in turn may be saleable as different products.
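The time dimension can be pictured with a small sketch. This is not West’s notation, nor Express, and the class, function and identifier names below are hypothetical. It only assumes that one physical batch is recorded as a series of dated states rather than as a single static row.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class State:
    """A temporal part of a thing: how it is classified during [start, end)."""
    thing_id: str
    classified_as: str
    start: date
    end: Optional[date] = None  # None means the state is still current

    def holds_on(self, day: date) -> bool:
        # The interval is half-open: start is included, end is excluded.
        return self.start <= day and (self.end is None or day < self.end)


def classification_on(states, thing_id, day):
    """Return how thing_id is classified on the given day, or None if unknown."""
    for state in states:
        if state.thing_id == thing_id and state.holds_on(day):
            return state.classified_as
    return None


# Hypothetical history of one batch passing through a plant (illustrative data).
states = [
    State("batch-001", "feedstock", date(2011, 1, 3), date(2011, 1, 5)),
    State("batch-001", "product batch", date(2011, 1, 5), date(2011, 1, 9)),
    State("batch-001", "saleable product A", date(2011, 1, 9)),
]

print(classification_on(states, "batch-001", date(2011, 1, 4)))
```

A static snapshot would hold only the latest classification. Keeping the dated states lets the same question be asked for any point in the batch’s history.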

The ontological approach builds on philosophical notions of ‘meaning’ in a scheme for modeling everything: through time, space, the universe and, in fact, other universes that may or may not exist! West’s framework was used to develop the ISO 15926 data model for oil and gas equipment lifecycle integration and also a conceptual data model for Shell’s downstream business.

This book clearly explains many of the pitfalls of real-world data modeling and how an ontological approach is used to get to the bottom of what is being modeled. There are many insightful use cases that should make data modelers pause for thought and seek a deeper meaning in what they are trying to achieve. But the framework is not for the faint-hearted. Getting closer to ‘meaning’ means more abstraction from reality and tortuous constructs, such as how a customer might become an ‘instance of a class_of_state_of_biological_system_component.’

In oil and gas data modeling Express is recognized as a powerful data modeling tool. Along with STEP and ISO 15926 it was used by Energistics (then POSC) to model the upstream in the Epicentre data model. But the rub is that Epicentre is practically no more and ISO 15926 appears to be migrating to the very different RDF/OWL modeling language. So for DHQDM to see take-up, an Express revival is required. Given the dominance of the ‘normalizers’ and the emergence of RDF/OWL this is a tough call. But West’s advice and understanding of modeling is applicable across all environments. Without using these words, West appears to be saying, ‘This is the way to do data modeling properly and Express is the best tool for the job.’ In an ideal world, the power of Express appears to offer a lot. But the ideal world is, as it were, only one of many.

¹ Morgan Kaufmann, ISBN 9780123751065, 2011, www.oilit.com/links/1101_13.

² www.oilit.com/links/1101_12.

This article originally appeared in Oil IT Journal 2011 Issue # 1.
