Getting data right. Don't model, argue!

O’Reilly’s free book advocates abandoning data modeling in favor of Tamr’s DataOps.

Getting Data Right (GDR), ‘tackling the challenges of big data volume and variety,’ is a 77-page free book, apparently without sponsorship. So where’s the catch? We looked the gift horse in the mouth and noted that GDR is published by Tamr. Otherwise, the seven co-authors provide an interesting narrative around big and not-so-big enterprise data. Tom Davenport immodestly proposes his own ‘law’ thus: ‘The more an organization knows about a particular business entity, the less likely it is to agree on a common term and meaning for it.’ This leads to ‘corollaries,’ viz., ‘A manager’s passion for a particular definition of a term will not be quenched by a data model specifying an alternative definition,’ and ‘consensus on the meaning of a term is achieved not by data architecture, but by data arguing.’ ‘Data modeling doesn’t often lead to dialog, because it’s not comprehensible to most nontechnical people.’

Other authors offer similarly pithy, if hard to execute, advice. Michael Stonebraker writes on data curation and the data lake, where we learn that data ingestion to the lake involves converting XML to CSV (and that’s a good thing?). Ihab Ilyas advocates ‘humans in the data cleansing loop’ and the ‘judicious involvement of experts’ and reference sources. Michael Brodie writes on the ‘fourth paradigm’ of data science, with an interesting use case from CERN’s Atlas project. Andy Palmer introduces the next big thing, ‘DataOps.’ James Markarian advocates replacing sluggish old ETL with snazzy ‘data unification’ à la Tamr. So there was a catch, but it is quite an interesting read all the same!

© Oil IT Journal - all rights reserved.