Data migration Cook book

ETL Solutions data guru advocates smart documentation with open source tools.

Speaking at the PPDM data management symposium late last year, Richard Cook (ETL Solutions) advocated documenting the data resource. The recent joint Energistics/PPDM effort on mapping between the Witsml and PPDM data models (more in next month’s Oil IT Journal) has provided a comprehensive mapping document. But in general, copying and pasting into Excel is error-prone and lacks automated checks for typos and other defects. While a spreadsheet may be adequate for mapping between such ‘stable’ data models, more volatile models require better tools, possibly extracting mapping information programmatically from source documentation.
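As an illustration of the kind of automated check a spreadsheet lacks, the following is a minimal Python sketch, not Cook's own tooling. It assumes the mapping has been exported as CSV with hypothetical column headings (source_path, target_table, target_column) and that the list of valid target columns has been generated from the model itself. All table, column and source path names here are illustrative only.

```python
import csv
import difflib
import io

# Hypothetical extract of target-model columns; in practice this set would be
# generated from the model's own source documentation or DDL.
TARGET_COLUMNS = {
    ("WELL", "UWI"),
    ("WELL", "WELL_NAME"),
    ("WELL", "SPUD_DATE"),
}


def check_mapping(rows, known_columns):
    """Return (line_number, message) pairs for mapping rows whose target
    column is not present in known_columns."""
    problems = []
    known_names = sorted(f"{t}.{c}" for t, c in known_columns)
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        target = (row["target_table"].strip(), row["target_column"].strip())
        if target not in known_columns:
            name = f"{target[0]}.{target[1]}"
            # Suggest the closest known name to help spot simple typos.
            hint = difflib.get_close_matches(name, known_names, n=1)
            suggestion = f" (did you mean {hint[0]}?)" if hint else ""
            problems.append((line_no, f"unknown target {name}{suggestion}"))
    return problems


# Illustrative mapping sheet exported as CSV; the source paths are invented
# and the second data row contains a deliberate typo.
MAPPING_CSV = """source_path,target_table,target_column
well/name,WELL,WELL_NAME
well/dTimSpud,WELL,SPUD_DTAE
well/uid,WELL,UWI
"""

problems = check_mapping(csv.DictReader(io.StringIO(MAPPING_CSV)), TARGET_COLUMNS)
for line_no, message in problems:
    print(f"line {line_no}: {message}")
```

Run against the sample sheet, the check flags the mistyped column on line 3 and suggests the nearest valid name, which is the sort of defect that goes unnoticed in a copy-and-paste workflow.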

In general, most PPDM implementations are a subset of the full model. Programmatic extraction can also be used to document exactly which subset has been deployed, saving time and revealing non-standard usage. Even when an implementation is based on a standard, accurate and clear information on the model is key to efficient use and maintenance.
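Documenting a deployed subset can be sketched as a comparison between a reference model and a live schema. The example below is an assumption-laden illustration: it uses Python's built-in sqlite3 module as a stand-in for the deployed database, and the reference tables and columns are a small hypothetical slice that has not been checked against any PPDM release.

```python
import sqlite3

# Hypothetical slice of the reference model: table name -> column names.
REFERENCE_MODEL = {
    "WELL": {"UWI", "WELL_NAME", "SPUD_DATE"},
    "WELL_NODE": {"NODE_ID", "UWI"},
    "BUSINESS_ASSOCIATE": {"BUSINESS_ASSOCIATE_ID", "BA_LONG_NAME"},
}


def deployed_schema(conn):
    """Read table and column names from a live SQLite connection."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        schema[table.upper()] = {col[1].upper() for col in columns}
    return schema


def compare(reference, deployed):
    """Report what is missing from, and what is extra to, the reference."""
    report = {
        "missing_tables": sorted(set(reference) - set(deployed)),
        "extra_tables": sorted(set(deployed) - set(reference)),
        "missing_columns": {},
        "extra_columns": {},
    }
    for table in sorted(set(reference) & set(deployed)):
        missing = sorted(reference[table] - deployed[table])
        extra = sorted(deployed[table] - reference[table])
        if missing:
            report["missing_columns"][table] = missing
        if extra:
            report["extra_columns"][table] = extra
    return report


# Illustrative deployment: one standard table with a local column added,
# plus one table that is not in the reference model at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WELL (UWI TEXT PRIMARY KEY, WELL_NAME TEXT, LOCAL_RIG_CODE TEXT);
CREATE TABLE X_PROJECT_NOTES (NOTE_ID INTEGER PRIMARY KEY, NOTE TEXT);
""")

report = compare(REFERENCE_MODEL, deployed_schema(conn))
for key, value in report.items():
    print(f"{key}: {value}")
```

The 'extra' entries are the candidates for non-standard usage, while the 'missing' entries show which parts of the model have not been deployed.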

Cook wound up with a run through some useful tools. The H2 database is a small, well-supported open source database capable of holding the complete PPDM schema with constraints enabled. ETL has successfully loaded the Teapot Dome sample data set into H2 for experimentation ‘without the need for IT support or a database administrator.’

Turning to the documentation question, throwing hundreds of pages of static documentation at new hires will likely be counterproductive. It is much better to use innovative tools such as interactive Python.

This can be used to develop a ‘Wiki’ notebook for interactive programming with links to PPDM and other data models via Python. This kind of environment can pull metadata directly from the model and provide live, data-driven documentation showing up the subtle differences between given implementations. Read Cook’s presentation and visit ETL Solutions.
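A notebook cell that produces documentation from live metadata might look something like the sketch below. It is an illustration rather than the approach shown in the presentation: sqlite3 stands in for the database because it ships with Python, the document_table helper is hypothetical, and the two tables are a toy schema. Other databases expose comparable metadata by other means.

```python
import sqlite3


def document_table(conn, table):
    """Build a plain-text description of one table from live metadata."""
    lines = [f"Table {table}"]
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        f'PRAGMA table_info("{table}")'
    ):
        flags = []
        if pk:
            flags.append("primary key")
        if notnull:
            flags.append("not null")
        suffix = f" [{', '.join(flags)}]" if flags else ""
        lines.append(f"  {name}: {col_type or 'untyped'}{suffix}")
    # Each foreign_key_list row starts (id, seq, table, from, to, ...).
    for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
        lines.append(f"  {fk[3]} references {fk[2]}.{fk[4]}")
    return "\n".join(lines)


# Toy schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WELL (UWI TEXT PRIMARY KEY, WELL_NAME TEXT NOT NULL);
CREATE TABLE WELL_NODE (
    NODE_ID TEXT PRIMARY KEY,
    UWI TEXT REFERENCES WELL(UWI)
);
""")

doc = document_table(conn, "WELL_NODE")
print(doc)
```

Because the description is regenerated from the database each time the cell runs, it reflects the implementation as deployed, and running it against two implementations makes their differences easy to compare.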

© Oil IT Journal - all rights reserved.