Data integration is a hot topic, and many tools have been developed for moving data. Most of these tools are SQL scripts and flat-file loaders. At InnerLogix we have been investigating the accuracy of both the data movers and the processes used for data integration.
Reference values
Much petroleum industry data relies on reference values. Measured depth values are referenced to an elevation. Deviation surveys and well-paths refer to a north reference (grid north, true north, etc.). All position values are based on a cartographic reference system (CRS). Much of our data is also unit-dependent and is likely to be subject to the infamous NULL value standards.
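As a minimal sketch of what normalizing to common reference values can involve, the Python below converts a depth to meters, restates it against a different elevation reference and turns sentinel values into a true null. The function name, the sentinel values and the assumption that the well is vertical between the two elevation references are illustrative only; they are not taken from any particular data store.

```python
from typing import Optional

FT_TO_M = 0.3048  # international foot, exact by definition

# Assumed sentinel values standing in for "no data"; real sources vary.
NULL_SENTINELS = {-999.25, -9999.0}


def rebase_depth(depth: float, unit: str,
                 source_ref_elev_m: float,
                 target_ref_elev_m: float) -> Optional[float]:
    """Return depth in meters below the target elevation reference.

    Assumes the well is vertical between the two references, so the
    shift is simply the difference in their elevations.
    """
    if depth in NULL_SENTINELS:
        return None  # a proper null instead of a magic number
    if unit == "m":
        depth_m = depth
    elif unit == "ft":
        depth_m = depth * FT_TO_M
    else:
        raise ValueError(f"unknown depth unit: {unit!r}")
    return depth_m - (source_ref_elev_m - target_ref_elev_m)


# 1000 ft below a reference at 30 m elevation, restated relative to a
# reference at 25 m elevation.
print(rebase_depth(1000.0, "ft", 30.0, 25.0))
```

Returning None for a sentinel, instead of passing the number through, keeps a value such as -999.25 from being converted and shifted as if it were a real depth.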
DataLogix
We compared data from different sources using our company’s DataLogix application, which automates data migration and integration and offers version control and QC. To make the comparison meaningful, the software first normalizes all the data elements to a set of common reference values. Many of the data sources in this study did not contain enough information to enable such normalization. For such sources, DataLogix allows the missing information to be supplied and associated with each data source.
People
We then tried to find people who knew how the source data was created. However, such people are not always still in the industry! Their knowledge has been lost and the value of many data sources is greatly reduced (some would say eliminated!). Using new technologies we were able to ‘reverse engineer’ many of the resulting inconsistencies and restore the original value of the data.
QC
Where possible we performed various quality checks. DataLogix uses statistics, geo-statistics, business logic, and fuzzy logic algorithms for QC. These checks revealed major flaws in existing data integration processes. We discovered well-paths that had simply been copied from one data source to the next without regard to the north reference. Well locations that appeared identical were actually based on different CRSs. Internal consistency is also an issue. Over time, master data sources are updated with data corrections and new data items, but these updates are not always reflected in the child data stores.
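The north reference problem can be illustrated with a short Python sketch. It assumes a single grid convergence value for the well and a sign convention in which convergence is the angle from true north to grid north, positive clockwise; under that convention, grid azimuth equals true azimuth minus convergence. The function names and the tolerance are hypothetical, and this is not a description of how DataLogix performs the check.

```python
from typing import Tuple


def to_grid_azimuth(azimuth_deg: float, north_ref: str,
                    convergence_deg: float) -> float:
    """Express an azimuth relative to grid north, in [0, 360).

    Assumed sign convention: convergence_deg is the angle from true
    north to grid north, positive clockwise (grid north east of true).
    """
    if north_ref == "grid":
        grid = azimuth_deg
    elif north_ref == "true":
        grid = azimuth_deg - convergence_deg
    else:
        raise ValueError(f"unknown north reference: {north_ref!r}")
    return grid % 360.0


def azimuths_agree(a: Tuple[float, str], b: Tuple[float, str],
                   convergence_deg: float, tol_deg: float = 0.01) -> bool:
    """Compare two (azimuth, north_ref) pairs after normalizing both."""
    ga = to_grid_azimuth(a[0], a[1], convergence_deg)
    gb = to_grid_azimuth(b[0], b[1], convergence_deg)
    diff = abs(ga - gb) % 360.0
    # Take the shorter way round the circle so 359.9 and 0.1 compare as close.
    return min(diff, 360.0 - diff) <= tol_deg
```

With a convergence of 1.5 degrees, a survey station copied as 45.0 from a true-north source into a grid-north store no longer agrees with its original: the numbers are identical, but the directions they describe are not.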
Diff-view
We used the ‘differential-view’ feature of the software to compare the child data stores with their parents and discovered that several key updates had not made it from parent to child. We also discovered that some data had been corrected in the child data store, but these changes had not been reflected back into the parent data set. Since the differential-view feature is editable, we were able to reconcile these data sources.
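The kind of parent-to-child comparison described above might be sketched in Python as follows. This is not the DataLogix differential view; it is a hypothetical illustration that assumes each data store can be read into a dictionary keyed by well identifier, with one dictionary of attributes per well, and that values have already been normalized to common references.

```python
from typing import Any, Dict

Store = Dict[str, Dict[str, Any]]  # well id -> attribute name -> value


def diff_stores(parent: Store, child: Store) -> Dict[str, Any]:
    """Report records and fields that differ between two data stores."""
    report: Dict[str, Any] = {
        "only_in_parent": [],  # updates that never reached the child
        "only_in_child": [],   # records added only in the child
        "changed": {},         # well id -> {field: (parent, child)}
    }
    for key in sorted(parent.keys() | child.keys()):
        if key not in child:
            report["only_in_parent"].append(key)
        elif key not in parent:
            report["only_in_child"].append(key)
        else:
            p, c = parent[key], child[key]
            fields = {
                f: (p.get(f), c.get(f))
                for f in sorted(p.keys() | c.keys())
                if p.get(f) != c.get(f)
            }
            if fields:
                report["changed"][key] = fields
    return report
```

A report of this shape says where the stores disagree but not which side is right. Deciding whether a change should flow from parent to child or back again still needs a person or a business rule.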
Synchronize
This study showed that some existing data integration processes do not accurately normalize data before migration. We also found that few processes are in place to ensure that data stays synchronized between data stores. With the right software tools, these errors can be detected and corrected efficiently. More on DataLogix from www.innerlogix.com.
© Oil IT Journal - all rights reserved.