Fuzzy logic, statistics aid data integration

Dag Heggelund, CTO of InnerLogix, reports from the data integration front line in this contributed article. Heggelund describes typical pitfalls in data migration and suggests ways to minimize data integration errors.

Data integration is a hot topic these days and many tools have been developed for moving data. Usually, these tools are SQL scripts and flat file loaders. At InnerLogix we have been investigating the accuracy of both the data movers and the processes used for data integration.

Reference values

Much petroleum industry data relies on reference values. Measured depth values are referenced to an elevation. Deviation surveys and well-paths refer to a north reference (grid north, true north, etc.). All position values are based on a cartographic reference system (CRS). Much of our data is also unit dependent, and most of it is subject to the infamous NULL value conventions, where a sentinel number stands in for a missing value.

DataLogix

We have compared data from different data sources using our company’s DataLogix application. DataLogix automates data migration and integration and offers version control and QC. To make the comparison meaningful, the software first normalizes all the data elements to a set of common reference values. Many of the data sources for this study did not contain enough information to enable such normalization. For such data sources, DataLogix allows the missing reference information to be supplied and associated with each data source.
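As an illustration of what such normalization involves, here is a minimal sketch in Python. It is not DataLogix code: the names `SourceMetadata` and `normalize_depth` are hypothetical, and the sketch assumes elevations are positive upwards, depths are positive downwards, and the hole is vertical between the two datums so that a change of depth reference is a simple offset.

```python
from dataclasses import dataclass
from typing import Optional

FEET_TO_METRES = 0.3048  # exact for the international foot


@dataclass(frozen=True)
class SourceMetadata:
    """Reference information supplied for a source that does not record it.

    Hypothetical structure, for illustration only.
    """
    depth_unit: str           # "m" or "ft"
    datum_elevation_m: float  # elevation of the depth reference, in metres
    null_value: float         # sentinel the source uses for "no value"


def normalize_depth(raw: float, meta: SourceMetadata,
                    target_datum_elevation_m: float) -> Optional[float]:
    """Return a depth in metres below the target datum, or None for a NULL."""
    # Map the source's sentinel to a real missing value before any arithmetic.
    if raw == meta.null_value:
        return None

    # Convert to the common unit.
    if meta.depth_unit == "ft":
        depth_m = raw * FEET_TO_METRES
    elif meta.depth_unit == "m":
        depth_m = raw
    else:
        raise ValueError(f"unsupported depth unit: {meta.depth_unit!r}")

    # Shift from the source datum to the target datum.
    # A lower target datum shortens the depth, a higher one lengthens it.
    return depth_m + (target_datum_elevation_m - meta.datum_elevation_m)


# Example: a legacy source in feet, referenced to a datum at 30 m elevation.
legacy = SourceMetadata(depth_unit="ft", datum_elevation_m=30.0, null_value=-999.25)
depth = normalize_depth(1000.0, legacy, target_datum_elevation_m=20.0)
missing = normalize_depth(-999.25, legacy, target_datum_elevation_m=20.0)
```

Holding the unit, datum and NULL sentinel per source, rather than per value, mirrors the idea of associating the missing information with the data source as a whole.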

People

We then tried to find people who knew how the source data was created. However, such people are not always still in the industry! Their knowledge has been lost and the value of many data sources is greatly reduced (some would say eliminated!). Using new technologies we were able to ‘reverse engineer’ many of the resulting inconsistencies and restore the original value of the data.

QC

Where possible we perform various quality checks. DataLogix uses statistics, geo-statistics, business logic, and fuzzy logic algorithms for QC. Such checks revealed major flaws in existing data integration processes. We discovered well-paths that had simply been copied from one data source to the next without regard to the north reference. Well locations that appeared identical were actually based on different CRSs. Internal consistency is also an issue. Over time, master data sources are updated with data corrections and new data items, but these updates are not always reflected in the child data stores.
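One of these checks can be sketched as a simple business rule: two records for the same well that carry the same coordinate numbers but declare different CRSs cannot both be right. The Python below is a minimal sketch under that assumption. The `WellLocation` record, the `find_crs_conflicts` function and the sample CRS codes are hypothetical, and this is not how DataLogix itself is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WellLocation:
    """Hypothetical well location record, for illustration only."""
    well_id: str
    source: str  # name of the data store holding the record
    x: float
    y: float
    crs: str     # declared CRS, e.g. an EPSG code


def find_crs_conflicts(records: List[WellLocation],
                       tolerance: float = 1e-6) -> List[Tuple[str, str, str]]:
    """Return (well_id, source_a, source_b) for suspicious pairs.

    A pair is suspicious when the coordinates agree within `tolerance`
    but the declared CRSs differ, which suggests the numbers were copied
    without being reprojected.
    """
    by_well: Dict[str, List[WellLocation]] = {}
    for rec in records:
        by_well.setdefault(rec.well_id, []).append(rec)

    conflicts = []
    for well_id, recs in by_well.items():
        for i, a in enumerate(recs):
            for b in recs[i + 1:]:
                same_numbers = (abs(a.x - b.x) <= tolerance
                                and abs(a.y - b.y) <= tolerance)
                if same_numbers and a.crs != b.crs:
                    conflicts.append((well_id, a.source, b.source))
    return conflicts


records = [
    WellLocation("W-1", "master", 431250.0, 6521300.0, "EPSG:32631"),
    WellLocation("W-1", "project", 431250.0, 6521300.0, "EPSG:23031"),
    WellLocation("W-2", "master", 440000.0, 6500000.0, "EPSG:32631"),
    WellLocation("W-2", "project", 440000.0, 6500000.0, "EPSG:32631"),
]
conflicts = find_crs_conflicts(records)
```

The same pattern, identical numbers under different declared references, would apply to azimuths copied between stores with different north references.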

Diff-view

We used the ‘differential-view’ feature of the software to compare the child data stores with their parents and discovered that several key updates had not made it from parent to child. We also discovered that some data had been corrected in the child data store, but that these changes had not been reflected back into the parent data set. Since the differential-view feature is editable, we were able to reconcile these data sources.
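The idea behind such a comparison can be sketched in a few lines of Python. This is only an illustration, assuming each store can be reduced to a mapping from a record key to a value. The `diff_stores` function and the sample keys are hypothetical and do not represent the DataLogix differential view.

```python
from typing import Any, Dict, List


def diff_stores(parent: Dict[str, Any],
                child: Dict[str, Any]) -> Dict[str, List[str]]:
    """Classify every key by how the parent and child stores disagree."""
    report: Dict[str, List[str]] = {
        "missing_in_child": [],   # present in the parent, never propagated
        "missing_in_parent": [],  # present only in the child
        "differs": [],            # present in both, values disagree
    }
    for key in sorted(parent.keys() | child.keys()):
        if key not in child:
            report["missing_in_child"].append(key)
        elif key not in parent:
            report["missing_in_parent"].append(key)
        elif parent[key] != child[key]:
            report["differs"].append(key)
    return report


# Total depth per well, in metres, in two stores that have drifted apart.
parent = {"W-1": 1500.0, "W-2": 2100.0, "W-3": 900.0}
child = {"W-1": 1500.0, "W-2": 2095.0, "W-4": 1200.0}
report = diff_stores(parent, child)
```

A plain diff like this cannot tell which side holds the correct value when two entries differ. Deciding that needs a timestamp, an audit trail or a person reviewing the difference, which is why an editable view is useful for reconciliation.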

Synchronize

This study showed that some existing data integration processes do not accurately normalize data before migration. We also found that few processes are in place to ensure that data is synchronized between data stores. With the use of the right software tools these errors can be efficiently detected and corrected. More on DataLogix from www.innerlogix.com.

© Oil IT Journal - all rights reserved.