Data validation—the next big thing, really!

By way of an editorial, this month we bring you a short email exchange between Oil IT Journal editor Neil McNaughton and two leading members of the WITSML community. The subject: an investigation into how XML, SOAP and related technologies can be used to assure data QC on the fly.

From: Neil McNaughton

Subject: Data validation and quality.
John, I am giving a paper in a few weeks on the next 10 years of data management and plan to include WITSML and its siblings in my talk. I have a couple of questions on the degree to which XML/SOAP can actually enhance data quality by offering on-the-fly data validation. First, how important is data validation to the WITSML community? It is not generally mentioned in WITSML presentations. Is this because it is overlooked, or too obvious to be worth mentioning? Next, does XML/SOAP guarantee that data is validated against a schema, and how far can this be taken? I gather that you can test data (for instance units of measure) against a list of values. But can you build business rules into SOAP/XML to check, for instance, that if a well is P&A’d, it also has a spud date?

Behind these questions is a concern that SOAP/XML may not be fully exploited for on-the-fly data QC. Perhaps constant validation will be seen as too much trouble, rather as constraints are often switched off when loading data into a database. That has been common practice in the past, and one that has resulted in poor data quality.
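For illustration, the kind of cross-field business rule Neil asks about (a P&A’d well must also carry a spud date) is beyond what a plain XML schema can express, but is straightforward to run in code over the parsed document. The following is a hypothetical sketch: the element names are modelled loosely on WITSML well objects but are not guaranteed to match the actual schemas.

```python
# Sketch of a cross-field business rule that XML Schema alone cannot
# express: if a well's status is "plugged and abandoned", it must also
# have a spud date. Element names are illustrative, not the real schemas.
import xml.etree.ElementTree as ET

def check_pa_has_spud_date(well_xml: str) -> list[str]:
    """Return a list of rule violations for one <well> document."""
    well = ET.fromstring(well_xml)
    errors = []
    status = well.findtext("statusWell", default="")
    spud = well.findtext("dTimSpud")
    if status == "plugged and abandoned" and not spud:
        errors.append("well is P&A'd but has no spud date")
    return errors

bad_well = "<well><statusWell>plugged and abandoned</statusWell></well>"
good_well = ("<well><statusWell>plugged and abandoned</statusWell>"
             "<dTimSpud>1998-03-12T00:00:00Z</dTimSpud></well>")

print(check_pa_has_spud_date(bad_well))   # flags the missing spud date
print(check_pa_has_spud_date(good_well))  # no violations
```

In practice such rules would run server-side as objects arrive into a WITSML store, alongside schema validation.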


. . .

From: John Shields (Baker Inteq)

Neil, Good to hear from you again. Data validation is a very important part of the business for the companies involved in WITSML. The majority of the data flow is from wellsite providers to office-based repositories. It is normally a contractual responsibility of the wellsite acquisition companies to provide data values that are valid and as accurate as possible. Validation is normally carried out within the software of the acquisition systems, where it is possible to check for bad sensor values or out-of-range measurements in different measurement units. WITSML is just the transfer mechanism and is not responsible for the accuracy of the numerical measurements.

Within WITSML, however, there are a number of mechanisms that can be used to validate WITSML data messages. An XML data file or document can be described by an XML schema, as is the case for WITSML. The XML schema can perform fairly sophisticated validation of the structure and type of data contained within an XML message, including: specifying whether elements are mandatory, constraining the number of occurrences of an element, checking for valid entries in enumerated lists, specifying the length of a data string and checking numeric data types (integer, float etc.). Most programming languages provide facilities to validate an XML document against its schema and to analyze any errors reported. WITSML also defines schemas for unit of measure classes, so that validation can check that valid units have been used in the XML documents. Typically, WITSML objects would be validated as they are received into a WITSML store. In this scenario it would not normally be too time-consuming or processing-intensive a task to perform validation checking by means of the XML schema file or a business rules style sheet. Interesting stuff!
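The schema checks John lists can be illustrated with a minimal XSD sketch. The element names and values here are illustrative only; they are not taken from the actual WITSML schemas.

```xml
<!-- Minimal XSD sketch of the checks listed above: a mandatory element,
     an occurrence bound, an enumerated list, a string length limit and
     a numeric type. Names are illustrative, not the real WITSML schema. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="well">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" minOccurs="1">          <!-- mandatory -->
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:maxLength value="64"/>                <!-- string length -->
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="statusWell" minOccurs="0">
          <xs:simpleType>
            <xs:restriction base="xs:string">           <!-- enumerated list -->
              <xs:enumeration value="drilling"/>
              <xs:enumeration value="producing"/>
              <xs:enumeration value="plugged and abandoned"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="wellheadElevation" type="xs:double"
                    minOccurs="0" maxOccurs="1"/>       <!-- numeric type -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Any schema-aware XML library can then validate an incoming document against such a file and report which constraint failed.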


. . .

From: Rune Skarbø (Sense Intellifield)

Neil, Until recently, focus has been on achieving seamless streaming or replication of data between WITSML systems, although some work has been done using WITSML (XML/SOAP) to enhance and QC data. Now that interoperability between systems is in place, the WITSML community is able to focus more on using the XML data as a basis for data QC, advanced algorithms, smart alarms, etc. Although the WITSML schemas provide some level of assurance that data conforms to certain rules, we need business logic to take full advantage of the data. Here are a few examples:

1) Operators often require service companies to provide real-time data using defined units of measure (UOM) and may provide contractors with lists of accepted curves, mnemonics and UOMs. These can be specified in our SiteCom WITSML application’s business logic module, so that incoming data is validated as it arrives. If something does not match, an alarm is raised—or if preferred, units can be automatically corrected to the predefined format.

2) The same business logic module can check for spikes, rates of change, or values exceeding specified boundaries and perform processing on the data, making it available in near real-time.

3) Although specific data validation rules or business logic have not been specified in WITSML, this does not mean they cannot be implemented. SiteCom automatically reads the incoming data, runs data QC, and then writes the updated data back to SiteCom. Users can receive a master log, consisting of data from different parts of the well, in both QC’d and raw formats.

4) As the essence of WITSML is the standardization of data formats, one may see ‘process-driven data QC’. One example may be a common portal for well geometry, where high-resolution planned drillstring and hole geometry are loaded from the well planning application. All applications using the geometry data go to the portal for it, instead of today’s practice of entering it all manually. As the string runs in the hole, an electronic tally book updates the portal automatically. Typical applications are cementing, hydraulics, torque and drag, completion, and casing.

5) There are many other examples where algorithms operate on the WITSML data as it arrives to perform real-time torque and drag, drillstring integrity, real-time event recognition (i.e. what’s going on at the well site) and drilling efficiency analysis. The key is that it does not matter which service company is being used. As long as the data conforms to the WITSML schemas, it is possible to implement generic algorithms that work independently of the data source.
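The unit-of-measure and spike checks described in points 1 and 2 above can be sketched in a few lines of code. The mnemonics, accepted units and threshold below are invented for illustration; they are not SiteCom’s actual configuration or API.

```python
# Sketch of a business-logic-style check on incoming curve data:
# (a) validate curve mnemonics and units of measure against an
#     operator-supplied accepted list, raising alarms on mismatches;
# (b) flag spikes where the step change between samples is too large.
# All names, units and thresholds are illustrative assumptions.
ACCEPTED_UOM = {
    "DEPT": "m",     # depth
    "ROP":  "m/h",   # rate of penetration
    "HKLD": "kkgf",  # hookload
}

def validate_uom(curves: dict[str, str]) -> list[str]:
    """Return alarm messages for unknown mnemonics or unexpected units."""
    alarms = []
    for mnemonic, uom in curves.items():
        expected = ACCEPTED_UOM.get(mnemonic)
        if expected is None:
            alarms.append(f"unknown mnemonic: {mnemonic}")
        elif uom != expected:
            alarms.append(f"{mnemonic}: got '{uom}', expected '{expected}'")
    return alarms

def flag_spikes(samples: list[float], max_step: float) -> list[int]:
    """Return indices where the jump from the previous sample exceeds max_step."""
    return [i for i in range(1, len(samples))
            if abs(samples[i] - samples[i - 1]) > max_step]

print(validate_uom({"DEPT": "ft", "ROP": "m/h", "WOB": "klbf"}))
print(flag_spikes([12.0, 12.4, 55.0, 12.6, 12.9], max_step=5.0))  # [2, 3]
```

A real system would run such checks as data streams in, either raising alarms or, as Rune notes, auto-converting units before republishing the QC’d curves.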



© Oil IT Journal - all rights reserved.