XML and legacy data (August 1999)

PDM wonders whether the plethora of SEG-Y format variants could benefit from an XML treatment, and concludes that while an SEGML flavor of XML might be an option for remastering, it is no panacea for legacy data.

XML is not exactly alone in competing for our attention. In the field of software interoperability alone, we have heard similar offerings from Landmark and Microsoft with last year's COM for UNIX initiative, while CORBA is currently the preferred glue for Open Spirit and other E&P initiatives. In IT at large, COM is a route to software interoperability within the Microsoft camp, but it is complicated, and applications need to know a lot about each other's behavior and data models. CORBA plays a similar role in the UNIX environment and suffers from the same constraints.

KISS principle

XML is designed to operate on the KISS principle - "keep it simple, stupid". The idea is to agree on a chunk of exchangeable data - say an article code, description, cost and availability. Note especially the agreement part - XML is extensible, and this is potentially a dangerous thing. The danger should be avoided by agreement at the schema level. Open an HTML document and you will see <!doctype html public "-//w3c//dtd html 3.2//en" >. This is a pointer to the definition of the HTML language. It rarely changes, and so an application does not need to use this information. Not so for XML. An Office 2000 Microsoft Word document saved as HTML starts out with stuff like <html xmlns:o="urn:schemas-microsoft-com:office:office">. This points to the extended XML schema containing the tag definitions for Office 2000.
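As a rough illustration of such an agreed chunk of data, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The tag names (article, code, description, cost, availability) and the sample values are hypothetical, invented for this example and not taken from any published schema. The check against an agreed tag list is only a stand-in for proper schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names and values, for illustration only.
ARTICLE_XML = """
<article>
  <code>A-1001</code>
  <description>Drill bit</description>
  <cost currency="USD">250.00</cost>
  <availability>in stock</availability>
</article>
"""

# The tags both parties are assumed to have agreed on ahead of time.
AGREED_TAGS = ("code", "description", "cost", "availability")


def parse_article(xml_text):
    """Return the agreed fields as a dict; reject anything outside the agreement."""
    root = ET.fromstring(xml_text)
    found = {child.tag: (child.text or "").strip() for child in root}
    missing = [tag for tag in AGREED_TAGS if tag not in found]
    extra = [tag for tag in found if tag not in AGREED_TAGS]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    return found


article = parse_article(ARTICLE_XML)
print(article)
```

A sender who extends the document with a private tag is rejected here rather than silently accepted, which is the point of agreeing at the schema level.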

SEG-Y

Let's walk through a familiar problem and see how XML could be applied to a solution. The venerable SEG-Y format for seismic data has suffered over the years from inconsistent use, and from a desire to stuff more information into the format than the initial specification catered for. The latest attempt at redefining the standard has met with limited acceptance, because change of any sort is anathema to the installed base. SEG-Y is therefore a prima facie candidate for what the French would call a "re-looking à la XML".

Slug-out

Instead of a committee slugging it out over the actual data format itself, agreement would be at the schema description level, with accepted tag definitions. Different flavors of SEG-Y could then all be current, with their own detailed schemas posted on the web. Now the interesting stuff starts. Instead of struggling to understand how the details of a foreign SEG-Y format slot together, you just load the tape, and press 'read'.

Push-button?

The XML-enabled SEG-Y reader then reads the URL of the schema, visits the website where the schema details are housed, downloads the schema and uses it to read the tape. Of course this strategy would be no good for reading untagged legacy SEG-Y. But there may be benefit in considering such a strategy for remastering large volumes of old but standard data into a new, highly portable XML-based format. That at least is the theory. The key point here is that the schemas must a) follow an accepted domain-specific usage and b) be public. If you are offered 'XML based' products which do not follow these rules, you may not be getting the full benefits of the standard.
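The schema-driven read described above might be sketched as follows in Python. Everything specific here is an assumption made for illustration: the schema URL, the schema's element and attribute names, and the three-field header layout are hypothetical and do not reproduce the real SEG-Y header. The fetch_schema function stands in for a web download by looking the schema up in a local table, so the sketch runs without a network connection. Big-endian byte order is also assumed.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical schema URL and contents. A real reader would download the
# schema from the address recorded with the data.
SCHEMA_URL = "http://example.com/segml/flavor-a.xml"
SCHEMAS = {
    SCHEMA_URL: """
<schema name="flavor-a">
  <field name="line_number" offset="0" format="i"/>
  <field name="sample_interval_us" offset="4" format="h"/>
  <field name="samples_per_trace" offset="6" format="h"/>
</schema>
""",
}


def fetch_schema(url):
    """Stand-in for an HTTP download: look the schema up in a local table."""
    return SCHEMAS[url]


def read_header(url, raw):
    """Decode a binary header using the field layout published in the schema."""
    schema = ET.fromstring(fetch_schema(url))
    values = {}
    for field in schema.findall("field"):
        fmt = ">" + field.get("format")  # ">" = big-endian (assumed)
        offset = int(field.get("offset"))
        values[field.get("name")] = struct.unpack_from(fmt, raw, offset)[0]
    return values


# Build an 8-byte sample header matching the hypothetical layout above.
raw_header = struct.pack(">ihh", 42, 4000, 1500)
header = read_header(SCHEMA_URL, raw_header)
print(header)
```

The reader contains no knowledge of the field layout itself. Supporting another flavor would mean publishing another schema, not changing the reader, and this only works if the schema is public and follows an agreed usage.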

© Oil IT Journal - all rights reserved.