Why don’t contractors supply data in workstation formats? (June 1997)

PDM Editor Neil McNaughton reports on the recent EAGE meeting in Geneva and discusses the ‘cost of complexity’ – having presented a paper on this very topic himself.

I saw an ad in a magazine many years ago for a get-rich-quick scheme or some such. The wording of the ad suggested that the idea was being proffered as a service to humanity; nonetheless, a $5 charge was being levied to "eliminate the frivolous". Well, holding a convention in Geneva is a pretty good way of eliminating the "frivolous", in the form of job-seekers, the retired and even the smaller stall-holders. Worse still, none of my regular hospitality-hunting drinking partners could afford to come. Only 2,500 attended the annual conference of the European Association of Geoscientists and Engineers (erstwhile EAEG), and even the "big" stall-holders, the I/Os, Geco-Praklas etc., had what seemed like miniature versions of their usual padded emporia. While vendors reported fewer contacts than in recent years, they noted higher-quality ones. The frivolous were gone, and the E&P big spenders were back!

mismatch

As in previous years, there is an impedance mismatch between the conference itself and the exhibition floor - as far as data management goes at least. Our subject of predilection is all but ignored by the main conference, but occupies center stage at the exhibition. As one of the couple of foolhardy individuals who did actually present a paper on data management, I would like to share some of my ramblings with you. What is data management? We tend to see data management in terms of applications, but behind each application there are users, and each of them may well have a very different idea of what makes up important data. Thus data management to an IT professional is all disk space, software licenses and network bandwidth. To his colleague the database specialist, it is all about clean data and database integrity.

Holy Grail

To seismologists, it may look more like formats and data loading, while geologists may just want to know where their core is. It is interesting to note that if you attend one of the very many specialized data management conferences which are springing up everywhere, data management comes very near to being defined as software interoperability - the desire to move data seamlessly around what has been described as the "whitespace" between applications, using "best of breed" applications from different vendors. While this objective has become something of a Holy Grail of data management, we are today still very far from achieving it, and we will start by looking at why. We are children of history in E&P as everywhere else, and what we have today reflects our past as much as any idealized present we might wish we lived in.

Past is key?

Our past, in E&P computing terms, is very much influenced by acquisition. Data formats from the seismic and logging industries have been designed to acquire data in the field, and to do so in as efficient a manner as possible. This has led to some highly evolved formats which are complex in the extreme, and which are frequently customized by vendors and major oil companies. While they may be good ways of writing lots of data to tape, the trade-off is that managing them can be difficult and costly. To offer a simple but telling example of how this arises, consider multiplexed field seismic data. Data was recorded multiplexed for performance, but multiplexing is not a good data management format; indeed, demultiplexing some of the older legacy seismic data is a non-trivial task today.
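For readers who have never had to do it, here is a minimal sketch of what demultiplexing amounts to in the simplest case. The field record stores one scan after another - one sample from every channel - while the workstation wants whole traces, so the job is essentially a transpose. The code is illustrative only (Python, with made-up toy data); real legacy formats add gain codes, sync words and packed binary samples on top of this.

```python
import numpy as np

def demultiplex(multiplexed, n_channels, n_samples):
    """Convert sample-sequential (multiplexed) field data to
    trace-sequential (demultiplexed) form.

    In a multiplexed record the samples are stored scan by scan:
    sample 0 of every channel, then sample 1 of every channel, etc.
    Interpretation software wants one trace (all samples of one
    channel) at a time, so demultiplexing is essentially a transpose.
    """
    scans = np.asarray(multiplexed).reshape(n_samples, n_channels)
    return scans.T  # shape (n_channels, n_samples): one row per trace

# Toy example: 3 channels, 4 time samples, recorded scan by scan.
field_record = [
    1, 10, 100,   # scan 0: channels 0, 1, 2
    2, 20, 200,   # scan 1
    3, 30, 300,   # scan 2
    4, 40, 400,   # scan 3
]
traces = demultiplex(field_record, n_channels=3, n_samples=4)
print(traces[0])  # trace for channel 0 -> [1 2 3 4]
```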

Complexity

A more modern example of complexity for performance is to be found in the family of formats based on the API RP66 specification, which we will look at next. This family has its origins in the Schlumberger wireline DLIS format, which was offered up to the API as a general-purpose data exchange format, became Recommended Practice 66, and has since spread into areas such as POSC, RODE and Geoshare. Using these complex formats we can, with difficulty, move data between media, encapsulate different objects on the same medium, and do a whole lot of clever things. But there is one thing we can't do very well, and that is get our 3D surveys from the acquisition contractor, or trade partner, quickly onto the workstation. This leads to my first question for the industry - why don't contractors supply data in workstation formats?

Martian viewpoint?

If you had arrived from Mars, or were just reborn as a business process consultant, this would surely be the first thing that hit you in the face as something to fix right away. Of course, many people and organizations have been working on related formatting and data exchange problems, but not perhaps with a real focus on this particular issue. Today, what we seem to be good at is recording data in the field, putting it into robots or on shelves, and preserving it. Using the data somehow got overlooked. The accompanying figure introduces a rather dubious pseudo-metric of the cost and complexity of data management solutions. The graph shows a rather exaggerated interpretation of the cost of managing the different formats and data models which have been proposed. I make no apologies for the absence of scales and units, and even the ranking is subjective.

Free lunch?

This graph is just designed to underscore that in data management, as elsewhere, there is no such thing as a free lunch. Of course we have complex structures for a reason, generally performance or flexibility. But if the performance gain is in the field, or in a one-off transfer of data, then there may be no benefit in keeping the data in the same format through the rest of its lifecycle. The ranking of the different groups of formats is subjective, but if you believe it has any value, it is interesting to sketch out some other projections of the cost space, such as portability, ease of loading or application performance - all plotted against the cost of management. This is an even more subjective exercise, so I will leave it to you to reflect on, but if you sketch out, for instance, the graph of performance versus complexity, there are some apparent bad buys around. If you reflect on the difficulty we have today loading and maintaining clean data in the database, or the black magic involved in carving up a 3D dataset for a trade, you will appreciate that if the future holds a multiplicity of different RODE implementations, or relational projections of Epicentre, then things are going to be even more difficult to manage.

Rocket science

What is important to remember here, too, is that the people involved in managing data are often IT professionals without years of experience in the seismic industry, or staff previously involved in drawing-office functions who may actually have the experience, but who are not necessarily prepared to write a Unix shell script to facilitate data loading. If you want to run your data management department with rocket scientists you may - it is up to you - but you will pay the cost of this complexity. Another example of possibly excessive complexity can be observed in current techniques of database deployment. Pretty well everywhere you will see a division of labor between the corporate database or data store and the project databases. This again adds a level of complexity to the system which may or may not be justified.

Tiers of joy?

Just to clear up a common misconception, this tiering, which comes from the commercial database world, is not a "natural" or essential way of organizing data in other industries. It is done that way simply to prevent a heavy-duty SQL query from effectively stopping all the ATM machines linked to the bank from functioning. In other words, it is a compromise. In the E&P world we do not have that kind of transaction load, so why compromise? Other arguments have been advanced for multi- (and sometimes very many) tiered deployment, such as the need to be able to change data without "corrupting" the data store, or to maintain multiple values for an attribute. This may or may not be a real issue; personally I would rather see one "correct" attribute propagated throughout the database as the result of an interpretation. In this context, it is interesting to see how Landmark positions OpenExplorer. It can be used at the project database level, but equally can be used to assemble projects directly from the corporate datastore into the workstation. No middle tier at all, and a lot simpler.
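To make the trade-off concrete, here is a deliberately simplified sketch, in Python, of the two deployment patterns. The class and method names are invented for illustration - they do not represent OpenExplorer or any other vendor's API - but they show where the extra copy, and hence the extra reconciliation burden, comes from in the three-tier approach.

```python
class CorporateStore:
    """The single 'correct' copy of the data, e.g. well headers."""
    def __init__(self, wells):
        self._wells = wells  # {well_name: attributes}

    def query(self, area):
        return {name: attrs for name, attrs in self._wells.items()
                if attrs["area"] == area}

class ProjectDatabase:
    """Middle tier: a working copy that must later be reconciled."""
    def __init__(self, store, area):
        self.wells = dict(store.query(area))  # copy - two versions now exist

class Workstation:
    def load(self, wells):
        print(f"loaded {len(wells)} wells")

store = CorporateStore({"W-1": {"area": "North Sea", "td": 3200},
                        "W-2": {"area": "GoM", "td": 4100}})

# (a) three tiers: corporate store -> project copy -> workstation
project = ProjectDatabase(store, "North Sea")
Workstation().load(project.wells)

# (b) two tiers: assemble the project straight from the corporate store
Workstation().load(store.query("North Sea"))
```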

SEG-Y revamp

I make no apologies for returning to a topic we touched upon in last month's PDM: the PESGB's initiative to revamp the SEG-Y standard. SEG-Y is the nearest thing we have to a workstation-ready format, but an aging specification and a multiplicity of implementations mean that it badly needs revitalizing. Over the last couple of years the SEG has been trying to reactivate the SEG-Y standards subcommittee, without success - probably because it is not a sufficiently glamorous topic. The PESGB effort is therefore timely, and the proposed link-up with the work done on PetroBank and the NPD should ensure a quick start. Maybe we will have workstation-ready data from our suppliers some time soon.
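To show how little ceremony "workstation ready" really requires, here is a minimal sketch (Python, purely for illustration) that pulls three key fields out of a SEG-Y binary reel header, using the byte positions of the original 1975 specification. The trouble, of course, is that many vendor "SEG-Y" files bend exactly these rules - which is what the revamp needs to sort out.

```python
import struct

def read_segy_binary_header(path):
    """Read a few key fields from a SEG-Y binary reel header.

    Byte positions follow the original (1975) SEG-Y specification:
    a 3200-byte EBCDIC textual header followed by a 400-byte binary
    header, with values stored big-endian.
    """
    with open(path, "rb") as f:
        f.seek(3200)                 # skip the EBCDIC textual header
        binary_header = f.read(400)

    sample_interval, = struct.unpack(">h", binary_header[16:18])    # bytes 3217-3218, microseconds
    samples_per_trace, = struct.unpack(">h", binary_header[20:22])  # bytes 3221-3222
    format_code, = struct.unpack(">h", binary_header[24:26])        # bytes 3225-3226

    formats = {1: "4-byte IBM float", 2: "4-byte integer",
               3: "2-byte integer", 4: "fixed point with gain"}
    return {
        "sample_interval_us": sample_interval,
        "samples_per_trace": samples_per_trace,
        "sample_format": formats.get(format_code, f"unknown ({format_code})"),
    }

# Usage (hypothetical file name):
# print(read_segy_binary_header("line_001.sgy"))
```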


© Oil IT Journal - all rights reserved.