Learning the ABC of compression and its business value (December 1996)

PDM's editor attended a half-day workshop on data compression at the Denver SEG. He learned a lot, but was intrigued by the gap between the proffered 'business benefits' and real-world applications of the technology.

In our post-restructured universe, in order to get funding for anything, especially for research, it is now necessary to present a "business case" describing the financial benefits to be reaped. Since our industry has only recently emerged from the dark ages of being "just a science", and business benefits are still considered "a good thing", let me remind you of the fate that befell those who heeded the "business cases" for the South Sea Bubble, the Suez Canal and, more recently, the Channel Tunnel. Business cases are generally presented with a degree of spin, and comprise three elements. One - make out an over-bleak case for the status quo. Two - overstate the ingenuity of the project, while understating the cost of implementation. Three - paint an excessively rosy picture of the future and the benefits to be accrued. I apologize for stating the obvious, but it is useful to regard the purpose of a "business case" as a means of extracting money from an investor, whether he is your boss or your shareholder.

Business case?

In E&P research, before the adoption of the "business case" paradigm, decisions were based on a case-by-case examination of the intrinsic merit of a project and its likely fields of application, and much of the judgement rested on common sense and experience. What silly ways we had then! The business case in favor of compressing seismic data - as initially presented - was a simple one. As 3D recording uses higher and higher spatial sampling, and as the areal extent of a 3D survey now often covers a whole permit, the seismic data volume recorded during one survey is now often of the order of a terabyte. This is an awful lot of data to move, especially on "conventional" media (i.e. anything other than High Density Media (HDM) such as D3/NTP). The number of tapes involved creates logistical problems and considerable cost. Furthermore, since "time is money", the time taken in transporting and manipulating the tapes can be translated into a cost weighting in the "before" business case, preparing the way for an even more spectacular saving. The initial idea behind compression was this: record as much data as you want to temporary storage on board ship, perform some sophisticated on-board processing to compress it by a factor of 10, 100 - even 300 has been suggested - and then throw away the original data.

lossy

There are two types of compression, lossless and lossy. The former is the kind used to send faxes, or binary files over the Internet. The latter is illustrated by the video clip data on CD-ROMs, which is compressed in this way, leading to rather poor image quality. One trick used in lossless compression is to identify repeated sequences in a text document, such that a sequence of 20 identical bytes (characters) would be sent as "20x" - taking up two bytes instead of twenty. This is termed run-length encoding. Another means of lossless compression, Huffman coding, involves ranking the characters in a text message in order of frequency and using short codes for the most frequent. This is rather like the way a single dot is used in Morse code to send the letter "E", while a "Z" is transmitted as two dashes followed by two dots - considerably longer.
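To make the run-length idea concrete, here is a toy sketch of my own in Python - nothing to do with any fax or seismic standard - which collapses runs of repeated bytes into (count, byte) pairs and restores them exactly:

    def rle_encode(data):
        """Collapse runs of identical bytes into (count, byte) pairs."""
        runs = []
        for b in data:
            if runs and runs[-1][1] == b and runs[-1][0] < 255:
                runs[-1] = (runs[-1][0] + 1, b)   # extend the current run
            else:
                runs.append((1, b))               # start a new run
        return runs

    def rle_decode(runs):
        """Expand (count, byte) pairs back to the original byte string."""
        return b"".join(bytes([b]) * n for n, b in runs)

    msg = b"A" * 20 + b"BC"                # twenty identical bytes, then two others
    packed = rle_encode(msg)               # [(20, 65), (1, 66), (1, 67)]
    assert rle_decode(packed) == msg       # lossless: the original comes back exactly

The cap of 255 keeps each count within a single byte, which is what turns twenty characters into two.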

Huffman

As its name implies, lossless compression allows the original data to be completely recovered. If this worked with seismics, we would be in business. Unfortunately, it doesn't work - at least not very well. Because of the random nature of the seismic time series (which is good news for decon, but that's another story), lossless compression can only achieve a limited reduction on raw seismic data. Western's Zeljko Jericevic showed that lossless compression of 30-50% could be achieved on data stored in internal numerical formats such as IEEE floating point or 24-bit internal. This compression relies on "byte slicing", i.e. re-arranging the bytes in a computer word. Because the high order bytes change relatively slowly, they do offer some possibility of compression using, for instance, the Huffman coding mentioned above. This type of compression could well be useful in certain circumstances, but will not provide the dramatic time/space savings which were initially hoped for.
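To give a flavor of the byte-slicing trick - and this is only a rough sketch using numpy and zlib as stand-ins, not Western's actual code - the four bytes of each 32-bit float are regrouped into planes so that the slowly varying high-order bytes sit together, where a general-purpose lossless compressor can get some purchase on them:

    import numpy as np
    import zlib

    def byte_slice_compress(traces):
        """Lossless 'byte slicing': regroup the bytes of 32-bit IEEE floats into
        planes (all first bytes, all second bytes, ...) before Deflate compression."""
        raw = np.asarray(traces, dtype='>f4').tobytes()       # big-endian IEEE floats
        sliced = b"".join(raw[i::4] for i in range(4))        # one plane per byte position
        return zlib.compress(sliced)

    def byte_slice_decompress(blob):
        """Undo the slicing and recover the floats bit-for-bit."""
        sliced = zlib.decompress(blob)
        plane_len = len(sliced) // 4
        raw = bytearray(len(sliced))
        for i in range(4):
            raw[i::4] = sliced[i * plane_len:(i + 1) * plane_len]
        return np.frombuffer(bytes(raw), dtype='>f4')

    trace = np.cumsum(np.random.randn(5000)).astype('>f4')     # a smoothly varying series
    blob = byte_slice_compress(trace)
    assert np.array_equal(byte_slice_decompress(blob), trace)  # fully reversible

How much the planes actually shrink depends entirely on the data - the 30-50% quoted above is for the formats Jericevic tested, not a general promise.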

data loss

For a business case of any magnitude to be made, compression has to be lossy, i.e. the compression process will involve some destruction of the original data. It will not be possible to recover the original trace data by reversing the compression process. This sounds bad, and potentially it is, which leads us on to a whole suite of arguments used to justify this destruction of data. Most of these arguments compare the high volume and precision of field data with the low volume and low numerical precision of the data actually input to the workstation. They all run along the lines of - well, other parts of the acquisition/processing/display chain mess up the data quite a bit, so why shouldn't we too! I cannot hide my feeling that this is not how we should be doing business, and that if processing, for instance, is failing to preserve the whole of the recorded dynamic range, or if our workstation applications only use 8 bits of color depth, then these should be regarded as opportunities for improvement, rather than lowest common denominators to which all our other processes should be driven down.

judicious use

The first thing to emerge from a half-day workshop held at the SEG annual conference and exhibition in Denver last month was that only the most oblique references were made to the jettisoning of original data. The business cases today are more subtle. One category involves the transmission of data from ship to shore, in order to achieve some pre-processing ahead of the arrival of the bulk of the data. Here the limit is that imposed by the bandwidth of data communications from a seismic acquisition vessel to the shore. These are still in the few megabits per second range, far less than would be required to transmit the whole dataset in any reasonable time frame. A judicious use of compression, combined with some thought as to what data really has to be moved around, has given rise to an interesting processing methodology developed by Western. Processing power is left on the boat, but processing decisions are made by ground-based personnel using limited transmitted data. This allows considerable time savings, but the original data is kept in its entirety.
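To put that bandwidth limit in perspective, the back-of-the-envelope arithmetic (using the round numbers quoted above, nothing more precise) runs as follows:

    survey_bytes = 1e12                       # ~1 terabyte raw 3D survey, as cited above
    link_bits_per_s = 2e6                     # "a few megabits/s" ship-to-shore link
    seconds_raw = survey_bytes * 8 / link_bits_per_s
    print("uncompressed: %.0f days" % (seconds_raw / 86400))          # roughly 46 days
    print("at 100:1:     %.0f hours" % (seconds_raw / 100 / 3600))    # roughly 11 hours

Hence the attraction of shipping only what the shore-based decision-makers actually need.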

intensive

Another use of compression is to allow data-manipulation-intensive processing to be performed within a realistic time frame on multi-terabyte 3D datasets. While establishing migration parameters for 2D data, considerable to-ing and fro-ing between offset and CDP sorted data is performed. This is not computationally feasible on a big 3D survey, so enter compression: do these sort-intensive tasks on compressed data, but perform the final processing on the full dataset. One could go on with reasons to compress. For instance, there is today - but probably not for long - a sharp division between processing and interpretation. This is unnatural, and fits poorly into the asset management paradigm which is increasingly used. What does the interpreter do when, during an ongoing development program, a well result comes in way off in depth? This was caused by a velocity "anomaly", and means that the migration velocities need to be changed and the whole post-stack dataset regenerated on the fly - a potential candidate for processing with a limited dataset, provided that speeds things up. So in steps compression.
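As a toy illustration of that offset/CDP to-ing and fro-ing (hypothetical header values, no particular processing system in mind), the same set of trace headers can be gathered two ways - and on a multi-terabyte survey each re-sort is a full pass over the data, which is exactly where a compressed working copy would earn its keep:

    from collections import defaultdict

    # hypothetical trace headers: (trace_id, cdp, offset_in_meters)
    headers = [(0, 101, 150), (1, 101, 450), (2, 102, 150), (3, 102, 450)]

    def gather(headers, by):
        """Group trace ids by one header word ('cdp' or 'offset')."""
        gathers = defaultdict(list)
        for trace_id, cdp, offset in headers:
            gathers[cdp if by == "cdp" else offset].append(trace_id)
        return dict(gathers)

    cdp_gathers = gather(headers, "cdp")        # {101: [0, 1], 102: [2, 3]}
    offset_gathers = gather(headers, "offset")  # {150: [0, 2], 450: [1, 3]}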

Chevron

So how do you achieve these very high compression ratios - ratios as high as 300 times have been cited, notably by Chevron Petroleum Technology Company (CPTC), who announced an alliance with Landmark to "speed delivery of compression technology"? It would be beyond the call of duty for me to attempt to explain the intricacies of wavelet transforms. If you want to impress your colleagues, Discrete Wavelet Transform (DWT) is the buzz-phrase to remember. The technique evolved from satellite imagery, via the JPEG compression used in video, and the FBI, who use it for compressing fingerprint images; in the seismic arena it looks a lot like an f-k transform, but not quite. As in f-k filtering though, some decisions must be made as to what will be kept and what will be thrown away. While the DWT mathematics are designed to identify useful signal and preserve it, the proof of the pudding is in the eating, and the results are quite impressive. The Chevron technology shows little visible difference in data compressed by up to 300 times! Of course, what would be interesting would be to see some of the worst-case results. I would suggest that pre-static-corrected seismics in an area of rapidly varying surface conditions might make for a harder test of the method.
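For readers who want to see where the "loss" in lossy actually happens, the following is a bare-bones, single-level Haar transform with hard thresholding, written from scratch - a caricature of a production DWT codec, which would use better wavelets, several decomposition levels, quantization and entropy coding - but it shows the principle: small transform coefficients are simply zeroed before storage.

    import numpy as np

    def haar_forward(x):
        """One level of the Haar wavelet transform (x must have even length)."""
        x = np.asarray(x, dtype=float)
        smooth = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass (average) part
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (difference) part
        return smooth, detail

    def haar_inverse(smooth, detail):
        x = np.empty(2 * len(smooth))
        x[0::2] = (smooth + detail) / np.sqrt(2)
        x[1::2] = (smooth - detail) / np.sqrt(2)
        return x

    def lossy_compress(trace, keep=0.25):
        """Zero all but the largest `keep` fraction of detail coefficients."""
        smooth, detail = haar_forward(trace)
        cutoff = np.quantile(np.abs(detail), 1.0 - keep)
        detail = np.where(np.abs(detail) >= cutoff, detail, 0.0)
        return smooth, detail     # the runs of zeros are what an entropy coder squeezes out

    trace = np.sin(np.linspace(0, 20, 512)) + 0.05 * np.random.randn(512)
    approx = haar_inverse(*lossy_compress(trace))
    print("max reconstruction error:", np.max(np.abs(approx - trace)))

The decision about what to keep - the `keep` fraction here, the quantization tables in a real codec - is exactly the one that determines whether the worst-case data survives.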

too slow

In any event, compression must be fit for purpose. As Vermeer (Schlumberger Cambridge Research) states, the admissible compression will be much less for a shot record transmitted for QC than for processing, where at the very least it would seem reasonable to apply a refraction mute. Vermeer also cautions compressors as to the effects of a single noisy shot record. Compression may smear the noise over neighboring records, necessitating a high level of data clean-up before compression. While the foregoing has shown that considerable savings in data volumes are possible - and hence in disk space and RAM - this is not sufficient to make the case for compression a really strong one. The missing link is performance. Many of the above business cases are as time-critical as they are volume-critical. This is where some compression algorithms show weakness. The time spent in compressing and decompressing data may outweigh the gain imparted by the reduced data volume. This has led some workers to suggest processing in the compressed (wavelet) domain. An interesting idea, but perhaps there is a trade-off in terms of understanding what is going on. Most geophysicists - and I include myself - had enough problems mastering the frequency domain.
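A back-of-the-envelope break-even check (the throughput figures below are invented placeholders, not benchmarks) shows why codec speed matters as much as the compression ratio:

    volume_mb = 1000 * 1024.0    # a 1 TB working dataset, expressed in megabytes
    io_mb_per_s = 10.0           # hypothetical tape/network throughput
    codec_mb_per_s = 5.0         # hypothetical compress-plus-decompress throughput
    ratio = 20.0                 # hypothetical compression ratio

    move_raw = volume_mb / io_mb_per_s
    move_compressed = volume_mb / ratio / io_mb_per_s + volume_mb / codec_mb_per_s

    print("move raw data:      %.1f hours" % (move_raw / 3600))          # ~28 hours
    print("compress then move: %.1f hours" % (move_compressed / 3600))   # ~58 hours

With a codec slower than the link, compression costs more wall-clock time than it saves; swap the two throughputs and the case flips.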

Zip it?

An important spin-off from considerations of compression is the necessity, or otherwise, of arriving at a standard for the exchange of compressed data. This would be necessary, if compression became a widespread technique, in order to preserve the possibility of acquiring and processing data with different contractors. Diller (Encore Software) suggested an elegant way of ducking the standards issue by compacting seismics with the equivalent of a self-extracting .zip file - in other words, delivering the data with a self-executing de-compaction algorithm. To sum up, compression is unlikely to give savings in data management. In fact, if we need compressed data for certain compute-intensive tasks, then our data management problems will be increased, with potentially multiple sets of data, compressed and uncompressed, at various stages in the project life-cycle. The original data will be kept, because who knows what the future may be able to extract from it. The original business case for compression will be lying in the dust, but we will have some powerful new tools at our disposal for imaging the reservoir. That, after all, is a better case for compression than saving a few cubic feet of warehouse space!

© Oil IT Journal - all rights reserved.