1996 in retrospect and some resolutions to make 1997 a vintage year. (January 1997)

PDM’s editor Neil McNaughton thumbs through his 1996 jottings to provide some lessons from the past year and to suggest some New Year resolutions.

While PDM cannot yet claim a year's existence, we may as well jump on the bandwagon of end-of-year festivities and reflection and try something along the lines of "1996, how was it for you?", and perhaps suggest some New Year's resolutions for the data manager. Now, turning data management into a festive subject, or even lending it a festive slant, is not the easiest of tasks, but we'll try. We will also profit from our short life to date to extend our rearward look to encompass the whole history of data, managed or otherwise, from the beginning of time.



Data is funny stuff: most of its uses are not what it was intended for. What may have been recorded as the definitive survey over the hottest prospect in the best basin, say 20 years ago, is almost certainly just taking up space today. If the data is lucky, its space may be that of an air-conditioned vault with smoke detectors and inert gas fire control systems. Less lucky data may be taking up space in a salt mine, perhaps alongside containers of low-level nuclear waste. Data whose luck has finally run out may be "re-cycled", burned, or still taking up space, but in a landfill site. It could be argued that these last categories are, in reality, the only ones where data is actually serving a useful purpose. So why do we hang on to data? Some data types, such as cores, while undoubtedly making the best candidates for landfill, do have a special, permanent raison d'être. To acquire them anew would cost a lot, as in the case of new seismics, but unlike seismics, they do not date: you would not expect a new core to give the orders of magnitude improvement one sees from a spanking new 1000 trace 3D survey. So how about this for the first PDM New Year's resolution: let's integrate the landfill site into our corporate workflow (note the essential jargon, lending credibility and sweetening a bitter pill). It doesn't matter what that 1976 survey was worth when it was acquired. If you have more recent data, or a depleted oilfield, or anything that makes you sure you will never use it again, then throw the stuff away!


dead wood

Now let's work forward through the legacy data chain - with a slightly different approach. If data - and I am thinking particularly about interpretation results, maps, picks, backups etc. - have not been rigorously indexed and stored in a manner that will allow meaningful re-use to be made of them, then throw them away too. This is not an entirely negative exercise. If you do not adopt a radical approach to trimming the dead branches, you will soon not be able to see the wood for the trees. Searches will bring back so much junk that finding the valid datasets amongst the dross will become impossible.

The next resolution is just the corollary of the first. Implement a policy governing what data is to be kept and how it is to be indexed. Be radical: keep only top level interpreted data, and throw away all those backups, intermediate processing tapes and velocity analyses, all that stuff that you just know will never be looked at again. Be even more radical: tell your contractors, no thanks, we don't want all those intermediate tapes and paper. After all, if you don't QC it when it's done, what's the point in being able to find out where it was done wrong when it's too late? To my knowledge, no-one has ever been to court over a mis-picked velocity analysis!



If you are not sure about any particular type of data, there is an acid test to see whether data is worth anything: try to sell it! Offer some samples of your legacy data to a broker. If he turns up his nose at it, then junk it! But please, please, under no circumstances give it away to a university. They might just accept it (they are usually desperate for real world data) and then another generation of students will, like I did, have a totally distorted and out-of-date picture of what data is like outside academia. Having cleared out your cupboards, you will be able to turn to another source of material which in the meantime is attempting to fill them up again. This is the Brave New World of simulated data. If we were to think of the most solid, reliable data that is generally acquired, we would have to think of a core (again). Close behind would be a 3D seismic dataset, and so on through what I propose to call the silliness spectrum, measured by the Silliness Index (SI). The core then has an SI of 0, and something really silly, like a geostatistical simulation of inter-well pore space, would naturally have an SI of 1 (or 100%).


bit bucket

More and more of our data has a very high SI as our trendy technology allows us to visualise, model and manipulate the hypothetical. The head of research of a major oil co. has been touring the world telling eager researchers that the future of the reservoir lies in our being able to wander through the pore space wearing a Virtual Reality (VR) headset. The space will be generated by stochastic simulation of course, i.e. made up. What does this mean to the guardian of the data warehouse? It means that data storage should be dependent on the SI. Data with an SI in excess of around 80% should be stored in the bit bucket (/dev/null, UNIX's ready made "virtual data store"). Above 50% we might like to consider storing some rules for generating the data rather than the data itself. Data with an SI of less than 50% will be admitted for cleansing prior to storage.
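For the literal-minded, the SI storage policy can be written down in a few lines. What follows is only a sketch: the function name and the three action labels are invented for illustration, the "around 80%" threshold is taken as exactly 0.8, and data sitting at exactly 50% (a case the policy leaves open) is assumed to be admitted for cleansing.

```python
def storage_action(si: float) -> str:
    """Map a Silliness Index (0.0 = core, 1.0 = pure simulation) to a storage decision.

    Thresholds follow the policy in the text; the labels returned are
    hypothetical names chosen for this sketch.
    """
    if not 0.0 <= si <= 1.0:
        raise ValueError("SI must lie between 0 and 1")
    if si > 0.8:
        return "bit_bucket"         # /dev/null: do not store at all
    if si > 0.5:
        return "store_rules"        # keep the recipe for generating the data
    return "cleanse_and_store"      # assumption: exactly 0.5 lands here


if __name__ == "__main__":
    for name, si in [("core", 0.0), ("pore space simulation", 1.0)]:
        print(f"{name}: {storage_action(si)}")
```

Assigning an SI to everything between the core and the simulation remains, of course, a matter for the data manager's judgement.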


cleanse thy data!

Which leads us to the next resolution. Thy data shall be cleansed! Having attended many conferences concerned with data management, and having seen behind the scenes at quite a few data loading and transcription shops, I can tell you a secret. The data management community likes to talk about data models, but the real problem of data today is not that it is unmodelled, but that it is unclean. A scan through back issues of PDM (there are six as of today!) will show how issues such as entitlements, line names, well names etc. are the real banes of the data manager's life. The world would certainly be a better place if all our data were correctly named and indexed in one big flat file than it is in today's all singing and dancing relational world.


know thy formats

Next resolution: know thy formats. It doesn't really matter whether formats are standard or not. Everyone knows that a standard (say SEG-Y) is really just a theme upon which the vendors' and contractors' musicians will produce their variations. This doesn't really matter so long as you know what the variations are. Get format descriptions from application vendors and seismic acquisition contractors and store them along with an example of a data dump in ASCII on a 3 1/2" floppy. This will mean that when you need to know, or when someone else needs to know, the information will be there. In this context, a special warning is due about a new breed of formats which are hitting the streets. SEG's RODE, MADS and the API RP66 are all manifestations of a new super format type which can best be categorized as being Object Oriented.
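By way of illustration, a data dump in ASCII of the kind suggested above need not be anything elaborate. The sketch below is an assumption about what such a dump might look like, not a prescribed layout: the function name `ascii_dump` is invented here, and it simply prints each run of bytes as an offset, hex values and printable characters.

```python
def ascii_dump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / printable-ASCII dump of a block of bytes."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." hex pairs
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." so the dump stays plain ASCII.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Dump a few made-up bytes; in practice, read the start of a real file.
    print(ascii_dump(b"LINE 96-001\x00\x01\x02"))
```

A page or two of such output, filed next to the contractor's format description, is usually enough for a successor to check that the description and the data actually agree.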



These novel formats have the new hot property of Flexibility. Readers of our Quotes of the year piece in this PDM will spot the data manager's F-word and be warned. These formats need special treatment and extra information must be extracted from their authors unless you want your data to be completely lost to posterity in a few years.

Well, I could go on, but this is getting to be a bit of an unseasonable harangue. Most important, have a good year of successful data management. You should do: this is where the action's at, the oil price is up, and data is everything except in short supply! May your relations be normal, your formats flourish and your bytes burgeon.


© Oil IT Journal - all rights reserved.