PDM Editorial - are things really getting better? (January 1998)

Neil McNaughton discusses a remastering project that The Data Room was involved in recently and shows how in IT, even the simplest task can become complex. The best laid plans can come unstuck through details such as what characters are allowed in a file name. Rather than offering a new standard solution, he suggests that a little education might go a long way.

The impression given at the SMi E&P Data Management and Data Repositories conference held in London this month was that the opinion people hold of data management depends on which sector of the industry they belong to. If you are a vendor of data management software or services, then things are on the up and the problems are being solved. If you are an oil company client or consultant - actually working on the shop floor - then the improvement is sometimes harder to discern. Two client-side presentations at the SMi conference - from Tim Bird of Enterprise and Mairead Boland of Shell - illustrated this point, and a summary of their papers is given elsewhere in this PDM. The common experience is that while data management has been solved at the conceptual level, somehow it is just not being put into practice. I'd like to align myself with the client-side viewpoint on this one, chip in with a few anecdotes of my own and try to come up with, well, I'm not sure what - certainly not an answer.

Devil in detail

As a starting point, let's take the havoc wreaked by the failure to respect some simple conventions, and a recent personal experience. This was an outsourcing project that involved processing a very large number of legacy well logs and transcribing them to High Density Media. The details are not important; it is the devil therein that is.

First, it is worth observing that when you are working with a contractor, you cannot go into his shop and expect him to re-engineer all his software to your corporate standards, unless you are a very big corporation and this is a very big contract. This may mean using software and computing platforms which will not allow you to do what you want to do - even in the simple area of naming conventions. In the project in question, UNIX, VAX and PC hardware was in use. Unfortunately, you cannot expect much cooperation from the hardware in such an environment. A slash "/" may be OK in a file name on one system, but will create a new directory on another. The PC world, at least in its DOS and Windows 3.1x incarnations, will only allow 8 plus 3 characters in a file name. Yet another constraint was the acceptance, or otherwise, of whitespace in file names across the different operating systems. All of which meant that the well names changed three times in this one project as files were read in, processed and output.

As I said, short of re-engineering the whole process, nothing could be done to avoid these operating system quirks, so we just had to live with them. The important issue was to be aware of the problem, to achieve a co-operative way of working with the contractor and to adopt an agreed work-around at each stage of the process - in short, to develop a more or less formal procedure for the work in hand. One might expect that this would be the normal way of doing business for any contractor, but while contractors will have procedures for the most commonly executed tasks, these may be constrained for reasons such as those outlined above. Elsewhere, no two datasets are alike, and in special processing no two jobs are alike. If you follow the procedures manual to the letter, you may not actually be doing what you want.
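
By way of illustration only - and in Python, which played no part in the project - the following sketch shows the kind of renaming that such a mixed environment forces on you. The function name, the character set, the example well name and the choice of extension are all hypothetical; the point is simply that any name which survives UNIX, VMS and DOS 8.3 restrictions alike is no longer the name you started with.

    import re

    def dos_safe_name(well_name, extension="las"):
        # Hypothetical helper: collapse a free-form well name into a file
        # name acceptable to UNIX, VMS and DOS 8.3 alike.
        # Slashes would create directories on UNIX and whitespace upsets
        # older shells, so both are mapped to underscores.
        name = re.sub(r"[/\\\s]+", "_", well_name.strip())
        # Keep only a conservative character set.
        name = re.sub(r"[^A-Za-z0-9_-]", "", name)
        # DOS / Windows 3.1x: 8-character stem plus 3-character extension.
        return f"{name[:8]}.{extension[:3]}".upper()

    print(dos_safe_name("15/9-F-12 A"))   # prints 15_9-F-1.LAS

In our case, of course, the changes happened by accident rather than by design - which is exactly the problem.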

Human error

Another recognized cause of error is the mis-typed entry - human error. Everyone agrees that this should be addressed by improved constraints on data entry, with checks, limits and lists of values enforced by the database. But in the real world a lot of data is not even entered into a database - at least not at the point of capture. Probably the most common tool used for data entry is the spreadsheet. This is a devilish invention indeed. You may have read three thousand integer values into a database from a spreadsheet before you come across two values separated by a comma, or a note from the operator to himself saying "can't read this". Anything goes in a spreadsheet because there is no intrinsic data integrity: no checks, no ranges, no lists of values. Well, actually this is not quite true, and while the proper answer to how to use a spreadsheet is probably "don't", a reasonable compromise would be to investigate the data validation possibilities of your favorite tool. In MS Excel, areas of the spreadsheet can be restricted to pre-determined ranges of values, or to values in a list. It is "just" a matter of educating people to use these functions. But the education bit is probably the hardest part - you may even have to educate your boss.
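
Again as illustration rather than prescription, a few lines of Python show how little it takes to check spreadsheet output before it reaches the database. The file name, column name and range limits below are made up for the example; the principle - parse, check the range, and log what fails - is the same whatever tool you use.

    import csv

    def load_integer_column(path, column, lo=0, hi=10000):
        # Hypothetical checker: read a column of supposed integers from a
        # CSV export of a spreadsheet and separate clean values from the
        # "1,234"s and "can't read this" notes that creep in.
        good, bad = [], []
        with open(path, newline="") as f:
            for row_number, row in enumerate(csv.DictReader(f), start=2):
                raw = (row.get(column) or "").strip()
                try:
                    value = int(raw)
                except ValueError:
                    bad.append((row_number, raw, "not an integer"))
                    continue
                if lo <= value <= hi:
                    good.append(value)
                else:
                    bad.append((row_number, raw, "out of range"))
        return good, bad

    # values, problems = load_integer_column("log_tops.csv", "depth")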


© Oil IT Journal - all rights reserved.