DIY data management

In this first of a three-part series, Garry Perrat from UK consultancy Geocon provides a primer in the use of the UNIX shell script. Garry shows how a little knowledge really does go a long way with a powerful scripting language and walks through some scripts which will be of immediate use to E&P project data managers. The code for this article is available on the www.oilit.com website along with other code snippets.

It is said that a little knowledge goes a long way and this is very true with Unix scripting. Many system administration and data management tasks can be simplified with a few shell scripts, often only a line or two long. In this first article we will consider some simple scripts to handle common tasks, introducing various features to improve their utility. A reasonable knowledge of general Unix commands (ls, grep, an editor) is assumed.

What is a Script?

A script is an executable text file containing Unix commands, much as a PC batch file contains DOS commands. Scripts can include constructs you wouldn't normally use on the command line (loops, condition testing, etc.) but the simplest ones just contain normal commands. For example, do you get fed up typing certain long-winded commands time after time? Project file listings, perhaps, like:

ls -l /disk*/projects/myproj/*.sgy

It gets even more complicated when some of your project directories are on filesystems with completely different names and you have something like:

ls -l /disk*/projects/myproj/*.sgy /nobackup*/projects/myproj/*.sgy

That’s quite a long command line with plenty of scope for typos so why not put it into a script? Just create a text file called, say, "filelist" containing that single line, save it and make it executable:

chmod +x filelist

You can then type

./filelist

to get your listing as long as you are in the directory where the file has been saved. That’s saved you quite a bit of typing if you often run that command. But what if you want to list another filetype and/or project?

Your Flexible Friend

You could, of course, edit the file each time but that could be rather tedious so instead change the script into a more generic form:

ls -l /disk*/projects/$1/*.$2 /nobackup*/projects/$1/*.$2

"$1" and "$2" are variables which refer to the first and second arguments given to the script. So you can now type something like

./filelist myproj sgy

There is now slightly more typing involved but you (or anyone else) can use it for any file type in any project. It could be made even more flexible (to handle multiple filetypes simultaneously, for example) but we will leave such complexities for now.
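To see how the substitution works without touching any real project directories, here is a minimal sketch. It uses "set --" to fill in $1 and $2 as if "myproj" and "sgy" had been typed after the script name, and builds the command in a variable called CMD (a name chosen only for this illustration) so that it can be displayed rather than run.

```shell
#!/bin/sh
# Pretend the script was started as: filelist myproj sgy
# "set --" loads the words that follow it into $1, $2, ...
set -- myproj sgy

echo "First argument (project):    $1"
echo "Second argument (file type): $2"

# Inside double quotes the shell replaces $1 and $2 but leaves the * wildcards alone,
# so this shows the command line that the generic script would end up running.
CMD="ls -l /disk*/projects/$1/*.$2 /nobackup*/projects/$1/*.$2"
echo "$CMD"
```

The last line printed is the original long-winded command, with the project and file type filled in from the arguments.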

Command Search Path

If you are not working in the directory in which you have created the script you have to specify the full path to it, for example

/users/fred/filelist myproj sgy

This could get rather tedious as it isn't actually saving much typing. However, Unix searches a series of directories for commands so you can either put your scripts in a directory already in the search path (the command

echo $PATH

will list them) or create a new directory for them (e.g. $HOME/scripts - that’s a subdirectory called scripts off your home directory) and add it to your search path. How you do this can vary but in general:

1) Determine which interactive shell you are using (echo $SHELL).

2) If you are running csh (the C-shell) you will probably need to edit a file in your home directory called .cshrc. If sh (Bourne), ksh (Korn) or bash (Bourne Again) it will probably be .profile.

3) Look for a line in the appropriate file containing "path=", "PATH=" or similar, followed by a list of directories, and add your new directory (e.g. $HOME/scripts) onto the end. Follow the syntax for the command, honouring any colons or whitespace between directory names as well as parentheses.

4) Log out and back in again or open a new window to see the effect of the change. Test it by typing just

filelist myproj 3dv 

from a directory other than the one containing the command. An error message like "Command not found" means that it hasn't worked. If you get unexpected output it’s because there is another command of the same name already on the system, so rename your script to something distinct. Don't give any of your scripts the same name as an existing Unix command (like test)!

Generally, the "which" command can be used to determine whether or not a command name is already in use. For example, "which ls" returns "/bin/ls" but "which filelist" will return either something like "/users/fred/scripts/filelist" if you have already created it or "no filelist in ..." followed by a list of all the directories in the command search path. (Some commands may return "aliased to ..." - don't use these aliases as script names, either.)

If you are running the C-shell and don't log out before running a new script you will need to run "rehash" in the window in which you want to run the script or the system won't find it in the search path. If in doubt, consult your local system guru if you are lucky enough to have one.
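As a rough sketch of steps 3 and 4 for the Bourne family of shells, the lines below add the example directory $HOME/scripts to the search path and then check whether a name is already taken. SCRIPTDIR is a variable name invented for this illustration, and "command -v" is used here as the standard shell counterpart to "which". Your own .profile may well set PATH differently, so treat this as a pattern rather than something to paste in blindly.

```shell
#!/bin/sh
# Assumption: personal scripts live in $HOME/scripts, as in the example above.
SCRIPTDIR=$HOME/scripts

# Bourne, Korn and bash: directories are separated by colons.
PATH=$PATH:$SCRIPTDIR
export PATH

# C-shell users would instead put a line like this in .cshrc
# (whitespace between directories, parentheses around the list):
#   set path = ($path $HOME/scripts)

# Show the search path, now including the new directory.
echo "$PATH"

# Check whether a name is already in use before giving it to a script.
# "command -v" prints where the command was found, or nothing if it wasn't.
command -v ls
command -v filelist || echo "filelist was not found in the search path"
```

If the last line reports that filelist was not found, either the script has not yet been copied into the scripts directory or it has not been made executable.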

Variables

If you want to edit the script a year or two after originally writing it, or copy and change someone else’s, a well laid out one makes the job much easier. So let’s clarify ours before it gets too complex:

#!/bin/sh

PROJ=$1 # Project is 1st argument
FTYPE=$2 # File type is 2nd argument

ls -l /disk*/projects/$PROJ/*.$FTYPE /nobackup*/projects/$PROJ/*.$FTYPE

The first line forces the Bourne shell to be used to interpret the script. This is normally the default script interpreter (or the Korn shell, which is a superset of the Bourne) but it always pays to err on the side of caution since C-shell syntax is different. Note that any shell can be used to interpret scripts, regardless of the one used interactively in windows. The Bourne and Korn shells are used more commonly than the C-shell for scripting. If this doesn't mean anything to you then pretend you didn't read it as it really doesn't matter!

Blank lines are not necessary but improve clarity. The two assignments that follow the first line define variables: PROJ is set to the first command line argument and FTYPE to the second. Note that variables are preceded with a dollar sign when you want to reference what’s in them but not when you are assigning something to them. (Think of it as paying to see what’s in them!) Variables don't need to be uppercase and, indeed, these ones should strictly speaking be lowercase, but I find that uppercase variables greatly improve readability if you're not worried about niceties.
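The difference between assigning and referencing is easy to try out on its own. The short sketch below uses fixed values instead of command line arguments; PATTERN is a variable name invented for this illustration.

```shell
#!/bin/sh
# Assigning: no dollar sign, and no spaces either side of the "=".
PROJ=myproj
FTYPE=sgy

# Referencing: put a dollar sign in front of the name.
echo "Project is $PROJ, file type is $FTYPE"

# Variables can be combined with ordinary text to build up longer strings.
PATTERN="/disk*/projects/$PROJ/*.$FTYPE"
echo "$PATTERN"

# Braces show the shell where the name ends when more text follows directly.
echo "${PROJ}_backup"
```

A common slip is to write "PROJ = myproj" with spaces, which the Bourne shell treats as an attempt to run a command called PROJ.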

Comments Please!

Everything from a hash until the end of the line is a comment, unless the hash is the first character on the first line (as in "#!/bin/sh", which names the interpreter). Commenting your code is (almost) always A Good Thing.

We now have a useful script, written clearly and self-documented. Next time we will look at some more complex building blocks to improve scripts’ functionality. In the meantime, some more examples are on the Web at http://www.oilit.com

Editor’s notes

Our lawyers insist that we disclaim any responsibility for the use of the code snippets provided here and on the oilIT.com website. All code is provided ‘as is’ and no guarantee for fitness for purpose is implied either by The Data Room or by Geocon. Make sure you back up any critical files before running any script on them.

About the author

After nine years of geohackery at Phillips Petroleum, Garry moved to Geocon, to whose clients he now provides Landmark & system support, Unix training, scripting and other such services. For more information call +44 (0)1297 34656, visit www.geocon.com or email gperratt@geocon.co.uk.


© Oil IT Journal - all rights reserved.