UNIX INTRODUCTION

You will very likely do your scientific computing in a UNIX environment. UNIX is by far the most common operating system for the workstations that dominate today's scientific computing. There are many different versions of UNIX. In this class we will be using OS X on the Macs, which is an Apple version of UNIX. Some of us still do some work on the Suns, which use Solaris (also UNIX). There is also a free version of UNIX, called LINUX (pronounced lynn' exs), that will run on PCs and has been getting a lot of attention lately.

To use UNIX on the Macs, you will need to bring up a regular text-based window rather than the standard Mac windows. This can be done by running the "Terminal" program, which is normally in Applications/Utilities. We will assume in these notes that you are entering UNIX commands within such a window. In many cases you will need to have X11 also running in order to display graphics, etc. Thus I recommend that you always run X11 first, and then Terminal to create the UNIX text window that you will use. (X11 also has a text window but its scroll bar is not as nice as the one in the Terminal application.) X11 can be downloaded for free from Apple if you do not already have it installed on your machine.

A note about UNIX shells: UNIX comes in different flavors. We are going to assume that you are running what is called the C shell ("csh" or "tcsh"). However, it is possible that your Mac is running something called "bash", which many people think is better than csh. The default shell for Macs changed from tcsh to bash between Jaguar and Panther (OS X 10.2 to 10.3). However, because these notes are based on csh, you should make sure your Mac is running tcsh (if you think bash is better, then you probably are more of a UNIX expert than me and don't need to be taking this part of the course anyway!). To find out what you are running, just look at the top of the Terminal window and it will say.
Alternatively, type

    printenv

within your UNIX shell and it will tell you lots of interesting things, including what your SHELL is. To set things up so that you are always in tcsh, run the Terminal program and select "Preferences" from the pull-down menu under "Terminal" at the top of the screen. Select "Execute this command (specify complete path)" and enter /bin/tcsh in the little box. Then just close the "Terminal Preferences" box and restart Terminal. To learn more about UNIX shells on the Macs, check out:

    http://www.macdevcenter.com/pub/a/mac/2004/02/24/bash.html

I am by no means an expert on UNIX; probably there are several of you here that know much more than I do. I have learned only enough to get by and could benefit from learning more. So I'm just going to outline the basics here for the benefit of those students who have not been exposed much to UNIX.

The UNIX operating system was originally designed to run on mainframe computers where security was a big issue. You don't want users to be able to delete other users' files or do other nasty things. So there are a lot of security features. The first of these is the login and the password. You will be assigned a login name and password. The first thing that you should do after you log in is change your password so that only you know what it is. ***On the Macs, go to System Preferences and click on Accounts.*** To change your password on more traditional UNIX machines, use the command:

    passwd

You should choose a password with numbers or special characters as well as letters. Do not choose a common word that can easily be guessed by outside hackers. The NetOps people here are very concerned about security issues and IGPP computers have been "attacked" several times already, although no damage has been done. Do not write your password down (like on a note next to your computer!) where it can be found. Do not tell other students your password.
You can easily give them access to read your files without giving them the password. Of course all this makes it very hard to remember your password if you don't use the computer very often....

Let us assume that you have successfully logged on. You will now be located in your home directory. UNIX uses a directory tree structure, similar to the "folders" used by PCs but without all the fancy icons. Normally you will have a cursor prompt that tells you what machine you are running on. In my case, this looks like:

    shearer@katmai 5>

because my laptop (my primary computer) is called "katmai." Your default prompt is likely to be different from this. If you want to have one like mine, enter

    set prompt="%n@%m %h> "

where n refers to name, m to machine name, and h to line number (the number of the command in your history list). You will find it convenient to put this in your .cshrc file (see below). The notes that follow were written when I used rock (now a Sun in the Barnyard) and so have rock% as the standard prompt.

BASIC COMMANDS

To find out where you are, enter "pwd" which stands for "Print Working Directory." In my case, this will give:

    rock% pwd
    /net/rock/shearer

This shows that I am located in my home directory (named shearer) on rock. You can list the contents of your current directory with the "ls" command:

    rock% ls

(I'm not showing you the output because it's too messy in my case!)

UNIX has an online set of manuals that may be accessed with the "man" command. For example, suppose you forgot what "pwd" does. You could type "man pwd" and you would get a description of the pwd command. One annoying aspect of standard UNIX is that the man command shows its output through a paging program with vi-like keys rather than as normal window output, which would permit scrolling up and down in the window. When you enter "man pwd" you will get a page of output on the screen with a colon (":") at the bottom. If you hit the space bar, you will then get the next page of output, etc. But you can't scroll backwards using the window scroll arrows.
To go backwards, enter "b" at the colon prompt. However, if you go past the final page, you will be dumped out of the man pages completely. If you find this too annoying to deal with, you can always save the man pages to a file. To do this, enter:

    man pwd > man.pwd

The output will now be directed into a file named "man.pwd" rather than to the screen. Note that man.pwd is not exactly what would have been printed to the screen, mainly because the control characters that permit underlining are printed out explicitly. These characters appear as ^H in the file but are really only a single character (try moving the cursor over them). If they bother you, a "cleaner" file can be obtained by entering, e.g.,

    man pwd | sed -e 's/^H//g' > man.pwd

where the ^H character is obtained by hitting the control key and then backspace. We will give more examples of the sed editor later.

You can then use your regular text editor (more about this later) to look at man.pwd or simply enter "cat man.pwd" which will list the entire file in your window at once, where you can then use the scroll bars to look at it. Of course, you risk cluttering up your directory with lots of files named "man.whatever" if you do this a lot. If you don't want to worry about deleting them, one solution is to use the name "junk" as a miscellaneous place to store output. If you already have a file named junk, the new output will just overwrite the old file. This way, you only ever have one "scratch" file in your directory at one time.

There usually are options to UNIX commands that can make them more useful for what you want. For example, "ls" simply lists the file names in your directory. If you want to find out how big they are or when they were last modified, then use "ls -l" where the "l" stands for the long output option. In your home directory, you will have a very important file called ".cshrc" (see below). This file will not appear if you just enter "ls".
To make it appear, enter "ls -a" where the "a" stands for the "all file" option.

To change your directory, use:

    cd dirname

This will move you to the directory dirname, which must be located in your current directory. OR you can give the full path name for dirname, i.e.

    cd /net/rock/shearer

will get me to my home directory no matter where I am on the system. Alternatively, one can go to one's home directory by entering

    cd

You can also go directly to subdirectories by entering:

    cd ~/dir1/dir2

This will get you to directory dir2, located in dir1 in your home directory. Naturally this does the same thing as

    cd
    cd dir1
    cd dir2

You can go back up one level by entering

    cd ..

You can go back up and then down again into a different directory by entering:

    cd ../dir2

To create a new directory, use the "make directory" command:

    mkdir dir1

It is often convenient to use a different naming convention for your directories in order to distinguish them from your files. Some people put .dir after their directories. For a while I put d. in front of the directory names, which has the advantage of grouping them together when "ls" is entered, since UNIX lists things in alphabetical order. Most recently, I have been using all capital letters for the directory names. This makes them more visible, but has the disadvantage of slowing down typing their names (yes, UNIX is case sensitive!). If you don't want to use special names for directories or if you find yourself in somebody else's directory where they don't do this, you can use the "ls -F" command:

    ls -F

This will add a "/" suffix to directory names and a "*" suffix to executable file names. I like this so much that I have set this up as my default "ls" command by putting an alias into my .cshrc file (more about this later).

FILES AND EDITING

Files can be simple ascii (text) files or they can be binary files, often the executable versions of computer programs.
One standard UNIX editor is called vi and is still used by some of the old time programmers at IGPP who will insist that it remains the best editor. If you know all of its tricks it can be an extremely powerful editor. You can do things, like cut and paste columns of numbers, that most editors can't do. It also has the advantage of running on any terminal so if you log in from home you can still edit files. If you know vi or decide that you want to learn it--more power to you. You will not lose any points around here. However, vi is not mouse or window friendly and is not favored by most students today.

I have forgotten most of the vi that I once knew (I used it to do my thesis back in the late Cretaceous). Now, on the Suns I use the Sun editor called textedit. This is run by entering:

    textedit filename

It is pretty self-explanatory. The cut and paste buttons on the keyboard work with this. It brings the file up in a separate window and you can use the mouse and the scroll arrows. (In class, you may see me simply type 'edit' to invoke this editor. I can do this because I have created an alias in my .cshrc file so that typing 'edit' is the same as typing 'textedit' -- we will discuss how to create aliases a bit later when we describe .cshrc files.)

Editors on the Mac include:

    nedit -- This is supposed to run on all platforms and has more features than textedit.
    emacs -- This is a very powerful editor that is used by computing professionals. It can do just about anything but may be somewhat harder to learn at first than simpler editors.
    Xcode -- A special editor designed for editing programming code.

You can use any of these editors to create a new file or edit an existing file.

Unix file names are case sensitive.
By convention, the type of file is often indicated by ".type", for example:

    testprog.f      for Fortran77 program source code
    testprog.f90    for Fortran90 source code
    testprog.c      for C program source code
    testprog.m      for a MATLAB script
    testprog.o      for an object file
    figure1.ps      for a Postscript file
    figure1.gif     for a GIF file

You may wish to create your own naming convention to keep track of your files. When used with wildcards (see below) this will make it easy to list all files of a particular type.

WARNING: Do not use the dash character ("-") in file names; this may cause all kinds of problems for you. Use . or _ to separate the words. NEVER USE BLANKS IN FILE OR DIRECTORY NAMES!!! (I know this is common on Macs and PCs but you will eventually have big problems reading these files with your programs.)

If you want to remove a file use the "rm" command:

    rm filename

If you want to change the name of a file, use the "mv" (move) command:

    mv filename1 filename2

If filename2 already exists, this will have the possibly undesired consequence of deleting the original filename2. To guard against this, use the "-i" option:

    mv -i filename1 filename2

Now if filename2 already exists, the computer will ask you first if you want to overwrite this file.

mv can also be used to move files between directories:

    mv filename dirname

will move filename into directory dirname (assuming dirname already exists!) where it will have the same name. Note that this does the same thing as

    mv filename dirname/filename

For convenience, you can leave off the /filename if the file is to keep the same name. A note of caution: In the short version, if dirname does not exist as a directory, then the name of filename will be changed to dirname.

The copy command works in a similar way:

    cp filename1 filename2

makes a copy of filename1 called filename2.

    cp -i filename1 filename2

will first ask if you really want to do this if filename2 already exists.
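The two forms of mv described above are easy to try out in a scratch directory. The following is only a sketch; the names scratch.dir, notes.txt, ARCHIVE and NOSUCHDIR are made up for the illustration.

```shell
mkdir scratch.dir
cd scratch.dir
echo "some text" > notes.txt
mkdir ARCHIVE

mv notes.txt ARCHIVE             # ARCHIVE is a directory: the file moves into it
ls ARCHIVE                       # shows notes.txt

cp ARCHIVE/notes.txt copy.txt    # copy it back out under a new name
mv copy.txt NOSUCHDIR            # no directory of that name: copy.txt is renamed
ls -F                            # shows ARCHIVE/ and NOSUCHDIR
cd ..
```

The last mv is the case the note of caution warns about: nothing complains, you just end up with a file called NOSUCHDIR.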
You can copy files to different directories in the same way as the mv command works.

Many people so prefer the "-i" option for mv and cp that they make it the default option by defining an alias in their ".cshrc" file (see below) so that "mv" and "cp" become "mv -i" and "cp -i". I recommend that you do this--it is likely to save you some grief in the future.

To remove a directory, use the "rmdir" command:

    rmdir dirname

The directory must first be empty for this to work. To recursively remove a directory and its contents use:

    rm -r dirname

Use this with extreme caution to avoid accidentally deleting more than you intended!

WILDCARDS

UNIX commands become much more powerful when they are used with the wildcard character "*" which matches any string of characters. For example, suppose you wanted to list all files ending with ".f", the suffix used to identify Fortran source code. Simply enter:

    ls *.f

You could move all of these programs into a subdirectory called source.dir by entering:

    mv *.f source.dir

You might have a bunch of plot files called mypost1, mypost2, etc. You could delete all of these at once by entering:

    rm mypost*

For obvious reasons, be very careful when using "*" with the rm command. For example, suppose you wanted to delete all files in your current directory that end in "%" which the textedit editor uses to store the original version of edited files. To do this, enter:

    rm *%

Suppose, however, that you are careless when you type this and enter instead:

    rm * %

This will delete everything in your current directory! So always look carefully at what you have typed before hitting the return key when you are deleting files using wildcards.

THE .login and .cshrc FILES

In your home directory, you can have files called .login and .cshrc that are executed whenever you log in. These files are used to define and customize your environment.
You may have default .login and .cshrc files set up by the NetOps people when you first log on, but you can modify them to do what you want. For example, some people like to see a Fortune Cookie message upon login. To get this, put

    /usr/games/fortune

in your .login file. (***This does not work with Macs right now as the /usr/games directory does not exist***)

A more useful thing to do is to put aliases in your .cshrc file for the mv and cp commands so that you don't accidentally overwrite files:

    alias cp 'cp -i'
    alias mv 'mv -i'

I like to put aliases in for the printers that I commonly use:

    alias pf 'lpr -P silo'
    alias pcol 'lpr -P klee'

In this way I can enter:

    pf bwfile

to send bwfile to printer "silo" (black and white printer) and

    pcol colfile

to send colfile to printer "klee".

One of the most important things in the .cshrc file is the "set path" command. This lists all the directories that the computer should look in when you try to run a program. For example, if you type "matlab" the computer needs to know where to look to find the matlab program. If you get a "Command not found" message, then you don't have the right directories listed in your .cshrc file under "set path".

Sometimes the NetOps people start you with a .cshrc file that does not list your current directory under the "set path" command. In this case, you will not even be able to run a program that is sitting right in your current directory. How humiliating! Don't feel inadequate, just make sure you have a '.' listed in your "set path" command. For example, here are my set path commands:

    set path=($path .)
    set path=($path /usr/X11R6/bin)
    set path=($path /sw/bin)
    set path=($path /sw/sbin)
    set path=($path /sw/igpp/bin)
    set path=($path /Applications)
    set path=($path /Applications/Utilities)
    set path=($path /Applications/MATLAB6p5p1/bin)
    set path=($path /Developer/Applications)

Notice that there are scads of places to check for executables. The most important one, however, is the .
at the beginning, which stands for the current directory. On the Suns, there were lots of things in my .cshrc file that I didn't really understand but I figured they must be doing something. You can always look at other people's .cshrc files if you have trouble with yours.

If you make any changes to your .cshrc file (or your .login file), they won't take effect until your next login. Alternatively you can enter:

    source .cshrc

to make the changes immediately. But you will have to do this separately for each window that you have open.

SCRIPTS

One of the most powerful ways to use UNIX is to write scripts to run your programs. This is an easy way to keep track of the input and output parameters and to make changes without having to enter everything in again. For example, suppose you have a program called mapquake that asks you a bunch of questions like this:

    rock% mapquake
    Enter input file name
    quakelist1
    Enter min,max quake magnitude
    4.0 9.9
    Enter maximum standard error (km):
    10
    Enter min,max latitude
    30 50
    Enter min,max longitude
    10 40
    Enter symbol scaling
    0.7
    Enter grayshade for continents
    0.5
    Enter output file name
    mypost1
    rock%

After running this program, you decide that you would like to change the grayshade value to 0.4. This is tedious if you have to type in everything again. Instead you can write a script called "do.mapquake" that looks like this:

mapquake << MAPQUAKE
quakelist1
4.0 9.9      !min,max quake magnitude
10           !maximum standard error (km)
30 50        !min,max latitude
10 40        !min,max longitude
0.7          !symbol scaling
0.5          !grayshade for continents
mypost1
MAPQUAKE

This is just an ascii file that you write with your favorite text editor. Often people will use "!" instead of "MAPQUAKE" in scripts like this; they seem to work in the same way. (Can anyone tell me if one convention has any advantages over the other?)
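The "<<" construction in do.mapquake can be tried with any program that reads from the keyboard. Here is a minimal sketch that uses the standard sort command in place of mapquake (which you will not have); the marker END_INPUT and the file name sorted.out are arbitrary choices for the illustration.

```shell
# Everything between the two END_INPUT lines is fed to sort
# exactly as if it had been typed at the keyboard.
sort -n << END_INPUT > sorted.out
30
4
10
END_INPUT

cat sorted.out     # prints 4, 10, 30 on separate lines
```

The input stops at the first line that consists of the marker and nothing else, so the closing marker must start in the first column. One possible reason to prefer a distinctive word over "!" is that it is less likely to collide with a real line of input.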
The comments following the numerical input (e.g., !min,max quake magnitude) are convenient if the program that you are running is robust enough to ignore additional characters on the line that follow the numbers that are actually input into the program.

To run the script, simply enter:

    do.mapquake

You may get a message which says:

    do.mapquake: Permission denied

This means that you don't have execute permission on this file. To see what the permissions are, use the "ls -l" command:

    rock% ls -l do.mapquake
    -rw-r--r--  1 shearer        39 Jul 31 10:26 do.mapquake

The "-rw-r--r--" shows what the permissions are for the file. Columns 2-4 ("rw-") are your permissions as owner of the file. Columns 5-7 and 8-10 give the permissions for your group and all others, respectively. The "r" means read permission, the "w" means write permission and "x" means execute permission. In this case our problem is that we have "rw-" instead of "rwx". To fix this, use the "chmod" command:

    chmod 0700 do.mapquake

This will give you "rwx" permission on the file do.mapquake (and no permissions at all to your group or to others). For more details see the chmod manual.

The do.mapquake script will run the mapquake program and enter all of the required inputs. Notice that in this case I have added helpful comments to the numerical input lines. You can do this with most programs and it will not affect the input (at least for FORTRAN, I'm not so sure about C). You usually cannot, however, add comments to the character inputs (e.g., "quakelist1" in this example) because the program will consider them part of the name. Now it's easy to keep track of the inputs and to make changes without having to re-enter everything.
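The number given to chmod is a set of octal digits, one each for owner, group, and others, where each digit is the sum of r=4, w=2 and x=1. The sketch below tries a few values on a throwaway script; the file name do.test is made up for the illustration.

```shell
echo 'echo hello from do.test' > do.test

chmod 644 do.test      # rw-r--r-- : nobody can execute it
ls -l do.test

chmod 700 do.test      # rwx------ : 7 = 4+2+1 for the owner, nothing for anyone else
ls -l do.test

chmod 755 do.test      # rwxr-xr-x : others may read and run it, but not change it
./do.test              # the ./ runs it even if "." is not in your path
```

Writing 700 is the same as the 0700 used above. The 644 setting is also one way to let other students read a file without giving them your password.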
You can also put scripts together to run the program many times:

mapquake << MAPQUAKE
quakelist1
4.0 9.9      !min,max quake magnitude
10           !maximum standard error (km)
30 50        !min,max latitude
10 40        !min,max longitude
0.7          !symbol scaling
0.5          !grayshade for continents
mypost1
MAPQUAKE
mapquake << MAPQUAKE
quakelist2
4.0 9.9      !min,max quake magnitude
10           !maximum standard error (km)
30 50        !min,max latitude
10 40        !min,max longitude
0.7          !symbol scaling
0.5          !grayshade for continents
mypost2
MAPQUAKE
mapquake << MAPQUAKE
quakelist3
4.0 9.9      !min,max quake magnitude
10           !maximum standard error (km)
30 50        !min,max latitude
10 40        !min,max longitude
0.7          !symbol scaling
0.5          !grayshade for continents
mypost3
MAPQUAKE

In this case, you can make three different plots from three different input files.

USING ANONYMOUS FTP / TRANSFERRING FILES BETWEEN MACHINES

You often will want to get files from another computer. One way of doing this is the ftp command. For many public sites and data centers this is done with anonymous ftp. You simply enter:

    ftp othercomputer

where "othercomputer" is the name (or IP address) of the other computer. When you are asked to login, just enter "anonymous" and then your e-mail address as a password. Of course, if you have login permission on the other computer then you can ftp to the machine even if it is not set up for anonymous ftp.

Once within ftp, you will get an ftp> prompt. You then can use the "cd" command to get to the directory that you want and "ls" to see the file names on the remote computer. Finally use "get" to bring the desired file to your own computer and then "quit" to exit ftp. If you want to get all of the files in the directory, use the command "mget *" and you will be prompted for each file name. If you don't want to be prompted, you can turn off the interactive mode by entering

    ftp -i othercomputer

when you first invoke ftp.
If you are getting binary files (rather than simple text files), you should enter

    type binary

at the FTP prompt before getting the files. Because "type binary" will work with all types of files, including ascii, it is a good practice to always enter "type binary" as soon as you enter ftp. The default for the Suns is "type ascii" which does not work with binary files.

If the remote computer objects to regular ftp for security reasons, you might try "sftp", which is the secure ftp command.

If you have just a few files to transfer, you may want to use the secure copy command:

    scp filename *.ps shearer@rock.ucsd.edu:./TRANSFER

will copy filename and all files ending with .ps from the current directory to the TRANSFER directory in shearer's account on rock.ucsd.edu (after first prompting for the password). If you want to copy a directory, you need to use the -r (recursive) option, i.e.,

    scp -r dirname shearer@rock.ucsd.edu:./TRANSFER

Note that one can also copy files from the remote computer to the current directory, i.e.

    scp -r shearer@rock.ucsd.edu:/net/moray2/scratch/dirname dirname_local

In this case the remote directory was on the /net/moray2 disk; notice how we specified the complete path name.

FILE COMPRESSION

Often the files that you retrieve will be compressed. Files are compressed using the UNIX "compress" command:

    compress filename

This will change the name to filename.Z which tells you that it is compressed. This is useful to save disk space when you will not be using the file for a while. To get back to the original file, use uncompress:

    uncompress filename

You can compress a whole bunch of files by using wildcards:

    compress *.ps

will compress all of your Postscript files, assuming you use the .ps suffix convention for these files. Because the compressed files now end in .ps.Z, they can be uncompressed with "uncompress *.ps.Z" as you would expect.
An alternative compression method (not standard UNIX but usually available) is invoked with the gzip command:

    gzip filename

This changes the name to filename.gz with the reverse operation:

    gunzip filename

The gunzip command will also decompress .Z files (but the uncompress command will not decompress .gz files).

You may find it useful to use compression yourself by compressing files that you do not use very often in order to save space. I often do this when I want to get some disk space but am too nervous to delete the files and too lazy to write them to a backup tape.

USING THE TAR COMMAND

Often you may want to save or retrieve an entire directory of files. This is most easily done using the "tar" command. If you are within the directory containing the files that you wish to save, then enter:

    tar -cvf ../archive.tar .

The arguments are as follows:

    -c                create tar archive
    -v                verbose: print file names to screen
    -f                output file name will follow
    ../archive.tar    name for tar file (../ to put in next level up)
    .                 tar every file in current directory

Alternatively you can save the entire directory and its contents from the level above the directory:

    tar -cvf programs.tar programs.dir

The tar file can then be FTPed to another machine. For even more efficiency you may wish to compress the tar file first.

The files can be retrieved as follows:

    tar -xvf archive.tar

This will put all of the files in the archive into the current directory.

Finally, tar is also commonly used to read/write data to tape, in which case the file name (e.g., ../archive.tar, programs.tar, etc.) in the above examples is replaced with the name for the tape device. There are both Exabyte and DAT tape drives in the barnyard; they have names like "/dev/nrst12" which should be written on top of the units. NOTE: You must be logged on to the machine that the tape drive is connected to (see ssh command below).
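Putting tar and gzip together, a complete round trip might look like the following. This is only a sketch with made-up names (programs.dir, hello.f, RESTORE). The -t option, which is not described above, lists the contents of an archive without extracting anything.

```shell
# Make a small directory to archive.
mkdir programs.dir
echo "      print *, 'hello'" > programs.dir/hello.f

tar -cvf programs.tar programs.dir    # create the archive from one level up
tar -tvf programs.tar                 # -t: list what is inside, extract nothing
gzip programs.tar                     # leaves programs.tar.gz

# Unpack it somewhere else.
mkdir RESTORE
cd RESTORE
gunzip -c ../programs.tar.gz | tar -xvf -    # "-" tells tar to read from the pipe
ls programs.dir                              # shows hello.f
cd ..
```

Using "gunzip -c" writes the uncompressed archive to the pipe and leaves programs.tar.gz untouched, which is handy if you want to keep the compressed copy around.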
RUNNING ON A DIFFERENT MACHINE AND KEEPING TRACK OF JOBS

Often you will want to run on a different machine than the one that you are sitting at. The other machine may be faster, have more memory, be connected to a tape drive, etc. To do this enter:

    rlogin machinename      (discouraged by NetOps)

or

    ssh machinename         (encouraged by NetOps)

and you can login and run remotely on this machine. Of course this will slow the machine down for anyone else using it so use some courtesy in doing this. One way to do this is to start your job with the "nice" command:

    nice do.bigjob

where do.bigjob is the script that runs your program. The "nice" command lowers the priority of your job so that it will not interfere with others using the machine. You still may be unpopular, however, if you use a lot of memory on the machine. The niceness levels that you can set range from 1 (highest priority) to 19 (lowest priority). The default for the nice command is niceness 4 or 10 (depending upon which UNIX shell you are running). To set the niceness to a specific value, you can specify a number, e.g.,

    nice +15 do.bigjob

will run do.bigjob at niceness 15.

As a word of caution, most people don't like to have other people run jobs on their "personal" machines (the ones in their offices) without permission, even if the jobs are set to run at large niceness values.

To find out what jobs are running on your machine, you can use the "top" command to list the most active jobs:

    top

This will take over your window and update the results continuously until you enter q for quit. The Process ID (PID) is listed, together with the username, the niceness, the fraction of the CPU being used and other useful information. top is interactive and you can input various commands (? for help, u to see only one user, etc.). You can also kill jobs from within the top program.
If you did not originally use nice when you started a job you can "renice" the job from within top by entering "r". Alternatively, if you know the job number you can change the niceness at the command level, e.g.,

    rock% renice +15 1132

where 1132 is the PID number (get from top program). You can renice more than once, but only to raise the niceness level--you can never lower the niceness level once it is set.

MISCELLANEOUS

Unix keeps track of your previous commands. To see them, enter "history" ("h" works too if your .cshrc defines it as an alias for history) and it will list your last 30 commands. To repeat a past command, enter ! followed by the line number in the history list or the first few letters of the command. To repeat the very last command, enter "!!".

Want to see what the beginning or end of a file looks like? Use the "head" or "tail" command:

    head filename     ---lists the first 10 lines of the file
    tail filename     ---lists the last 10 lines of the file

To see a file one page at a time, use the "more" command:

    more filename

and then hit the space bar to advance one page at a time.

Lose track of where a file is? You can use the "find" command:

    find . -name filename -print

This will look in the current directory (that's what the "." is for) and below for the file named "filename" and then print where it is. What if you only know some of the characters of the file name? You can use a wildcard:

    find . -name 'map*' -print

This will find all files that begin with "map" and print them on your screen. Note that you MUST enclose 'map*' in apostrophes for this to work.

Unix has many powerful utility programs. One of these is the "sort" command to sort files in alphabetical or numerical order. Example:

    sort +4 -n -b -r file1 -o file2

This sorts file1 and outputs the results to file2.
The following options are used in this example:

    +4    skip first 4 fields (leave out to use beginning of line)
    -n    numerical order (default is alphabetical order)
    -b    ignore leading blanks
    -r    reverse order (leave out for standard order)

To find out how much disk space is available use df (disk free):

    df

This will not list all the disks on the system unless they have been mounted. Just go to the disk that you are interested in and retry "df" if it does not appear the first time.

To see how much space you are using, the best command is

    du -ks *

This will list the disk usage of each of your subdirectories. It is a good idea to go through your directories once a month or so to delete unnecessary files and/or compress large files that you don't use very often.

To check for misspelled words in a file, use:

    spell filename

This will list all words not found in a dictionary (which one?). Use the "-b" option for the British spelling if you are submitting a paper to Nature. (***NOTE: spell does not seem to work on the Macs!)

To reformat a text file to a uniform maximum line length of 72 characters, use:

    fmt file1 > file2

This assumes that blank lines separate paragraphs. Line feeds within a paragraph are removed and added as necessary to make the lines of approximately equal length. This command is a useful feature if you use regular unix mail and create your outgoing messages with a text editor. It is helpful when you want to change a line in the middle of a paragraph because you don't have to redo all the following lines in order to make things look nice.

To preview a postscript file on the screen use:

    pageview filename.ps

or

    ghostview filename.ps

I usually use pageview first as it seems faster, but ghostview is more reliable for all types of Postscript files. (***Note: use ghostview or gv on the Macs, not pageview!!)

Finally, this should go without saying, but.... Do not download pornography, hate literature, etc., on UCSD computers.
You will get into BIG trouble (just ask a certain famous ex-Yale professor). You should not consider e-mail to be completely private. Even deleted e-mail is usually still on the computer system somewhere, often on daily and weekly backup tapes. Be very careful when you reply to messages sent to a group of people that you do not send your message to everyone on the list (unless that is your intention).

ADVANCED UNIX

This writeup contains pretty much all I've ever done with UNIX. You can go pretty far with only 20 commands or so. However there are much more powerful things you can do. If you really want to be a UNIX guru, try reading up on pipes, grep, awk, sed, etc. Then you will be able to write custom scripts to do all kinds of neat things. Here are some examples:

    grep dziewonski filename

This will print every line from file "filename" that contains the string "dziewonski". (The match is case sensitive, so a line containing only "Dziewonski" will not be printed.)

    grep -v dziewonski filename

This will print every line from file "filename" that does NOT contain the string "dziewonski".

    grep dziewonski filename > dz.lines

This works as above except the lines containing dziewonski are written to the file dz.lines rather than printed to the screen.

    grep dziewonski *

This lists lines containing dziewonski from ALL files within your current directory. However, you may really want just the file names of the files that contain the string dziewonski, in which case the following command will work better:

    grep -l dziewonski *

This lists the file names of the files that contain the string "dziewonski".

    grep ezxplot `find . -name Makefile -print`

This is one that recently saved me when I could not find a program that I knew I had written, but I did not know its name or which directory it was in. I knew, however, that the program was compiled with a Makefile that would contain "ezxplot" because this is necessary to make the program work. The grep command searches for ezxplot in a list of file names returned by the find command.
Note that the find command must be enclosed in backward apostrophes (backquotes), not regular apostrophes.

   ps -eaf

This lists all processes that are running on your machine.

   ps -eaf | grep shearer

This lists all of the processes that contain the string "shearer" in the output lines of ps. In this case, the "|" is a "pipe" that directs the output of the first command, ps, into the input of the second command, grep.

   ps -eaf | grep shearer > junk

As above, but writes the output into the file "junk" rather than to the screen.

   kill PID

This kills a job with process ID number PID (obtained from the top or ps command). This is useful for runaway jobs. For stubborn jobs, use the -9 option:

   kill -9 PID

   diff file1 file2

This lists all differences between file1 and file2. This is useful if you have made some changes to a file but cannot remember exactly what they are.

SOME SED AND AWK EXAMPLES

   cat file1 | sed 's/Peter/Paul/' > file2

Copy file1 to file2, substituting "Paul" for the first "Peter" on each line.

   cat file1 | sed 's/Peter/Paul/g' > file2

Copy file1 to file2, substituting "Paul" for every "Peter" on each line (note the "g" flag for global substitution).

   cat file1 | sed 's/^/Paul says /' > file2

Insert the prefix "Paul says " at the beginning of each line. Note that "^" means start of line.

   cat file1 | awk '{print $5,$3,$1}' > file2

Assuming file1 has 5 columns of figures, this copies columns 5, 3, and 1 to file2, in that order, omitting the 2nd and 4th columns.

   cat file1 | awk '{print $2, $1*(-10)}' > file2

Switch columns 1 and 2 and multiply the original 1st column numbers by -10.

   cat file1 | awk '{print substr($0,1,10) substr($0,31,10) substr($0,21,10) substr($0,11,10) substr($0,41,length($0)-40)}' > file2

This copies file1 to file2, swapping the contents of character columns 11-20 and character columns 31-40. Here $0 is the whole line, substr(s,i,n) takes the n characters of s that start at position i, and the last substr prints the remainder of the line.
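The column swap is easier to trust after trying it on a line whose fields are easy to tell apart. Here is a minimal sketch; the file names swap_in.txt and swap_out.txt are made up for this illustration, and the awk program is the same one shown above.

```shell
# Build a one-line test file made of five 10-character fields.
# (swap_in.txt and swap_out.txt are hypothetical scratch file names.)
printf '%s\n' 'AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEE' > swap_in.txt

# Swap character columns 11-20 (the B's) with columns 31-40 (the D's).
cat swap_in.txt | awk '{print substr($0,1,10) substr($0,31,10) substr($0,21,10) substr($0,11,10) substr($0,41,length($0)-40)}' > swap_out.txt

# Show the result: the B and D fields should have traded places.
cat swap_out.txt
```

The output line should read AAAAAAAAAADDDDDDDDDDCCCCCCCCCCBBBBBBBBBBEEEEEEEEEE. The "#" comments are fine in a script file but may not be accepted if you type the lines at an interactive csh prompt.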
   cat file1 | awk '{print "Columns 11 to 20 are " substr($0,11,10)}' > file2

Begin each output line with "Columns 11 to 20 are " and then list the contents of those columns in the input file. Note that the third argument of substr is a length, not an end column, so columns 11 to 20 are substr($0,11,10).

Variations on the last few examples can be used to reformat ascii data files, including those that do not have spaces between fields.

AN EXAMPLE OF A UNIX SCRIPT TO PROCESS DATA

By combining UNIX commands into a script, it is possible to create very powerful tools for processing data. Consider a simple example where we have a number of data files contained in a data directory:

   rock% cd data.dir
   rock% ls
   data1   data2   data3

We have written a program to process the data in these files and write new data files which we might want to call data1.proc, etc. The program is called procdata and prompts the user for an input file name and an output file name, and, in this simple example, a multiplier factor to scale the data. If the program is one level up from the data directory, then:

   rock% ../procdata
   Enter input file name
   data1
   Enter output file name
   data1.proc
   Enter multiplier factor
   3
   rock%

To process all of the files in the directory, we could run the program for each file, manually entering the file names. However, clearly this would get very tedious if we had lots of files to process. We could modify the program to accept a list of file names, but perhaps it is a complicated program that someone else wrote that we don't want to mess with. Another approach is to write a UNIX script to run the program for all of the files in the directory. Here is one way to do this, using the command file "do.proc" which looks like this:

------------------------------------------------------------------
#! /bin/csh
\rm procdata.log
\rm data.dir/*.proc
ls data.dir > filelist
cd data.dir
# Note the "backwards" apostrophes in next line, regular ones won't work!
foreach filename (`cat ../filelist`)
   echo "processing file:" $filename
   ../procdata >>! ../procdata.log << !
$filename
$filename.proc
2
!
end
cd ..
\rm filelist
------------------------------------------------------------------

This is designed to be located in the same directory as the program, one level up from the data directory. The screen output from the program (all of the "Enter input file name", etc., lines) is directed to a file called procdata.log, so the first thing we do is remove any old version of this file, if it exists. Note that within the script, we use "\rm" instead of "rm" so that any aliases that might require interactive verification of the deletions are bypassed. Otherwise the computer might prompt us to see if we really want to delete the files and the script would not be prepared to handle this.

Next, we remove any existing processed files in the data.dir directory, using a wildcard and assuming that the file names end in ".proc".

Next, we write a list of the data file names within data.dir into the file "filelist".

Then we go into data.dir where we loop over the file names contained in filelist, using the "foreach" command. This loop is terminated by the "end" command later in the script. The backward apostrophes in `cat ../filelist` (be sure to use backward apostrophes!) indicate that a UNIX command is to be executed and replaced by its output. The output is split into words at blanks and line breaks, and foreach assigns one word at a time to the filename variable, so this works as long as the file names contain no spaces.

We use the "echo" command to output to the screen each file that is being processed.

We then run the procdata program (one level up so we need the ../) and direct the normal screen output to the logfile. We use >>! rather than >> in case "set noclobber" is contained in our .cshrc file. (Setting noclobber prevents overwriting an existing file with > or writing to a nonexistent file with >>, the latter case being our situation. Note that >! also overrides the noclobber setting.)

Within the "foreach" loop we refer to the contents of the filename variable as $filename. The "<< !" at the end of the procdata line tells the shell to feed the lines that follow to the program as if they had been typed at its prompts. This input is terminated within the script by the line containing only the ! symbol.
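The "<< !" construct is easier to see in isolation. The following is only a sketch under one loud assumption: sed stands in for procdata, since any program that reads its answers from standard input will do, and heredoc_demo.log is a made-up file name.

```shell
# sed is a stand-in for procdata (an assumption made for illustration):
# it labels each line it reads so we can see what the program receives.
# The three lines between "<< !" and the closing "!" are its input.
sed 's/^/procdata would read: /' > heredoc_demo.log << !
data1
data1.proc
2
!

# Show what the stand-in program was given.
cat heredoc_demo.log
```

The log file should contain three lines, one for each answer that would otherwise be typed at the prompts. Inside the foreach loop the shell replaces $filename with the current file name before the program ever sees these lines, which is why the same three input lines work for every file. The closing "!" must be alone at the very start of its line.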
Following the "end" statement that completes the loop over the files, we go back to the directory containing the program and delete filelist, as it is no longer needed.

The power of this script is that it can be run on a directory containing thousands of files, just as easily as for a smaller number of files.

In this example, we really did not need to generate the file filelist because we eventually deleted it. Thus, we could have written the script as:

------------------------------------------------------------------
#! /bin/csh
\rm procdata.log
cd data.dir
\rm *.proc
# Note the "backwards" apostrophes in next line, regular ones won't work!
foreach filename (`ls`)
   echo "processing file:" $filename
   ../procdata >>! ../procdata.log << !
$filename
$filename.proc
2
!
end
cd ..
------------------------------------------------------------------

Alternatively, the `ls` could be written as *, i.e.,

   foreach filename (*)

will also work. In this case the wildcard * expands to the names of all of the files within the current directory.

We won't have time in this class to go into the details of all of the different things one can do in scripts like this. There are lots of books on UNIX that one can consult for this purpose (but who has time to read them?), and most of us just pick up stuff as we need it. The main point that I want to get across is that if you are spending lots of time running programs manually, then you are wasting your time. Spend some of that time learning how to write a UNIX script and you will be far better off in the long run. Your work will be better documented and it will be much easier for you (and others) to reproduce your work.

COMMON UNIX COMMAND SUMMARY

cat filename          print filename on your screen
cd dirname            change directory to dirname
cd ..                 go back up one level
cd                    go to home directory
cd ~/dirname          go to directory dirname in home directory
cd ~otheruser         go to home directory of otheruser
cp file1 file2        copy file1 to file2
cp -i f1 f2           copy f1 to f2 but ask before overwriting f2
df                    list disk space on the different disks
du -ks *              list disk usage (in kilobytes) for files/dirs in current directory
lpr -P silo filename  send filename to printer silo
ls                    list files in directory
ls -l                 list files in long format (permissions, owner, size, date)
ls -a                 list all files including those starting with .
ls -F                 flag directories by adding slash to their name
ls *.f                list all files ending with ".f"
mkdir dirname         make directory dirname
mv file1 file2        change name of file1 to file2
mv -i f1 f2           change name of f1 to f2 but ask before overwriting existing f2
mv *.f src            move all files ending in ".f" into existing directory src
pwd                   print working directory
rm filename           remove filename
rmdir dirname         remove directory dirname (directory must be empty)
wc filename           count lines, words, and characters in file
wc -l filename        count lines in file
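A safe way to get a feel for several of the commands in this summary is to try them in a scratch directory that you then throw away. The following is a minimal sketch, not part of the course software; the directory name unix_practice and the file names are made up for illustration. It is meant to be saved in a file and run as a script, since the "#" comments may not be accepted at an interactive csh prompt.

```shell
mkdir unix_practice          # hypothetical scratch directory name
cd unix_practice
pwd                          # confirm where we are

# Make a small two-line file to play with.
printf 'line one\nline two\n' > notes.f

cp notes.f copy.f            # copy notes.f to copy.f
mkdir src
mv *.f src                   # move both .f files into the existing directory src
ls -F                        # src is flagged with a trailing slash
wc -l src/notes.f            # reports 2 lines

# Clean up: rmdir only works once the directory is empty.
rm src/copy.f src/notes.f
rmdir src
cd ..
rmdir unix_practice
```

If any step fails (for example, if rmdir is run before the files are removed), the scratch directory is left behind, which is a quick way to see that something went wrong.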