Introduction to UNIX (and Linux)

UNIX is a generic term for a whole series of similar but distinct computer operating systems (OSes) that originated from Ken Thompson's and Dennis Ritchie's early work on OSes at Bell Labs in the late 1960s. Some of the features of UNIX that have made it popular are: multitasking, the ability to do many different things at once; multiuser, meaning more than one person can use the computer at the same time, or at different times; portability, since UNIX and Linux run on almost every computer architecture ever invented; and, for every UNIX flavor, a large built-in suite of powerful, free programs. There have been many UNIXes designed over the years for a whole host of (expensive) computer hardware architectures that you have probably never heard of. Relatively recently, however, operating systems based more or less on UNIX have begun to permeate ordinary people's lives in the form of Linux and Mac OS X. (Since, for the purposes of this class, UNIX and Linux are indistinguishable, I will use the two names interchangeably.) The machines in ISB 107 all run OS X, which is actually UNIX at its core; this will become apparent when we start using the Terminal program, which is located in Applications>Utilities.

Linux was started in 1991 as a hobby project by a guy named Linus Torvalds, as a way to run a free UNIX-like OS on the computers regular people used, like the PCs your parents would use. It took off like a rocket and today is wildly popular. But Linux itself isn't a complete OS; it is the kernel, the core that manages the hardware. Linux succeeded because of Richard Stallman's GNU project (GNU stands for "GNU's Not Unix"). The GNU project is a *free* UNIX-like OS and software suite maintained by a large number of programmers all over the world, and it supplies most of the programs that run on top of the Linux kernel. We will use Mac OS X for this course, but the UNIX skills you learn here will apply to any other UNIX-like environment, including all Linux distributions that use GNU. This is important for bioinformatics, since much of bioinformatics happens on Linux machines.

What you can do well with UNIX/Linux:

What you can't do well with UNIX:

So the take-home message here is that, with a little practice, you can do a lot with UNIX, fast. It helps to know a few commands, regular expressions, and a programming language like Perl or Python. We'll get to that.


The organization of the UNIX file system is very simple. It's basically a whole bunch of directories (usually called folders in the Windows and Mac world) organized into a tree structure. Here is a toy UNIX file system that I will use as an example:

You'll notice right away that at the top of the tree is a directory labeled / (called the root directory). It's the very top of the UNIX file structure, so it contains all of the directories, and everything else, in the file system. The next level down from the root directory contains many directories that hold important system programs and (at least in this example) a directory that contains the home directories of all the users of the computer. For us, the important level is the next one down: the user directories themselves (sudhir and john in the example). I only mention the non-user directories for completeness. You don't need to go into any of the directories "above" your user directory, and it would be wise to stay out of them. It's not likely, but it's possible that you could accidentally mess things up. Bad. For everyone.

When you log in, you will be in your home directory, the equivalent of the john directory in the example. (john is my login name, so my home directory is called john.) This is where the action will happen. In this directory I can make new directories to organize my stuff, move around, make new programs, run programs, edit files, and so on. Your home directory is the first "working directory" you will encounter. The concept of the working directory is simple but critical: your current working directory is your location in the file structure at a given time. In other words, wherever you currently are is your current working directory.

Getting started

First, open a new terminal, if one isn't open already (ask if you don't know how to do this). You will see what's called a prompt waiting for you to issue a command, e.g.:

[john@ccbtl11 john]$

A "terminal" is the window through which you talk to the computer. The terminal runs a program called the "shell", which displays the "prompt" (or "shell prompt"), interprets what you type into instructions the computer understands, and then returns the results to you. (See the Software Carpentry lectures on the UNIX shell for more information.) You can open multiple terminals at once, which makes it easier to work in different directories, or on different files in the same directory, simultaneously. New terminals always open in your home directory.

Making directories and moving around

Issue the ls command to 'list' the contents of your home directory.

[john@ccbtl11 john]$ ls
[john@ccbtl11 john]$

Nothing is listed, because nothing is there. To make a new directory, use the command mkdir.

So, for example, to make the directory taing, you would issue the command

[john@ccbtl11 john]$ mkdir taing
[john@ccbtl11 john]$

Now do another ls, and the directory taing should be there:

[john@ccbtl11 john]$ ls
taing
[john@ccbtl11 john]$

Now I'll make the rest of the directories in the example.

[john@ccbtl11 john]$ mkdir crap rna mp3
[john@ccbtl11 john]$ ls
crap mp3 rna taing
[john@ccbtl11 john]$

You need to be able to move around into your newly-created directories. This is done with the 'change directory' command, cd. So, to change into the taing directory,

[john@ccbtl11 john]$ cd taing
[john@ccbtl11 taing]$

To see where you are in the directory tree, use the 'print working directory' command, pwd

[john@ccbtl11 taing]$ pwd
/home/john/taing
[john@ccbtl11 taing]$

Now say you want to go back up to the john directory. To do this, you need to know how relative directory positions are described in UNIX.

The current directory is always referred to as . (That's right, just a period.)

The directory above you is always called .. (That's two dots.)

So, in our example, to go back up to the john directory,

[john@ccbtl11 taing]$ cd ..
[john@ccbtl11 john]$ pwd
/home/john
[john@ccbtl11 john]$ ls
crap mp3 rna taing
[john@ccbtl11 john]$
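If you'd rather practice without cluttering your real home directory, here is a minimal sketch of the same steps as a small shell script. It assumes a POSIX shell (sh or bash) and works inside a throwaway directory made by mktemp -d; the variable name scratch is just an illustrative choice.

```shell
#!/bin/sh
# Practice the mkdir/cd/pwd steps in a throwaway directory.
set -e                     # stop at the first command that fails

scratch=$(mktemp -d)       # make an empty scratch directory; its name varies
cd "$scratch"

mkdir taing                # make one directory
mkdir crap rna mp3         # make several directories with one command
ls                         # lists crap, mp3, rna and taing

cd taing                   # move down into taing
pwd                        # prints the scratch path ending in /taing
cd ..                      # .. is the parent directory, so this moves back up
pwd                        # prints the scratch path again
```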

Now, let's make a couple more directories, one in which to store some perl programs and one to store our homework.

[john@ccbtl11 john]$ mkdir taing/perl taing/hw
[john@ccbtl11 john]$ ls taing
hw perl
[john@ccbtl11 john]$

Now you can see two directories listed in taing, hw and perl. Let's move around a bit more to get more comfortable with it. I've numbered the lines here for ease of discussion.

[01] [john@ccbtl11 john]$ cd taing/perl
[02] [john@ccbtl11 perl]$ pwd
[03] /home/john/taing/perl
[04] [john@ccbtl11 perl]$ cd ../hw
[05] [john@ccbtl11 hw]$ pwd
[06] /home/john/taing/hw
[07] [john@ccbtl11 hw]$ ls /
[08] bin home local nfs src usr
[09] [john@ccbtl11 hw]$ ls ../../../../
[10] bin home local nfs src usr
[11] [john@ccbtl11 hw]$ ls /home/john/
[12] crap mp3 rna taing
[13] [john@ccbtl11 hw]$ pwd
[14] /home/john/taing/hw
[15] [john@ccbtl11 hw]$ cd
[16] [john@ccbtl11 john]$ pwd
[17] /home/john
[18] [john@ccbtl11 john]$ cd taing/perl/
[19] [john@ccbtl11 perl]$ ls ~
[20] crap mp3 rna taing
[21] [john@ccbtl11 perl]$ cd ~
[22] [john@ccbtl11 john]$ pwd
[23] /home/john

In line [04], I moved directly from the perl directory into the hw directory. In lines [07] and [09], I listed the entire contents of the root directory, first by naming it (/) and then by climbing up four levels with ../../../../. In line [11], I listed the contents of my home directory by giving the ls command what's called the "complete path" (also called the absolute path) of my home directory. In line [15] I just typed cd. This always brings you back to your home directory. In line [19] I listed the contents of my home directory from the taing/perl directory by using the tilde "~". Tilde always means your home directory. Line [21] does the same thing as line [15].

From these examples, you can see that you are able to string together any number of directories for moving around or listing the contents of directories from any other location in the file structure. You just need to know where you are going relative to where you are, or know the complete path of where you are going. Play around with it.
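Here is a rough sketch of relative versus complete paths, again in a throwaway directory. It assumes a POSIX shell; home_demo is a hypothetical stand-in for /home/john, and mkdir -p (which also creates any missing parent directories) is used only to build the example tree in one line.

```shell
#!/bin/sh
# Relative vs. complete paths. home_demo is a hypothetical stand-in for
# /home/john, built in a throwaway directory.
set -e
home_demo=$(mktemp -d)
mkdir -p "$home_demo/taing/perl" "$home_demo/taing/hw"  # -p: create parents too

cd "$home_demo/taing/perl"   # complete path: works from anywhere
cd ../hw                     # relative path: up to taing, then down into hw
pwd                          # prints a path ending in /taing/hw
ls ../..                     # two levels up is the "home" directory: lists taing
ls "$home_demo"              # the same listing, using the complete path instead
```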

Making, editing, removing, and viewing the contents of files

OK, now you can make new directories to organize all your files. But you need to know how to make the files, right?

One of the cool, flexible, and portable things about UNIX is that all the files you will work on are just plain text files. Have you ever written a paper in Word for Windows and tried to open it in Word for Mac? Sometimes it works, sometimes it doesn't. Plain text always works in UNIX, and always will. You can make fancier things with other programs in UNIX, but most of the real work can easily be done in plain text. In this class you will write your homework in plain text (at least if you want us to grade it!), write Python programs in plain text, process Illumina data files that are in plain text, and so on. And you will do all of this in one program that edits plain text files, called TextWrangler. There are a lot of good text editors out there, but we will use TextWrangler because it's simple, attractive, works well, and is used in your textbook. (If you have another text editor you love to program in, you are free to use it. Then again, if you already love to program, why are you in this class? If you have a Windows machine that you'd like to program on, you'll need to use a different editor, such as Notepad++.)

Life is simple with plain text

To make a new, empty text file, open TextWrangler by double-clicking its icon in Applications, then create a new file by selecting File>New>Text Document. A new file called "untitled text" will appear in the list of files on the left. Now type anything: your dog's name, your favorite color, or your favorite muppet. Save it to your home directory as "newfile.txt". Then run ls from your home directory, and you should see the new file:

[john@ccbtl11 john]$ ls
crap mp3 newfile.txt rna taing
[john@ccbtl11 john]$

A quick note on file naming conventions

You will notice that in the above example, I named the file newfile.txt, not just newfile. There is a very good reason for this: the file I created is just a text file (hence the .txt extension). Proper use of file extensions is critical to successful organization in the UNIX environment and in bioinformatics circles. Extensions are not forced on you (you can name files almost anything you want; see the last subsection of this section), but if you skip them, I guarantee that you will quickly get confused. Your file extensions should mean something.

Here are some examples:

and so on ... enforcing this will save you many headaches when things start to get complicated.

Moving files

Let's download a file and look at it in TextWrangler. Get this file, chr04.fsa (the sequence of Saccharomyces cerevisiae chromosome 4), and be sure to note which directory your web browser saves it into. Make a new directory below your home directory called yeast. Now move the downloaded file into the yeast directory. This is done with the move command mv, e.g.:

[john@ccbtl11 john]$ ls
chr04.fsa crap/ mp3/ newfile.txt rna/ taing/
[john@ccbtl11 john]$ mkdir yeast
[john@ccbtl11 john]$ mv chr04.fsa yeast
[john@ccbtl11 john]$ ls
crap/ mp3/ newfile.txt rna/ taing/ yeast/
[john@ccbtl11 john]$ ls yeast/
chr04.fsa
[john@ccbtl11 john]$

There is a copy command, too: cp. Its syntax is the same as that of mv, e.g.:

cp [file to move/copy] [directory to move/copy file to]
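Here is a minimal sketch of mv versus cp, assuming a POSIX shell. The file chr04_demo.fsa is a tiny, made-up stand-in for chr04.fsa, created with printf so the example runs anywhere.

```shell
#!/bin/sh
# mv vs. cp: both take "source destination". chr04_demo.fsa is a tiny,
# made-up stand-in for the real chr04.fsa download.
set -e
work=$(mktemp -d)
cd "$work"

printf '>chr04_demo\nACGTACGT\n' > chr04_demo.fsa   # write a two-line fake FASTA file
mkdir yeast backup

mv chr04_demo.fsa yeast          # move: the file now exists only in yeast/
cp yeast/chr04_demo.fsa backup   # copy: an independent duplicate in backup/
ls . yeast backup                # chr04_demo.fsa is gone from the top level
```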

Now open the chr04.fsa file in TextWrangler and look at the sequence of Saccharomyces cerevisiae chromosome 4. Pretty exciting, isn't it?

Removing files

To remove files and directories, use the 'remove' command rm, e.g.:

[john@ccbtl11 john]$ cd
[john@ccbtl11 john]$ ls
crap/ mp3/ newfile.txt rna/ taing/ yeast/
[john@ccbtl11 john]$ rm newfile.txt
rm: remove newfile.txt (yes/no)? y
[john@ccbtl11 john]$ ls
crap/ mp3/ rna/ taing/ yeast/
[john@ccbtl11 john]$

Now to remove the yeast directory, use the rmdir command, e.g.:

[john@ccbtl11 john]$ rm -f yeast/chr04.fsa
[john@ccbtl11 john]$ rmdir yeast
[john@ccbtl11 john]$

Two things to note here. First, -f is the "force" flag: rm didn't ask whether I was sure I wanted the file removed, it just removed it. Second, the rmdir command only works on empty directories, which is why I removed chr04.fsa first.
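The rmdir rule above can be checked in a scratch directory. This is a sketch assuming a POSIX shell; 2>/dev/null only hides rmdir's error message so the script can print its own.

```shell
#!/bin/sh
# rmdir only removes empty directories.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
mkdir yeast
touch yeast/chr04.fsa            # create an empty placeholder file

if rmdir yeast 2>/dev/null; then
    echo "unexpected: rmdir removed a non-empty directory"
else
    echo "rmdir refused: yeast is not empty"
fi

rm -f yeast/chr04.fsa            # -f: no confirmation prompt
rmdir yeast                      # works now that yeast is empty
ls                               # prints nothing: the directory is gone
```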

more or less

To look quickly at a file's contents without opening it in TextWrangler, use the commands more or less. These quickly and crudely display the file's contents in the terminal. You can scroll through the file page by page using the spacebar. Press q to quit and return to the shell prompt. less is basically just a fancier more program; it also lets you scroll back and forth with the up and down arrow keys. But less is not installed on all UNIX computers. These programs are useful when you just want to check the contents of a file without editing it. (E.g., is my data in the file important_file1.dat or important_file2.dat?)

Don't use spaces, exclamation points, question marks, ...

One more thing about file names in UNIX. Unlike in the Windows and Mac worlds, spaces in file names are no good. Neither are exclamation points, question marks, and the like, because the shell gives these characters special meanings (a space, for example, separates one argument from the next, so my file.txt looks like two files). Don't use them; use the underscore "_" or dash "-" instead. Basically, just stick to alphanumerics (a through z, A through Z, and 0 through 9), underscores, dashes, and periods. I know it's kind of primitive, but it's just the way it is.
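If you want to check names automatically, here is a sketch of a hypothetical helper function, is_safe_name, that accepts only the characters recommended above. It assumes a POSIX shell and uses the shell's case pattern matching, where [!...] means "any character not in this set".

```shell
#!/bin/sh
# is_safe_name is a hypothetical helper: it succeeds (returns 0) only if
# the name is non-empty and uses letters, digits, "_", "-" and "." only.
is_safe_name() {
    case $1 in
        '' | *[!A-Za-z0-9_.-]*) return 1 ;;   # empty, or contains anything else
        *) return 0 ;;
    esac
}

# Single quotes keep the shell from treating ! or ? specially here.
for name in 'chr04.fsa' 'my_data-v2.txt' 'my data.txt' 'done!.txt' 'what?.faa'; do
    if is_safe_name "$name"; then
        echo "OK:    $name"
    else
        echo "AVOID: $name"
    fi
done
```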

Miscellaneous tips and tricks

Here's a bunch of things that don't fit neatly into a category but are important and useful.


Computer folks often use the acronym RTFM (for Read The F*#!ing Manual) in response to stupid questions. It's not nice, but reading the manual is often a better way to learn than asking a question. UNIX has what are called "man pages" for many (but not all) of the common commands. If you want to learn more about ls, for example, type man ls at the command line. Below is an example of what a typical man page looks like. Try using some of the options with the ls command to see what the output looks like. (-a, -l, and -h are commonly used flags with ls.)

[john@ccbtl11 john]$ man ls

NAME ls - list contents of directory

/usr/bin/ls [ -aAbcCdfFgilLmnopqrRstux1 ] [ file ... ]

For each file that is a directory, ls lists the contents of the directory; for each file that is an ordinary file, ls repeats its name and any other information requested. The output is sorted alphabetically by default. When no argument is given, the current directory is listed. When several arguments are given, the arguments are first sorted appropriately, but file arguments appear before directories and their contents.

The following options are supported:

-a List all entries, including those that begin with a dot (.), which are normally not listed.
-A List all entries, including those that begin with a dot (.), with the exception of the working directory (.) and the parent directory (..).
-b Force printing of non-printable characters to be in the octal \ddd notation.
-c Use time of last modification of the i-node (file created, mode changed, and so forth) for sorting (-t) or printing (-l or -n).
-C Multi-column output with entries sorted down the columns. This is the default output format.
-d If an argument is a directory, list only its name

... and so on ...

Some other helpful sites are:

Intro to UNIX from Lincoln Stein's CSHL Genome Informatics course.

Intro to Unix commands from Indiana University.

It can also be helpful to do Google searches for UNIX tips. If you are really into it, I can recommend a couple of good UNIX books.

Tab completion

Tab completion rules. Say you have five files in a directory called


After cursing yourself for naming the files so stupidly, you need to look through the files with less to find the data you want. Except you don't want to type really_really_important_data_wow_so_important... every time. Use tab completion. It works like this: type "r" and hit tab. The shell will fill in as much of the file name as all the files beginning with "r" have in common, stopping at the first character where they differ. E.g.,

[john@ccbtl11 john]$ r

[Hit tab]

[john@ccbtl11 john]$ really_really_important_data_wow_so_important

[And the shell will type everything out up to the numbers 1, 2, 3, and so on.]

At this point you can hit tab twice in quick succession, and the shell will list all of the files that match what you've typed so far. This works anywhere: try hitting tab twice at a blank command prompt. Cool, huh? Play around with tab completion; it's a handy thing and second nature to UNIX folks.


You can use what is called a wildcard (the "*" symbol, or asterisk) when dealing with lists of files. For example, say you were in a directory with 100 files in it: some multiple alignment files, some blastn files, some blastx files, and so on. If you were interested only in the blastx files, you could simply do an ls and look by eye:

[john@ccbtl11 john]$ ls

Which is a bit of a chore to look through, don't you think? (Quick, how many blastx files are in there? Are you sure?) To get a quick list of all of the blastx files, use the wildcard:

[john@ccbtl11 john]$ ls *.blastx

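Which is a slightly more manageable output. The wildcard is very powerful; play with it a bit. It's also very dangerous when used with the rm command: rm *.blastx deletes every file ending in .blastx, and rm * deletes everything in the directory. Be careful! (As a side note, shell wildcards are a simple cousin of regular expressions, so you've just had your first taste of them without realizing it. More on regular expressions later.)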
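Here is a small sketch for trying wildcards safely, assuming a POSIX shell; the file names are made up. The last line uses echo to preview what an rm with a wildcard would delete, a handy habit before running the real thing.

```shell
#!/bin/sh
# Wildcards in a throwaway directory full of made-up result files.
set -e
work=$(mktemp -d)
cd "$work"
touch geneA.blastn geneA.blastx geneB.blastx geneC.aln geneC.blastn

ls                  # all five files
ls *.blastx         # only the names ending in .blastx
ls geneA.*          # everything whose name starts with "geneA."
echo rm *.blastx    # preview: prints the rm command without running it
```

The shell expands the wildcard into the matching file names before ls or rm ever runs, which is why echo can show you exactly what a command will receive.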

Putting processes in the background

Say you want to start a long BLAST run (one that will take a week), but you also want to do some file management at the command line. You could open a new terminal while BLAST is running, and this is a fine idea, but your desktop can get pretty crowded with windows. Instead, you can put processes in the background in UNIX. If you just type

[john@ccbtl11 john]$ blastn -db nt -query hugefile.fna

you start your blast job, but you don't get the command line back. To get the command line back, append the "&" symbol to the end of the command, e.g.:

[john@ccbtl11 john]$ blastn -db nt -query hugefile.fna &
[john@ccbtl11 john]$

Voila! You have started your blast job and you have the shell prompt back to keep on working. If you forget to add the ampersand to the end of the command, you can get the shell prompt back by hitting CTRL-Z and then typing bg to put it in the background.
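Here is a sketch of backgrounding, assuming a POSIX shell. sleep 3 stands in for the week-long BLAST run; $! (the process ID of the most recent background job) and wait (which pauses until background jobs finish) are standard shell features not covered above.

```shell
#!/bin/sh
# Backgrounding a job. "sleep 3" stands in for a long BLAST run.
sleep 3 &                 # & starts the job and hands the prompt back at once
job_pid=$!                # $! is the process ID of the last background job
echo "started background job $job_pid"
echo "doing other work while it runs..."
wait "$job_pid"           # pause here until that job finishes
echo "background job finished with exit status $?"
```

In an interactive shell, jobs lists your background jobs and fg brings one back to the foreground.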

Redirecting output

Now imagine you are blasting your favorite protein against GenBank, and that your favorite protein is a kinase. You are gonna get a helluva lot of hits, and unless you are the Greatest American Hero (anyone? anyone?), you ain't gonna be able to read the BLAST report as it scrolls past your terminal at the speed of light. You need to direct the output into a file to study at your leisure. You can do this with the ">" symbol, e.g.:

[john@ccbtl11 john]$ blastp -db nr -query mykinase.faa > genbank_nr_v_mykinase.blastp &
[john@ccbtl11 john]$

This looks complicated, but it's pretty simple. I blasted the file mykinase.faa against the GenBank non-redundant protein database (nr). The part of the command [ > genbank_nr_v_mykinase.blastp &] tells the shell to put the results of the BLAST search into a file called genbank_nr_v_mykinase.blastp and to put the whole process in the background so you can keep working while the BLAST job is running.
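Here is a sketch combining > and &, assuming a POSIX shell. fake_blast is a hypothetical function that prints a couple of lines and stands in for blastp, since BLAST may not be installed; cat simply prints a file to the terminal.

```shell
#!/bin/sh
# Redirecting output (>) while running in the background (&), the same
# shape as the blastp command above. fake_blast is a hypothetical stand-in.
set -e
work=$(mktemp -d)
cd "$work"

fake_blast() {
    sleep 1
    echo "Query= mykinase"
    echo "(hundreds of hits would follow here)"
}

fake_blast > genbank_nr_v_mykinase.blastp &   # output goes to the file, not the screen
echo "prompt is free while the job runs"
wait                                          # wait for background jobs to finish
cat genbank_nr_v_mykinase.blastp              # print the saved report
```

Note that > overwrites the file if it already exists; >> appends to it instead.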

Command history

OK, now say you just typed in the long blastp command above (using tab completion, of course), but you accidentally typed mykinase.fa instead of mykinase.faa. blastp will return an error. Instead of retyping the whole command, you can use the arrow keys. If you hit the "up" or "down" arrow keys, the shell will step through the commands you have typed during this session (and sometimes, depending on how the system is set up, from previous sessions, too). You can just fix the mistake using the left, right, backspace, etc. keys (just add an "a" to your filename in this case) rather than retyping the whole thing.

If you want to see what you have typed recently, type history at the command line. It will show you a numbered list of the commands you have used. If you want to rerun a command with complicated arguments (as in the BLAST example above), type "!" (usually pronounced "bang") immediately followed by the command's number in that list, and the shell will rerun it.

Moving files to and from remote computers

Here is a website with a list of free SSH and SCP clients for Windows machines.

There are two ways to move files from computer to computer in the UNIX world: the bad way (using ftp) and the good way (using scp). ftp stands for File Transfer Protocol, and it is the original file transfer program. It's OK for transferring files, but it stinks security-wise: it sends your username and password across the network as plain, human-readable text, so any punk 13-year-old kid in the Netherlands can dip into the network traffic, grab your username and password, and then bring down the DBS department computers. The Officially Sanctioned BIOB 491 Way to transfer files is with scp, or Secure CoPy. All of the computers that we use should have scp installed. scp's syntax is very similar to that of the cp command, e.g.:

[john@ccbtl11 john]$ scp somefile.txt john@warlord:~
john@warlord's password:
somefile.txt 100% |*****************************| 256 00:00
[john@ccbtl11 john]$

In this case, I am transferring the file somefile.txt to my home directory (indicated by the tilde) on the computer warlord. The colon after the remote computer's name is important: it tells scp that we are indeed transferring this file to a remote computer.

If I were transferring a file from a remote computer to the local machine, I would do something like this:

[john@ccbtl11 john]$ scp john@warlord:~/perl/somefile.txt .
john@warlord's password:
somefile.txt 100% |*****************************| 196585 00:01
[john@ccbtl11 john]$

Here I have told scp that I want the file called somefile.txt from the perl directory (which itself is in my home directory on warlord) to be placed in the current directory, indicated by the "." (which always means the current directory).


Permissions are a very important part of the UNIX environment. Permissions are the rights that you and others have on files and directories in the UNIX filesystem. For example, the file that contains the website for this course will have very different permissions than the file that contains the scores for your homework. You can give and take away the rights to read, write, and execute the files and directories under your home directory. Here are the three types of permissions and what they enable one to do to files and directories:

These permissions can be granted to four different kinds of users:

To look at the permissions on a file or directory, use the ls -l command. E.g.:

[john@ccbtl11 john]$ cd taing
[john@ccbtl11 taing]$ ls -l
total 4
drwx------ 2 john bio5488 512 Dec 20 10:41 hw/
drwxr-xr-x 2 john bio5488 512 Dec 20 10:41 perl/
[john@ccbtl11 taing]$

You can see the two directories hw and perl listed there, along with a bunch of other info. The important parts are noted here:

type  owner  group  world  links  user  group     size  date modified  name
 d     rwx    ---    ---     2    john  bio5488    512  Dec 20 10:41   hw/
 d     rwx    r-x    r-x     2    john  bio5488    512  Dec 20 10:41   perl/

To change the permissions on files and directories, use the chmod command, e.g.:

[john@ccbtl11 taing]$ chmod a+w perl/
[john@ccbtl11 taing]$ ls -l
total 4
drwx------ 2 john bio5488 512 Dec 20 10:41 hw/
drwxrwxrwx 2 john bio5488 512 Dec 20 10:41 perl/

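Which says to give everyone (user, group, and world) the right to make and delete files in the perl directory (bad idea). Let's take that away again, and while we're at it, take away everyone else's execute permission, which for a directory is the right to enter it and use the files inside: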

[john@ccbtl11 taing]$ chmod og-wx perl/
[john@ccbtl11 taing]$ ls -l
total 4
drwx------ 2 john bio5488 512 Dec 20 10:41 hw/
drwxr--r-- 2 john bio5488 512 Dec 20 10:41 perl/
[john@ccbtl11 taing]$

Better. Now everyone can ls the contents of perl, but no one but you can do anything with the files.
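To replay the chmod example without touching your real files, here is a sketch assuming a POSIX shell. The first chmod uses the = form (set exactly these permissions) just to start from the drwxr-xr-x state shown above, and ls -ld shows the directory's own permissions rather than its contents.

```shell
#!/bin/sh
# Replay the chmod example on a scratch "perl" directory.
set -e
work=$(mktemp -d)
cd "$work"
mkdir perl

chmod u=rwx,go=rx perl   # start from drwxr-xr-x, as in the listing above
ls -ld perl              # -d: list the directory itself, not its contents

chmod a+w perl           # all users get write permission: drwxrwxrwx
ls -ld perl

chmod og-wx perl         # others and group lose write and execute: drwxr--r--
ls -ld perl
```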

Pipes (the "|" symbol)

Pipes allow you to feed the output of one command into another. The most common time for beginners (that includes me) to use this feature is to feed the output of an ls command into more or less, in directories with so many files that ls cannot fit them into a single shell window. So, for example, if you were to do an ls in the directory used in the wildcard section above in a smallish shell window, the results would just fly by you. Use the pipe:

[john@ccbtl11 john]$ ls | less

and you can scroll through the ls output at your leisure.
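Because less waits for you to press keys, here is a sketch of pipes with a non-interactive second command instead, assuming a POSIX shell and made-up file names. wc -l counts the lines it receives, so piping ls into it counts directory entries, which answers the "how many blastx files?" question from the wildcard section.

```shell
#!/bin/sh
# Pipes: the output of the command on the left of | becomes the input of
# the command on the right. wc -l counts the lines it receives.
set -e
work=$(mktemp -d)
cd "$work"
touch geneA.blastn geneA.blastx geneB.blastx geneC.aln geneC.blastn

ls | wc -l             # how many entries are in this directory? (5)
ls *.blastx | wc -l    # how many blastx files? (2)
```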

Created by John McCutcheon, 2013