Exercises for bioinformatics.psc.edu:
Phylogenetic Analysis

This exercise is designed to introduce you to the steps involved in creating and viewing phylogenetic trees. It is not a complete step-by-step example; you will have to think about the problem and what you are trying to accomplish before moving on to the next step. Please read the entire step before typing anything on the computer. Also, please make a printout of this web page and fill in the blank lines. Your responses will often be referred to in later steps.

Program Setup

Make a directory for the phylogenetics exercise by typing: mkdir phylo

Change your directory to the phylo directory. Enter: cd phylo

Copy a sample multiple sequence alignment to use for this exercise, and write the name of the copied file (i.e. sprot30.msf) below. Enter: cp /biomed/lib/example/sprot30.msf sprot30.msf
_________________________________________________________________________

First convert the alignment to the PHYLIP interleaved format with the readseq program. Enter the command: readseq

Enter an output file name (a name that you made up) such as sprot30.phylip. Write that name below:
_________________________________________________________________________

Select output file format 12 (PHYLIP).

Enter the file name listed in step 1.4 as the input file.

Select to convert all sequences.

Press Enter to quit readseq.
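Before moving on, it can be worth confirming that the converted file really is in PHYLIP format. A PHYLIP alignment starts with a header line holding two integers: the number of sequences and the number of sites. The following is a minimal sketch of such a check. The function name check_phylip_header is made up for this illustration, and sprot30.phylip is only the example output name suggested above.

```shell
#!/bin/sh
# Sketch: confirm that a file starts with a PHYLIP-style header line,
# i.e. at least two fields, the first two being integers
# (number of sequences, number of sites).
# check_phylip_header is a hypothetical helper, not part of readseq or PHYLIP.
check_phylip_header() {
    head -n 1 "$1" | awk '
        NF >= 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { ok = 1 }
        END { exit (ok ? 0 : 1) }'
}

# sprot30.phylip is the example name from the exercise; use your own name.
if [ -f sprot30.phylip ]; then
    if check_phylip_header sprot30.phylip; then
        echo "sprot30.phylip starts with a PHYLIP header"
    else
        echo "sprot30.phylip does not start with a PHYLIP header"
    fi
fi
```

If the check fails, the most likely cause is that a different output format number was selected in readseq.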

Create a file of PHYLIP pairwise distances using protdist

Enter the command protdist.

The program will respond can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 1.6.

You will now see a list of options. You may want to use the P option to page through the categories until the Dayhoff PAM matrix is listed. Next, indicate that the settings are correct by entering Y.

The program will then compute distances and will require only a few seconds of CPU time. The results will be written to a file named outfile.

Rename the output file created by the protdist program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30.protdist, then you would enter: mv outfile sprot30.protdist). Write the name that you selected for the [newoutname] file below:________________________________________________________________
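The menu answers typed in the steps above can also be stored in a file and fed to protdist on its standard input, which makes the run easy to repeat. The sketch below is only an illustration under several assumptions that are not part of the exercise: no file named infile exists in the current directory (so the first prompt is the request for a file name), the default distance model is acceptable (so no P keystrokes are included), and the file names protdist.answers and protdist.log are made up.

```shell
#!/bin/sh
# Sketch: drive protdist from a file of menu answers instead of typing them.
# write_protdist_answers is a hypothetical helper name.

# Write the answers in the order the prompts appear:
# the alignment file name, then Y to accept the settings.
write_protdist_answers() {
    printf '%s\nY\n' "$1" > "$2"
}

write_protdist_answers sprot30.phylip protdist.answers

if [ -f sprot30.phylip ] && command -v protdist >/dev/null 2>&1; then
    # Some PHYLIP versions ask what to do when outfile already exists,
    # which would shift the answers; remove any stale copy first.
    rm -f outfile
    protdist < protdist.answers > protdist.log
    mv outfile sprot30.protdist
else
    echo "protdist or sprot30.phylip not available; answers saved in protdist.answers"
fi
```

If the Dayhoff PAM matrix is not the default in your installation, add one P line per keystroke before the Y line, after checking the menu interactively once.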

Create a neighbor-joining tree from the file of PHYLIP pairwise distances using the neighbor program

Compute the tree by entering the neighbor command.

The program will respond can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 2.5.

You will now see a list of options. Once again these options should all be correct, but you may want to check to make sure that the Neighbor-joining tree option and not the UPGMA tree option is selected. Next, indicate that the settings are correct by entering Y.

The program will then compute the tree and will require only a few seconds of CPU time. The results will be written to two files: outfile and outtree

Rename the outfile file created by the neighbor program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30.neighbor, then you would enter: mv outfile sprot30.neighbor). Write the name that you selected for the [newoutname] file below:
_________________________________________________________________________

Use the more command to examine the output file listed in step 3.5

Rename the outtree file created by the neighbor program. Use the mv command to do this by entering mv outtree [newtreename] where [newtreename] is a filename that you made up. (For example, if the [newtreename] that you made up was sprot30.tree, then you would enter: mv outtree sprot30.tree). Write the name that you selected for the [newtreename] file below: _________________________________________________________________________

The outtree file is a New Hampshire formatted file that can be viewed with a variety of freeware tree viewers.
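A New Hampshire file is plain text, so a couple of quick checks are possible before opening it in a viewer. In a single tree, every internal node with k branches contributes k-1 commas, so the number of sequences in the tree is the number of commas plus one. The sketch below assumes that the sequence names contain no commas or parentheses. The function names are made up, and sprot30.tree is only the example name from the step above.

```shell
#!/bin/sh
# Sketch: two quick sanity checks on a New Hampshire tree file holding
# ONE tree. count_leaves and paren_counts_match are hypothetical helpers.

# Number of leaves = number of commas + 1.
count_leaves() {
    commas=$(tr -cd ',' < "$1" | wc -c | tr -d ' ')
    echo $((commas + 1))
}

# The numbers of "(" and ")" characters must match.
paren_counts_match() {
    open=$(tr -cd '(' < "$1" | wc -c | tr -d ' ')
    close=$(tr -cd ')' < "$1" | wc -c | tr -d ' ')
    [ "$open" -eq "$close" ]
}

tree=sprot30.tree
if [ -f "$tree" ]; then
    echo "leaves: $(count_leaves "$tree")"
    if paren_counts_match "$tree"; then
        echo "parentheses match"
    else
        echo "parentheses do not match; the file may be truncated"
    fi
fi
```

The leaf count should equal the number of sequences in the alignment you copied during program setup.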

View the tree file

There are a number of ways that the tree file can be viewed. First, the outtree file can be read by the drawtree program, which can display the tree and write out a postscript file of the tree (which can then be previewed using a postscript previewer such as ghostscript, or downloaded and printed on a local postscript printer). Second, the ATV program can be used to display the tree file on a remote X-windows compatible computer. Finally, the outtree file can be downloaded to a local machine and viewed with a variety of freeware viewers such as TreeView, from Rod Page, or the ATV program from the Eddy group at Washington University in St. Louis.

Displaying trees using drawtree

In order to create graphics files using the drawtree program, you must first copy a required font file to your working directory. Execute the command: cp /biomed/lib/phylip/font2 fontfile

The drawtree program can allow you to preview the tree directly from bioinformatics ONLY IF you are using a workstation or PC that can receive X-window graphics. If you are using such a machine, do the following:

On bioinformatics.psc.edu, tell the computer where to send the remote graphics display by giving it the name of your local computer. Enter the command setenv DISPLAY localcomputer:0.0 where localcomputer is the Internet name of your local computer, which has a form similar to computer.site.sitetype, such as bioinformatics.psc.edu. (For example, setenv DISPLAY ctc01.psc.edu:0.0) If you are unsure of the address of the local computer that you are using, issue the command who -m. The local computer will be listed within parentheses at the end of the line starting with your user id. If you are doing this exercise at a workshop held on-site at the Pittsburgh Supercomputing Center, this name will be something like ctc01.psc.edu.

Make sure that your local computer is set up to accept and display remote windows from bioinformatics.psc.edu. If you are doing this exercise at a workshop held on-site at the Pittsburgh Supercomputing Center, follow the separate instructions given earlier on what needs to be done with the X-WIN32 software.
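The setenv command is csh syntax. If your login shell is sh or bash instead, the same setting is made by assigning DISPLAY and exporting it. The sketch below also reads the local computer name from the who -m output described above. It assumes a remote login, so that the name appears in parentheses at the end of the line. The function name host_from_who is made up for this illustration.

```shell
#!/bin/sh
# Sketch: sh/bash equivalent of "setenv DISPLAY localcomputer:0.0".
# host_from_who is a hypothetical helper name.

# Pull the text between the last pair of parentheses out of a who -m line.
host_from_who() {
    sed -n 's/.*(\(.*\)).*/\1/p'
}

localcomputer=$(who -m 2>/dev/null | host_from_who || true)

if [ -n "$localcomputer" ]; then
    DISPLAY="$localcomputer:0.0"
    export DISPLAY
    echo "DISPLAY set to $DISPLAY"
else
    echo "could not determine the local computer; set DISPLAY by hand"
fi
```

Run it with the dot command (for example, . ./setdisplay.sh, where the script name is your own choice) so that the setting stays in effect in your login shell.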

Run drawtree. Enter the command drawtree

The program will respond can't find input tree file intree. At the Please enter a new filename> prompt, enter the New Hampshire outtree name listed in step 3.7 (or step 5.33)

Enter V to display the preview options.

IF YOU ARE USING A WORKSTATION THAT CAN DISPLAY X-WINDOWS GRAPHICS, select option X OTHERWISE select option N - will not be previewed.

Enter P to display the plotting device options.

Select L to create a postscript file.

Enter Y to accept the default settings

Select will not be previewed. Enter: N

Accept the default settings. Enter: Y

The program will show you a preview of the results if a preview was selected in step 4.1.6. Once you are done viewing the preview, select menu option FILE then QUIT

The (Postscript) results will be written to the file named: plotfile

Rename the plotfile file created by the drawtree program. Use the mv command to do this by entering mv plotfile [newplotname] where [newplotname] is a filename that you made up. (For example, if the [newplotname] that you made up was sprot30.ps, then you would enter: mv plotfile sprot30.ps). Write the name that you selected for the [newplotname] file below:______________________________________________________________

The postscript file can now be transferred to your local computer and printed on a local postscript printer or be displayed with a postscript previewer such as ghostscript.
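Before transferring the file, a quick check that it really is a postscript file can save a failed print job. Postscript files conventionally begin with the two characters %!, so looking at the first line is enough for a rough test. This is only a sketch. The function name is_postscript is made up, and sprot30.ps is the example name from the previous step.

```shell
#!/bin/sh
# Sketch: rough test that a file is postscript by looking at its first line.
# is_postscript is a hypothetical helper name.
is_postscript() {
    case "$(head -n 1 "$1")" in
        '%!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

plot=sprot30.ps
if [ -f "$plot" ]; then
    if is_postscript "$plot"; then
        echo "$plot looks like a postscript file"
    else
        echo "$plot does not start with %!; was the right file renamed?"
    fi
fi
```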

Displaying trees using the ATV tree viewing program

The ATV tree viewing program is installed on bioinformatics. However, in order to use it you MUST be using a workstation or PC that can receive X-window graphics. If you are using such a machine, do the following:

On bioinformatics.psc.edu, tell the computer where to send the remote graphics display by giving it the name of your local computer. Enter the command setenv DISPLAY localcomputer:0.0 where localcomputer is the Internet name of your local computer, which has a form similar to computer.site.sitetype, such as bioinformatics.psc.edu. (For example, setenv DISPLAY ctc01.psc.edu:0.0) If you are unsure of the address of the local computer that you are using, issue the command who -m. The local computer will be listed within parentheses at the end of the line starting with your user id. If you are doing this exercise at a workshop held on-site at the Pittsburgh Supercomputing Center, this name will be something like ctc01.psc.edu. (NOTE YOU DO NOT NEED TO DO THIS STEP IF YOU ALREADY DID STEP 4.1.2.1 ABOVE)

Make sure that your local computer is set up to accept and display remote windows from bioinformatics.psc.edu. If you are doing this exercise at a workshop held on-site at the Pittsburgh Supercomputing Center, follow the separate instructions given earlier on what needs to be done with the X-WIN32 software. (NOTE YOU DO NOT NEED TO DO THIS STEP IF YOU ALREADY DID STEP 4.1.2.2 ABOVE)

Use the ATV program to view the New Hampshire outtree file. Enter atv [outtree] where [outtree] is the file you listed in step 3.7 (or step 5.33)

To quit the ATV program, select menu option File then Exit

Displaying the outtree file and/or a drawtree postscript file on a local computer is beyond the scope of this hands-on exercise. However, if you want to pursue this option, consider the following:

The postscript file can be viewed with the ghostscript program on a number of different types of computers and operating systems. The ghostscript program can be downloaded from: http://www.ghostscript.com/

The New Hampshire outtree can be viewed with the ATV program on a number of different types of computers. The ATV program can be downloaded from: http://www.genetics.wustl.edu/eddy/atv/

Create a bootstrap consensus tree

Run the seqboot program to create 10 resampled alignments. Enter: seqboot

The program will respond can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 1.7.

You will now be presented with a menu of options. Select R (replicates)

Enter 10 as the number of replicates.

You will now be presented with a menu of options. Select Y to indicate that the settings are correct.

You will now be asked to enter a random number seed. Enter an odd number such as 459863

The program will then write the results to the file named outfile.

Rename the outfile file created by the program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30_boot.seqboot, then you would enter: mv outfile sprot30_boot.seqboot). Write the name that you selected for the [newoutname] file below:______________________________________________________________
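The seqboot output is simply the resampled alignments written one after another, each starting with its own PHYLIP header line (number of sequences, number of sites). Counting those header lines is a quick way to confirm that the file holds the 10 replicates requested. The sketch below assumes that no sequence line consists of two integers only. The function name count_datasets is made up, and sprot30_boot.seqboot is the example name from the step above.

```shell
#!/bin/sh
# Sketch: count the data sets in a seqboot output file by counting
# lines that contain exactly two integers (the PHYLIP header lines).
# count_datasets is a hypothetical helper name.
count_datasets() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -c '^ *[0-9][0-9]* *[0-9][0-9]* *$' "$1" || true
}

boot=sprot30_boot.seqboot
if [ -f "$boot" ]; then
    echo "data sets found: $(count_datasets "$boot") (10 expected)"
fi
```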

Enter the command protdist.

The program will respond can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 5.8

You will now see a list of options. Select option M to indicate that multiple data sets are to be analyzed.

You will now be asked if you have multiple data sets or multiple weights. Select option D to indicate that multiple data sets are to be analyzed.

Enter 10 because there are 10 data sets.

You may want to use the P option to page through the categories until the Dayhoff PAM matrix is listed (to be consistent with step 2.3 above)

Next, indicate that the settings are correct by entering Y.

The program will then compute distances and will require a few minutes to run. The results will be written to a file named outfile.

Rename the output file created by the protdist program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30_boot.protdist, then you would enter: mv outfile sprot30_boot.protdist). Write the name that you selected for the [newoutname] file below:_____________________________________________________________

Enter the command neighbor.

The program will respond can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 5.17

You will now see a list of options. Select option M to indicate that multiple data sets are to be analyzed.

Enter 10 because there are 10 data sets.

You will now be asked to enter a random number seed. Enter an odd number such as 459863

Next, indicate that the settings are correct by entering Y.

The program will then require a few minutes to run. The results will be written to two files: outfile and outtree

Rename the outfile file created by the neighbor program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30_boot.neighbor, then you would enter: mv outfile sprot30_boot.neighbor). Write the name that you selected for the [newoutname] file below:______________________________________________________________

Use the more command to examine the output file listed in step 5.25

Rename the outtree file created by the neighbor program. Use the mv command to do this by entering mv outtree [newtreename] where [newtreename] is a filename that you made up. (For example, if the [newtreename] that you made up was sprot30_boot.tree, then you would enter: mv outtree sprot30_boot.tree). Write the name that you selected for the [newtreename] file below:______________________________________________________________
Finally, the trees (in the New Hampshire formatted file) need to be analyzed by the PHYLIP program consense, which will produce a consensus tree. Enter: consense

The program will respond can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 5.27

Next, indicate that the settings are correct by entering Y.

The program will take a few seconds to run and will produce two output files outfile and outtree

Rename the outfile file created by the consense program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30_boot.consense, then you would enter: mv outfile sprot30_boot.consense). Write the name that you selected for the [newoutname] file below:_______________________________________________________________

Use the more command to examine the output file listed in step 5.31

Rename the outtree file created by the consense program. Use the mv command to do this by entering mv outtree [newtreename] where [newtreename] is a filename that you made up. (For example, if the [newtreename] that you made up was sprot30_boot.constree, then you would enter: mv outtree sprot30_boot.constree). Write the name that you selected for the [newtreename] file below:____________________________________________________________

The outtree can be viewed or converted into a postscript file by following step 4 (View the tree file) with the outtree listed in step 5.33. How does the single alignment tree compare to the bootstrapped tree?
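When interpreting the consensus tree, it helps to know that consense really was given one tree per bootstrap replicate. Each tree in a New Hampshire file ends with a semicolon, so counting semicolons in the bootstrap tree file from step 5.27 gives the number of trees. This is a sketch that assumes sequence names contain no semicolons. The function name count_trees is made up, and sprot30_boot.tree is the example name used in that step.

```shell
#!/bin/sh
# Sketch: count the trees in a New Hampshire file by counting the
# semicolons that terminate them. count_trees is a hypothetical helper.
count_trees() {
    tr -cd ';' < "$1" | wc -c | tr -d ' '
}

boot_trees=sprot30_boot.tree
if [ -f "$boot_trees" ]; then
    echo "trees found: $(count_trees "$boot_trees") (10 expected)"
fi
```

If fewer than 10 trees are found, the M option was probably not set when neighbor was run on the bootstrapped distances.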