This will install a BLAST executable that you can use to remotely query the NCBI database.

conda install muscle

This will install MUSCLE, an alignment program that you can use to align nucleotide or protein sequences.

We will also install RAxML-NG, a phylogenetic tree inference tool that uses
the maximum-likelihood (ML) optimality criterion. However, there is no conda
repository for it yet. Thus, we need to download it manually.

This will yield a file that has only the sequences of the subject, so that we can later add those to other fasta files.
However, the formatting is not perfect.
To convert the file to fasta format, open it in an editor (e.g. nano) and edit the first line so that it gives a name for your sequence.
You should know the general format of a fasta file (e.g. the first line starts with a “>”).
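As a reminder of that layout, here is a minimal sketch of a one-record fasta file. The file name `example.fas`, the sequence name, and the bases are all made up for illustration; they are not part of the course data.

```shell
# Write a tiny fasta record: one header line starting with ">",
# followed by the sequence itself (hypothetical name and bases).
printf '>my_yeast_sequence\nACGTACGTTAGC\n' > example.fas

# Every record has exactly one ">" header, so counting header lines
# tells you how many sequences the file holds.
grep -c '^>' example.fas
```

Running the same `grep -c '^>'` on your own file is a quick way to check that the first line was edited correctly.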

Hint

To edit in the vi editor, press the escape key and then “a” or “i” to enter insert mode.
To save in vi, press the escape key and type “:w” (write).
To quit vi, press the escape key and type “:q” (quit).

Next, you have to remove the dashes (which signify indels in the BLAST result).
This can easily be done in vi:
Press the escape key, followed by: :%s/\-//g
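Note that the vi substitution above removes dashes on every line, including the header. If your sequence name contains a dash, a non-interactive alternative is to restrict the substitution to sequence lines with `sed`. This is only a sketch: the file names `gapped.fas` and `ungapped.fas` and their contents are invented stand-ins for your own file.

```shell
# Hypothetical input: a header that contains dashes, plus a gapped sequence line.
printf '>my-yeast-sequence\nACG-TAC--GT\n' > gapped.fas

# On every line that does NOT start with ">", delete all dashes.
# Header lines are left untouched.
sed '/^>/!s/-//g' gapped.fas > ungapped.fas

# Show the result.
cat ungapped.fas
```

Writing to a new file rather than editing in place keeps the original BLAST output available in case something goes wrong.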

Now we will BLAST a remote database to get a list of hits that are already in the NCBI database.

Note

It turns out you may not be able to access this database from within BioLinux. In such a case, download the file named blast.fas and place it into your ~/analysis/phylogeny/ directory.

curl -O http://compbio.massey.ac.nz/data/203341/blast_u.fas

Append the fasta file of your yeast sequence to this file, using whatever set of commands you wish/know.
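One possible way to do this is with `cat` and the append redirection `>>`. The sketch below uses small stand-in files so that it runs on its own; `blast_u_demo.fas` and `yeast_demo.fas` are assumed names, and you would substitute the downloaded file and your own edited yeast file.

```shell
# Stand-in files (hypothetical names and contents) so the example is self-contained.
printf '>hit_1\nACGT\n' > blast_u_demo.fas
printf '>my_yeast_sequence\nTTGA\n' > yeast_demo.fas

# ">>" appends to the end of the existing file instead of overwriting it.
cat yeast_demo.fas >> blast_u_demo.fas

# Count the header lines to confirm that both records are now present.
grep -c '^>' blast_u_demo.fas
```

Be careful to use `>>` and not `>`, since a single `>` would replace the contents of the target file.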

We will use RAxML-NG to build our phylogeny.
This uses a maximum likelihood method to infer parameters of evolution and the topology of the tree.
Again, the syntax of the command is fairly simple, except you must make sure that you are working in the directory in which RAxML-NG sits.

The arguments are:

-s: an alignment file

-m: a model of evolution. In this case we will use a general time-reversible model with gamma-distributed rates (GTR+GAMMA).

We will use the online software Interactive Tree of Life (iTOL) to visualize the tree.
Navigate to the iTOL homepage.
Open the file containing your tree (*bestTree.out), copy the contents, and paste into the web page (in the Tree text box).

You should then be able to zoom in and out to see where your yeast taxon is.
To find its closest relative, you will have to use the NCBI Taxonomy page.

Todo

Are you certain that the yeast are related in the way that the phylogeny suggests? Why might the topology of this phylogeny not truly reflect the evolutionary history of these yeast species?