Tuesday, 17 December 2013

Ensembl API workshop

I attended a great Ensembl API workshop last week at the University of Cambridge, and learnt a lot about the Ensembl API.

The course was divided into sections covering the different parts of the Ensembl API (core API, variation, comparative genomics, functional genomics, etc.). The instructors set us lots of nice exercises, and I've included my answers to the exercises below.

Ensembl Compara (Comparative Genomics)
This part of the course was taught by Matthieu Muffato and Stephen Fitzgerald, whose course notes are here:
Matthieu Muffato: course notes
Stephen Fitzgerald: course notes

Exercises:
1) Print the sequence of the [Compara] Member corresponding to SwissProt protein O93279: exercise1a_compara.pl
2) Find and print the sequence of all the peptide Members corresponding to the human protein-coding gene(s) FRAS1: exercise2a_compara.pl
3) Get the multiple alignment corresponding to the family (a 'family' can contain both UniProt and Ensembl members) with the stable id ENSFM00250000006121: exercise3_compara.pl
4) Get the families that the human gene ENSG00000139618 belongs to, and print out their members (note: a 'family' can contain both UniProt and Ensembl members): exercise4_compara.pl
5) Print the protein tree with the stable id ENSGT00390000003602 (note: a 'tree' can only contain Ensembl members, not UniProt members): exercise5_compara.pl
6) Print all the members of the tree containing the human ncRNA gene ENSG00000238344: exercise6_compara.pl
7) Get all the homologues for the human gene ENSG00000229314: exercise7_compara.pl
8) Count the number of one-to-one orthologues between human and mouse: exercise8_compara.pl

Making a plot of a tree:
The script in exercise 5 above extracts the tree in several formats, the last of which is called 'display_label_composite' NHX format by Compara:
((((((BRCA2_ENSXMAG00000006974_Xmac:0.4552[&&NHX:D=N:T=8083],BRCA2_ENSORLG00000003832_Olat:0.6586[&&NHX:D=N:T=8090])Atherinomorpha:0.0644[&&NHX:D=N:B=68:T=32456],BRCA2_ENSGACG00000011490_Gacu:0.1748[&&NHX:D=N:T=69293])Smegmamorpha:0.0090[&&NHX:D=N:B=1:T=129949],((BRCA2_ENSTRUG00000006177_Trub:0.0650[&&NHX:D=N:T=31033],BRCA2_ENSTNIG00000016261_Tnig:0.1058[&&NHX:D=N:T=99883])Tetraodontidae:0.1811[&&NHX:D=N:B=93:T=31031],BRCA2_ENSONIG00000005522_Onil:0.2923[&&NHX:D=N:T=8128])Percomorpha:0.0254[&&...NHX:D=N:B=100:T=9005],BRCA2_ENSTGUG00000011763_Tgut:0.1201[&&NHX:D=N:T=59729])Neognathae:0.2718[&&NHX:D=N:B=100:T=8825],BRCA2_ENSACAG00000004541_Acar:0.3673[&&NHX:D=N:T=28377])Sauria:0.0344[&&NHX:D=N:B=98:T=32561],BRCA2_ENSPSIG00000011574_Psin:0.2148[&&NHX:D=N:T=13735])Sauropsida:0.1177[&&NHX:D=N:B=98:T=8457])Amniota:0.1225[&&NHX:D=N:B=100:T=32524],brca2_ENSXETG00000017011_Xtro:0.7609[&&NHX:D=N:T=8364])Tetrapoda:0.1639[&&NHX:D=N:B=6:T=32523],BRCA2_ENSLACG00000007788_Lcha:0.2902[&&NHX:D=N:T=7897])Sarcopterygii:0.2981[&&NHX:D=N:B=5:T=8287])Euteleostomi:0[&&NHX:D=N:B=0:T=117571];
If you put this into a file (e.g. tree.nj), you can make a picture of the tree using Li Heng's NJTREE software (now called TreeBeST, hence the 'treebest' command below), which you can download from SourceForge, by typing, for example:
% ~alc/Documents/bin/treebest/treebest export -f 8 tree.nj > tree.eps
Here '-f 8' sets the font size in the image to 8. Here's the picture:

It doesn't show the duplication and speciation nodes in different colours, but that's ok.
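Even without colours in the picture, the duplication/speciation information is in the NHX tags themselves: in NHX, D=Y marks a duplication node and D=N a non-duplication (speciation) node, B is the bootstrap value and T is the taxonomy id. Here is a minimal Python sketch (not part of the course material) that tallies the internal nodes by their D tag. The function names and the sample variable are my own, the sample is just the first small subtree from the tree above, and I'm assuming that internal nodes can be recognised by having a B tag, which is what the output above looks like rather than something the NHX format guarantees.

```python
import re
from collections import Counter

# Matches one NHX annotation block, e.g. [&&NHX:D=N:B=68:T=32456]
NHX_BLOCK = re.compile(r"\[&&NHX:([^\]]*)\]")


def parse_nhx_tags(newick):
    """Return one dict of tag -> value for each NHX annotation in the string."""
    tags = []
    for block in NHX_BLOCK.findall(newick):
        fields = [field for field in block.split(":") if "=" in field]
        tags.append(dict(field.split("=", 1) for field in fields))
    return tags


def count_node_types(newick):
    """Count internal nodes as 'duplication' (D=Y) or 'speciation' (otherwise)."""
    counts = Counter()
    for tag in parse_nhx_tags(newick):
        # Assumption: only internal nodes carry a B (bootstrap) tag,
        # as in the Compara output shown above; leaves are skipped.
        if "B" not in tag:
            continue
        node_type = "duplication" if tag.get("D") == "Y" else "speciation"
        counts[node_type] += 1
    return counts


# A small subtree copied from the tree above (two leaves, one internal node).
SAMPLE_TREE = (
    "(BRCA2_ENSXMAG00000006974_Xmac:0.4552[&&NHX:D=N:T=8083],"
    "BRCA2_ENSORLG00000003832_Olat:0.6586[&&NHX:D=N:T=8090])"
    "Atherinomorpha:0.0644[&&NHX:D=N:B=68:T=32456];"
)

if __name__ == "__main__":
    print(dict(count_node_types(SAMPLE_TREE)))
```

To run it on a whole tree, read the contents of tree.nj into a string and pass that to count_node_types instead of SAMPLE_TREE. It only looks at the annotation blocks with a regular expression and doesn't parse the tree structure, so it can count nodes but can't say where in the tree a duplication sits.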
[Note to self: it's possible to make a PNG that has the duplication and speciation nodes in different colours by using Li Heng's Perl script. You need to copy the tree.nj file to /nfs/users/nfs_a/alc/Documents/bin/njtree_plot, then type:
% perl nhxplot.pl tree.nj > tree.png
This gives:

You can see the duplication nodes in red and speciation nodes in blue. Very nice! ]