Finding out about Ensembl genes and transcripts - solutions

Exercise 1 – Exploring the human MYH9 gene

Click on either the Ensembl ID ENSG00000100345 or the HGNC official gene name MYH9.

MYH9 is located on chromosome 22, on the reverse strand.

Ensembl has 11 transcripts annotated for this gene, of which three are protein coding.

The longest transcript is MYH9-201, which codes for a protein of 1,960 amino acids.
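The same counts can be checked outside the browser. The sketch below is a minimal illustration, assuming the gene record is fetched as JSON from the Ensembl REST lookup endpoint with expand=1. The field names it reads ("Transcript", "biotype", "display_name", "Translation", "length") are assumptions about the response shape, and the helper names fetch_gene and summarise_gene are hypothetical.

```python
import json
import urllib.request
from typing import Any

# Ensembl REST lookup endpoint; expand=1 asks for the gene's transcripts
# (with their translations and exons) to be included in the response.
LOOKUP_URL = "https://rest.ensembl.org/lookup/id/{}?expand=1"


def fetch_gene(gene_id: str) -> dict[str, Any]:
    """Fetch one gene record as JSON (needs network access)."""
    request = urllib.request.Request(
        LOOKUP_URL.format(gene_id), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarise_gene(gene: dict[str, Any]) -> dict[str, Any]:
    """Count transcripts and find the transcript with the longest translation.

    Assumes gene["Transcript"] is a list of dicts with "display_name" and
    "biotype", and that translated transcripts carry a "Translation" dict
    whose "length" is the protein length in amino acids.
    """
    transcripts = gene.get("Transcript", [])
    coding = [t for t in transcripts if t.get("biotype") == "protein_coding"]
    translated = [t for t in transcripts if "Translation" in t]
    longest = max(translated, key=lambda t: t["Translation"]["length"], default=None)
    return {
        "transcripts": len(transcripts),
        "protein_coding": len(coding),
        "longest_protein": (
            (longest["display_name"], longest["Translation"]["length"])
            if longest is not None
            else None
        ),
    }
```

For example, summarise_gene(fetch_gene("ENSG00000100345")) should reflect the counts on the MYH9 gene page for whichever release the REST server is serving; transcript counts can change between Ensembl releases.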

MYH9-201 is the best-quality transcript: it has a CCDS associated with it, has transcript support level 1 (TSL:1), and is coloured gold (a merged Ensembl/Havana annotation).

(b) These are some of the phenotypes associated with MYH9 according to MIM: autosomal dominant deafness, Epstein syndrome, and Fechtner syndrome. Click on the records for more information.

(c) The Gene Ontology project (http://www.geneontology.org/) annotates gene products with terms from three ontologies: biological process, cellular component, and molecular function. Meiotic spindle organisation, cell morphogenesis, and cytokinesis are some of the roles associated with MYH9.

(d) Click on ENST00000216181.

It has 41 exons. This is shown in the Transcript summary and on the Exons page, linked from the left-hand menu.

Click on the Exons link in the side menu. Exon 1 is entirely untranslated, and exons 2 and 41 are partially untranslated (UTR sequence is shown in purple). You can also see this in the cDNA view by clicking the cDNA link in the left-hand menu.
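Whether an exon is fully untranslated, partially untranslated or fully coding follows from comparing it with the coding region. As a minimal sketch, assuming 1-based inclusive genomic coordinates and a coding region given by its lowest and highest genomic positions (so the same test works on either strand), each exon can be labelled like this; the function name and input format are hypothetical.

```python
def classify_exons(
    exons: list[tuple[int, int]], cds_low: int, cds_high: int
) -> list[str]:
    """Label each exon as "coding", "partly UTR" or "UTR".

    exons: (start, end) pairs in 1-based inclusive genomic coordinates,
    listed in transcript order. cds_low/cds_high: lowest and highest genomic
    positions of the coding sequence, independent of strand.
    """
    labels = []
    for start, end in exons:
        if end < cds_low or start > cds_high:
            labels.append("UTR")  # no overlap with the coding region
        elif start >= cds_low and end <= cds_high:
            labels.append("coding")  # lies entirely inside the coding region
        else:
            labels.append("partly UTR")  # crosses a CDS boundary (start and/or stop)
    return labels
```

With a toy reverse-strand transcript whose exons are listed from the highest coordinate down, classify_exons([(900, 1000), (700, 800), (550, 650), (400, 500), (100, 300)], 450, 750) returns ["UTR", "partly UTR", "coding", "partly UTR", "UTR"]: the first exon is entirely untranslated, as exon 1 of MYH9-201 is.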

P35579 from UniProt/Swiss-Prot matches the translation of the Ensembl transcript. Click on P35579 to go to UniProtKB, or click align for the alignment.

(e) Click on Oligo probes in the side menu.

Probesets from Affymetrix, Agilent, Codelink, Illumina, and Phalanx match this transcript sequence, so expression analysis with any of these probesets would give information about the expression of this transcript. Hint: this information can sometimes be found in the ArrayExpress Atlas: www.ebi.ac.uk/arrayexpress/

Exercise 2 – Finding a gene associated with a phenotype

Type phenylketonuria into the search box, then click Go. Choose Gene from the left-hand menu.

The gene associated with this disorder is PAH, phenylalanine hydroxylase, ENSG00000171759.

(b) If the transcript table is hidden, click on Show transcript table to see it.

There are six protein coding transcripts in release 90.

Click on Transcript comparison in the left-hand menu, then click on Select transcripts. Either select each transcript labelled protein coding one by one, or open the drop-down and select Protein coding. Close the menu.

Exercise 3 – Exploring a bacterial gene (Clostridium sporogenes)

Start typing the species name and select Clostridium sporogenes from the options that appear.

Type PolC and click on the gene name link PolC [CLSPOx_12590].

Click on GO: biological process in the side menu.

There is one term listed: GO:0006260, DNA replication.

(b) Click on the transcript named PolC-1 (or on the Transcript tab).

PolC-1 is 4,299 bp in length.
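A transcript's length is the sum of its exon lengths. Because Ensembl coordinates are 1-based and inclusive, each exon contributes end - start + 1 bases, not end - start. A minimal sketch with a hypothetical helper name:

```python
def transcript_length(exons: list[tuple[int, int]]) -> int:
    """Total length in bp of a transcript's exons.

    Coordinates are 1-based and inclusive, so an exon running from start
    to end covers end - start + 1 bases.
    """
    return sum(end - start + 1 for start, end in exons)
```

For a single-exon bacterial transcript this reduces to one interval; for example, a hypothetical exon spanning positions 1001 to 5299 gives transcript_length([(1001, 5299)]) == 4299.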

(c) Click on either Protein Summary or Domains & features in the left-hand menu to see the domains graphically or as a table, respectively.

Two domain prediction methods identify a ribonuclease H-like domain, and three identify a DNA polymerase, alpha subunit domain. Two methods identify an exonuclease domain, two identify a nucleic acid-binding domain, and one identifies a DNA polymerase III epsilon subunit.
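Counting how many methods support each domain amounts to grouping the rows of the Domains & features table by domain and counting distinct methods. The sketch below assumes each row has been reduced to a (domain description, method) pair; the function name and the method labels in the example are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def methods_per_domain(features: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count the distinct prediction methods that report each domain.

    features: (domain, method) pairs, one per row of a domains table.
    A method that hits the same domain more than once is counted once.
    """
    methods: dict[str, set[str]] = defaultdict(set)
    for domain, method in features:
        methods[domain].add(method)  # a set ignores repeat hits from one method
    return {domain: len(found) for domain, found in methods.items()}
```

Using a set per domain means that a method reporting the same domain at two positions still counts as one supporting method.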