Now let’s try clustering a real microarray dataset. If rpy2, R, and
Bioconductor are installed, we have access to all of Bioconductor’s
functionality, including its many datasets. A list of all experiment
data packages is available on the Bioconductor website. (It is also
possible to get a list interactively; see rpy2’s documentation on
interactive work.)

Let’s install the ALL data, which consists of 128 microarrays from
patients with acute lymphoblastic leukemia (ALL). We first need to get
the bioclite command, which requires an internet connection:

>>> bioclite = bibench.rutil.get_bioclite()
>>> bioclite('ALL')

Now that the data is installed, we can get it:

>>> data = bb.get_bioc_data('ALL')

Gene expression data files have labelled rows (genes/probes) and
columns (samples). We can access them as members of our data. Here we
have Affymetrix probes for our row labels:

Since this is a real dataset, the true biclusters are
unknown. However, we can do Gene Ontology enrichment analysis on each
bicluster. Since this dataset was obtained from Bioconductor, we can
find which annotation package to use by simply examining the
annotation attribute:

>>> data.annotation
'hgu95av2'

We check the biclusters for enriched Gene Ontology terms, using
the dataset’s probes as the gene universe:
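
To make the role of the gene universe concrete, here is a minimal sketch of the kind of over-representation test that enrichment analysis is typically built on. It is a generic one-sided hypergeometric test written in plain Python, not BiBench's own implementation, and every name and probe id in it (`enrichment_pvalue`, `probe_0`, and so on) is hypothetical.

```python
from math import comb


def enrichment_pvalue(universe_size, annotated_in_universe,
                      cluster_size, annotated_in_cluster):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least `annotated_in_cluster` annotated probes
    when `cluster_size` probes are drawn without replacement from a universe
    of `universe_size` probes, `annotated_in_universe` of which carry the term.
    """
    total = comb(universe_size, cluster_size)
    upper = min(cluster_size, annotated_in_universe)
    tail = sum(
        comb(annotated_in_universe, k)
        * comb(universe_size - annotated_in_universe, cluster_size - k)
        for k in range(annotated_in_cluster, upper + 1)
    )
    return tail / total


# Hypothetical toy data: 20 probes in the universe, 5 annotated with one term.
universe = {f"probe_{i}" for i in range(20)}
annotated = {f"probe_{i}" for i in range(5)}

# A made-up bicluster of 6 probes, 4 of which carry the term.
bicluster_rows = {"probe_0", "probe_1", "probe_2", "probe_3",
                  "probe_10", "probe_11"}

p_value = enrichment_pvalue(
    universe_size=len(universe),
    annotated_in_universe=len(annotated & universe),
    cluster_size=len(bicluster_rows),
    annotated_in_cluster=len(bicluster_rows & annotated),
)
print(f"p = {p_value:.4f}")
```

The choice of universe matters: testing against only the probes on the array, rather than every annotated gene, avoids rewarding a term simply because the chip happens to cover it well.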

It is easy to get a GO ID’s full annotation. For instance, suppose we
wanted to get the full annotation for GO:0048870:

>>> bb.goid_annot('GO:0048870')
GoAnnot(goid='GO:0048870', term='cell motility', ontology='BP', synonym=None, secondary=None, definition='Any process involved in the controlled self-propelled movement of a cell that results in translocation of the cell from one place to another.')

Then, using bb.goid_annot, we can get the first bicluster’s actual
enriched terms:

The Gene Expression Omnibus (GEO) is a
handy resource for expression data. BiBench provides an interface to
the GEOmetadb
package, so we can query GEO metadata to find an appropriate
dataset. The first time this functionality is used, the SQLite
database is automatically downloaded and stored in
$HOME/.bibench/GEOmetadb.sqlite.
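
Because the metadata is stored in an ordinary SQLite file, it can also be inspected with Python's standard `sqlite3` module, independently of BiBench. The sketch below is only an illustration: the helper `table_row_counts` is hypothetical, and the in-memory demo uses a made-up two-table schema rather than the real GEOmetadb layout.

```python
import os
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Demo on a tiny in-memory database (hypothetical schema and rows).
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE platforms (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE datasets (id TEXT PRIMARY KEY, title TEXT, platform TEXT);
    INSERT INTO platforms VALUES ('P1', 'Toy platform');
    INSERT INTO datasets VALUES ('D1', 'Toy dataset one', 'P1');
    INSERT INTO datasets VALUES ('D2', 'Toy dataset two', 'P1');
""")
demo_counts = table_row_counts(demo)
print(demo_counts)
demo.close()

# The same helper can be pointed at the downloaded file, if it exists.
db_path = os.path.expanduser("~/.bibench/GEOmetadb.sqlite")
if os.path.exists(db_path):
    real = sqlite3.connect(db_path)
    try:
        print(table_row_counts(real))
    finally:
        real.close()
```

Listing the tables first is a cheap way to confirm that the download completed before running any metadata query.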

Suppose we want to find a curated (GDS) dataset that was generated
from an Affymetrix chip: