Dear Galaxy,
I am analysing ChIP-Seq data from Illumina using the Galaxy web server. I
mapped the reads with Bowtie and did the peak calling with MACS.
The next thing I want to do is annotate the peaks with genomic
regions (i.e. promoter, intergenic, intron, etc.) and gene names.
I am not sure if these can be achieved through Galaxy and if so, how
can this be done? Thank you.
Catheryn

Hello Catheryn,
Yes, all of this can be done. Once you have an annotation source
identified (or sources!), the rest is part of the core functionality
of Galaxy.
One of the outputs from MACS is a BED file with the peaks. BED format
is similar to interval format and can be used with the tools in the
group "Operate on Genomic Intervals". Or, if the dataset is kept as BED,
with tools in the group "BEDTools" (such as 'Intersect multiple sorted
BED files'). If you need help understanding these datatypes, this wiki
explains - see the last bullet for links:
http://wiki.galaxyproject.org/Support#Dataset_special_cases
The idea is to obtain annotation data also in BED/interval format, then
perform the comparisons. Where there is overlap (or no overlap, in the
case of intergenic), the annotation can be assigned. I am not sure what
genome you are working with, but if it is available from UCSC or another
common public site, this can be fairly straightforward. One point is
very important: the exact same base reference genome that you mapped
against must be the one you extract annotation from. The name in Galaxy
will be the exact same name as at the source in nearly all cases - please
ask if you have a question about this.
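To make the overlap idea concrete, here is a minimal Python sketch of the
comparison, outside of Galaxy. It is only an illustration of the logic, not
what the Galaxy tools run internally. The peak and feature names are made
up, and it assumes BED-style coordinates (0-based start, end not included)
on the same genome build for both inputs.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two BED-style (0-based, half-open) intervals share a base."""
    return a_start < b_end and b_start < a_end


def annotate_peaks(peaks, features):
    """Label each peak with the overlapping feature names, else 'intergenic'.

    peaks and features are lists of (chrom, start, end, name) tuples.
    """
    # Group features by chromosome so peaks are only compared within a chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in features:
        by_chrom[chrom].append((start, end, name))

    annotated = []
    for chrom, start, end, peak_id in peaks:
        hits = sorted(
            {
                name
                for f_start, f_end, name in by_chrom[chrom]
                if overlaps(start, end, f_start, f_end)
            }
        )
        annotated.append((peak_id, hits if hits else ["intergenic"]))
    return annotated


# Hypothetical example data, for illustration only.
peaks = [
    ("chr1", 100, 200, "peak_1"),
    ("chr1", 5000, 5100, "peak_2"),
    ("chr2", 150, 250, "peak_3"),
]
features = [
    ("chr1", 150, 400, "geneA_exon"),
    ("chr1", 190, 300, "geneA_promoter"),
    ("chr2", 250, 500, "geneB_intron"),
]

for peak_id, labels in annotate_peaks(peaks, features):
    print(peak_id, ",".join(labels))
```

Note that peak_3 ends exactly where the chr2 feature starts, so under
half-open coordinates they do not overlap and the peak is left as
intergenic. This simple loop compares every peak to every feature on the
chromosome, which is fine for a sketch but slow for full datasets.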
At UCSC, the Table browser contains all the annotation tracks found in
the Browser itself, and you will most likely want to use those from the
"Gene and Gene Prediction" group, although there are likely others in
the ENCODE group that are also of interest. The description for each
track is at UCSC, including methods, often very detailed. When
extracting the data (using the tool "Get Data -> UCSC Main table
browser"), options to subset the BED output regions by exons or introns
or predicted promoter regions, etc. are available.
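If you ever need to build an upstream (promoter-like) region yourself from
whole-gene BED records, the idea can be sketched in Python as below. This
is not how UCSC computes its regions; the function name and the 1000 bp
window are assumptions for illustration. The key point is that "upstream"
depends on the strand.

```python
def upstream_region(chrom, start, end, name, strand, size=1000):
    """Return a BED-style region of `size` bases upstream of a gene.

    Coordinates are 0-based, half-open. On the + strand the transcription
    start is at `start`; on the - strand it is at `end`.
    The result is clipped at 0 but NOT at the chromosome end, since the
    chromosome length is not known here.
    """
    if strand == "+":
        up_start, up_end = max(0, start - size), start
    else:
        up_start, up_end = end, end + size
    return (chrom, up_start, up_end, name + "_upstream", strand)


# Hypothetical genes, for illustration only.
print(upstream_region("chr1", 5000, 8000, "geneA", "+"))
print(upstream_region("chr1", 5000, 8000, "geneB", "-"))
```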
BioMart can be another great source of annotation, especially for
genomes in Ensembl annotation builds. The tool would be "Get Data ->
BioMart Central server". The same basic extraction concepts would apply
although the form is organized differently. The help there will guide
you. The important parts are the chromosome, start, and end. The best
tip I can offer when working with BioMart data is to avoid HTML content
- this is often found in the longer descriptions. If you get an import
error about HTML content, this isn't a huge problem. Just try again,
eliminating suspected fields - the field/s with the HTML can usually be
identified quickly with a few test imports.
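If you have the tab-delimited export saved locally, a rough way to spot
the fields that carry HTML is sketched below in Python. This is only a
heuristic (it looks for anything shaped like a tag) and is not the check
Galaxy itself applies on import. The column names in the example are made
up.

```python
import csv
import io
import re

# Anything that looks like an opening or closing HTML tag, e.g. <a href=...> or </a>.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def columns_with_html(tsv_text):
    """Return the header names of columns where any value looks like HTML."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged = []
    for row in reader:
        for column, value in row.items():
            # Skip missing values and any overflow fields (stored as lists).
            if isinstance(value, str) and TAG.search(value):
                if column not in flagged:
                    flagged.append(column)
    return flagged


# Hypothetical export with one HTML-bearing description, for illustration only.
sample = (
    "gene_id\tchrom\tstart\tend\tdescription\n"
    "G1\t1\t100\t200\tkinase <a href='x'>source</a>\n"
    "G2\t1\t300\t400\tplain text\n"
)
print(columns_with_html(sample))
```

Any column reported here is a candidate to leave out on the next export
attempt.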
There are other sources in this "Get Data" tool group and many other
external annotation projects that have data (from these you can simply
download/upload or directly load via a URL). You can start with a larger
file with all of the details, compare with just coordinates, then go
back and pick up the details with a final join. Some examples of how to
do these types of operations are in our ChIP-seq example and in our
paper from last year, links here:
https://usegalaxy.org/u/james/p/exercise-chip-seq
https://usegalaxy.org/u/galaxyproject/p/using-galaxy-2012
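The "compare with coordinates, then join the details back" step can be
sketched in Python as follows. It assumes the coordinate comparison has
already produced (peak, feature ID) pairs and that the full annotation
table is keyed by the same feature ID. All identifiers and values here are
hypothetical, and this is just the general idea of a join, not Galaxy's
own implementation.

```python
def join_details(hits, details):
    """Attach annotation details to overlap hits, matching on feature ID.

    hits:    list of (peak_id, feature_id) pairs from the coordinate comparison
    details: dict mapping feature_id -> tuple of extra annotation fields
    Hits whose feature_id has no entry in details are dropped (an inner join).
    """
    joined = []
    for peak_id, feature_id in hits:
        info = details.get(feature_id)
        if info is not None:
            joined.append((peak_id, feature_id) + tuple(info))
    return joined


# Hypothetical data, for illustration only.
hits = [("peak_1", "T001"), ("peak_2", "T002"), ("peak_3", "T999")]
details = {
    "T001": ("geneA", "protein_coding"),
    "T002": ("geneB", "lincRNA"),
}

for row in join_details(hits, details):
    print("\t".join(row))
```

Keeping only an ID column during the coordinate comparison keeps the
intermediate files small, and the wider details are only brought in once at
the end.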
Please note that the public Main server at usegalaxy.org will be
unavailable during US East coast business hours tomorrow as stated on
the current banner:
"TACC will be performing storage system updates on Tuesday, December 3
from 9 AM to 6 PM EST (UTC -0500). During this time, Galaxy will be
unavailable."
Hopefully this helps!
Jen
Galaxy team
--
Jennifer Hillman-Jackson
http://galaxyproject.org

Hi Jennifer,
Thank you for this great post!
I know its likely to be a tremendous redundancy as a request, but is
there somehow to move your response up in the search list?
Always grateful for your extreme patience with us all,
David
FHCRC

Thanks Zod,
Very nice of you to like and reply! No patience required :)
Nothing I know of will just move it up in the custom Google searches
in one step, but we have some ideas in play about the lists that may
help with promoting certain Q/A threads soon. (No firm details quite
yet to share.)
What can be done now is to put more of this type of content into the
wiki & tutorials. And then organize/label it well. The Google searches
will pick up content from those sources as well, usually with better
focus.
Thanks for the suggestion!
Jen
Galaxy team