This is the welcome screen you should see after installing a new copy of GBrowse_syn with no configured data sources. It contains instructions on how to set up the example data source provided with the distribution.

The alignment file is in CLUSTALW format. This is only a formatting convention; it does not mean that CLUSTALW was used to generate the alignments. See Further Reading below for more information on data loading and on the metadata embedded in the sequence names.
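To illustrate the kind of metadata the sequence names carry, the sketch below splits a name into species, sequence ID, strand, start and end. The layout it expects, species-seqid(strand)/start-end (for example rice-3(+)/16050173-16064974), is an assumption based on the GBrowse_syn naming convention rather than something stated in this section, so confirm it against the Further Reading material. The names parse_seq_name and AlignedSegment are hypothetical.

```python
import re
from typing import NamedTuple

# ASSUMED name layout (verify against the GBrowse_syn docs):
#   species-seqid(strand)/start-end, e.g. "rice-3(+)/16050173-16064974"
# Species names must not contain "-" for this pattern to work.
NAME_RE = re.compile(
    r"^(?P<species>[^-]+)-(?P<seqid>[^(]+)\((?P<strand>[+-])\)/(?P<start>\d+)-(?P<end>\d+)$"
)


class AlignedSegment(NamedTuple):
    species: str  # symbolic species name, as listed in source_map
    seqid: str    # chromosome or contig name
    strand: str   # "+" or "-"
    start: int
    end: int


def parse_seq_name(name: str) -> AlignedSegment:
    """Split one CLUSTALW sequence name into its (assumed) metadata fields."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized sequence name: {name!r}")
    return AlignedSegment(
        species=match["species"],
        seqid=match["seqid"],
        strand=match["strand"],
        start=int(match["start"]),
        end=int(match["end"]),
    )


if __name__ == "__main__":
    print(parse_seq_name("wild_rice-3(-)/1-400000"))
    # AlignedSegment(species='wild_rice', seqid='3', strand='-', start=1, end=400000)
```

A regular expression keeps the sketch short; a name that does not fit the assumed layout raises an error instead of being silently misread.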

4) Load the database with the script gbrowse_syn_load_alignments_msa.pl, which is automatically installed along with GBrowse. See the GBrowse_syn scripts page for details on the options for the script.

There are 1800 alignment blocks, so this will take a little while to run.
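To check how many blocks an alignment file holds before starting a long load, a simple count is enough. This sketch assumes that each alignment block in the file opens with its own header line beginning with "CLUSTAL", as in concatenated CLUSTALW output; the function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def count_clustal_blocks(lines: Iterable[str]) -> int:
    """Count alignment blocks, ASSUMING each one opens with a 'CLUSTAL' header line."""
    return sum(1 for line in lines if line.startswith("CLUSTAL"))


def count_blocks_in_file(path: str) -> int:
    """Count blocks in an alignment file, streaming it line by line."""
    with Path(path).open() as handle:
        return count_clustal_blocks(handle)
```

For example, count_blocks_in_file("rice.aln") (a hypothetical file name) should return 1800 for the sample data if the header assumption holds for that file.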

Setting up the Configuration Files

The configuration files required for this data source are pre-installed with GBrowse, in /etc/gbrowse2/synteny/.

There are two species' config files, rice_synteny.conf and wild_rice_synteny.conf, and the joining config file, oryza.synconf. The latter file has been disabled by appending a '.disabled' extension to the file name.
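Enabling the joining file amounts to removing that extension again. The helper below is a hypothetical sketch using Python's pathlib; on a default install the directory would be /etc/gbrowse2/synteny/, and writing there usually requires root privileges.

```python
from pathlib import Path


def enable_conf(directory: str, filename: str, suffix: str = ".disabled") -> Path:
    """Hypothetical helper: rename <filename><suffix> back to <filename>."""
    folder = Path(directory)
    enabled = folder / filename
    if enabled.exists():
        # Already enabled: leave the existing file untouched.
        return enabled
    # Raises FileNotFoundError if the disabled file is not present either.
    (folder / (filename + suffix)).rename(enabled)
    return enabled
```

Usage: enable_conf("/etc/gbrowse2/synteny", "oryza.synconf").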

The joining config file, oryza.synconf:

[GENERAL]
description = BLASTZ alignments for Oryza sativa
# The synteny database
join = dbi:mysql:database=rice_synteny;host=localhost
# This option maps the relationship between the species data sources, names and descriptions
# The value for "name" (the first column) is the symbolic name that gbrowse_syn uses to identify each species.
# This value is also used in two other places in the gbrowse_syn configuration:
# the species name in the "examples" directive and the species name in the .aln file.
# The value for "conf. file" is the basename of the corresponding GBrowse .conf file.
# This value is also used to identify the species configuration stanzas at the bottom of the configuration file.
# name        conf. file          Description
source_map = rice       rice_synteny        "Domestic Rice (O. sativa)"
             wild_rice  wild_rice_synteny   "Wild Rice"
tmpimages = /tmp/gbrowse2
imagewidth = 800
stylesheet = /gbrowse2/css/gbrowse_transparent.css
cache time = 1
config_extension = conf
# example searches to display
examples = rice       3:16050173..16064974
           wild_rice  3:1..400000
zoom levels = 5000 10000 25000 50000 100000 200000 400000
# species-specific databases
[rice_synteny]
tracks = EG
color = blue
[wild_rice_synteny]
tracks = EG
color = red
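To make the three-column layout of source_map and the landmark syntax of examples concrete, the sketch below parses both values and checks that every species named in examples also appears in source_map, which the comments in the file require. It assumes the indented continuation lines have already been joined into one string; all function and type names are hypothetical and are not part of GBrowse_syn.

```python
import shlex
from typing import List, NamedTuple, Tuple


class SpeciesSource(NamedTuple):
    name: str         # symbolic species name (first column)
    conf_file: str    # basename of the species' GBrowse .conf file
    description: str  # human-readable label


Example = Tuple[str, str, int, int]  # (species, seqid, start, end)


def parse_source_map(value: str) -> List[SpeciesSource]:
    """Split a joined source_map value into (name, conf file, description) triples."""
    tokens = shlex.split(value)  # keeps each double-quoted description as one token
    if len(tokens) % 3:
        raise ValueError("source_map entries must come in groups of three")
    return [SpeciesSource(*tokens[i:i + 3]) for i in range(0, len(tokens), 3)]


def parse_examples(value: str) -> List[Example]:
    """Split a joined examples value into (species, seqid, start, end) tuples."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("examples entries must be species/landmark pairs")
    examples = []
    for species, landmark in zip(tokens[0::2], tokens[1::2]):
        seqid, _, span = landmark.partition(":")   # "3:1..400000" -> "3", "1..400000"
        start, _, end = span.partition("..")       # "1..400000" -> "1", "400000"
        examples.append((species, seqid, int(start), int(end)))
    return examples


def unknown_species(sources: List[SpeciesSource], examples: List[Example]) -> List[str]:
    """Return example species names that have no matching source_map entry."""
    known = {source.name for source in sources}
    return [species for species, *_ in examples if species not in known]


if __name__ == "__main__":
    sources = parse_source_map(
        'rice rice_synteny "Domestic Rice (O. sativa)" '
        'wild_rice wild_rice_synteny "Wild Rice"'
    )
    examples = parse_examples("rice 3:16050173..16064974 wild_rice 3:1..400000")
    print([s.conf_file for s in sources])    # ['rice_synteny', 'wild_rice_synteny']
    print(unknown_species(sources, examples))  # []
```

shlex.split is used because the description column may contain spaces and parentheses inside double quotes.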

The conf directory contains configuration files for the joining database and for each of the three species. They are similar in structure to the examples shown above, except that the Bio::DB::GFF database adapter and a gene aggregator are used, because the annotation files are in GFF version 2. For example:

The gff directory contains gene annotations for each of the three species, derived from WormBase (release WS204). The files are in GFF2 format, which is why the Bio::DB::GFF adapter is required. A sample is shown here:
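For orientation, the sketch below shows the general shape of a GFF2 line: eight fixed tab-separated columns followed by a free-form group column of tag/value pairs separated by semicolons (GFF3, by contrast, uses key=value attributes). The example line and all names are hypothetical, and the parser is deliberately naive: it does not handle semicolons inside quoted values. It is not how Bio::DB::GFF parses files.

```python
import shlex
from typing import Dict, List, NamedTuple


class Gff2Feature(NamedTuple):
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    group: Dict[str, List[str]]  # tag -> values from the GFF2 group column


def parse_group(field: str) -> Dict[str, List[str]]:
    """Parse a GFF2 group column such as 'Gene "geneA" ; Note "text"'."""
    attrs: Dict[str, List[str]] = {}
    for part in field.split(";"):  # naive: assumes no ';' inside quoted values
        tokens = shlex.split(part)  # strips the double quotes around values
        if tokens:
            attrs.setdefault(tokens[0], []).extend(tokens[1:])
    return attrs


def parse_gff2_line(line: str) -> Gff2Feature:
    """Split one tab-delimited GFF2 line into its columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("GFF2 lines need at least 8 tab-separated columns")
    group = parse_group(cols[8]) if len(cols) > 8 else {}
    return Gff2Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                       cols[5], cols[6], cols[7], group)


if __name__ == "__main__":
    # Hypothetical line; not taken from the WormBase files.
    demo = 'I\texample\tgene\t100\t900\t.\t+\t.\tGene "geneA" ; Note "hypothetical"'
    print(parse_gff2_line(demo).group)  # {'Gene': ['geneA'], 'Note': ['hypothetical']}
```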

The file orthocluster.txt contains the synteny data. The first few lines are shown below. The first 12 fields in each row describe the synteny block in each species; the series of numbers that follows consists of orthologous gene coordinate pairs, which are used to link orthologs with grid lines in the GBrowse_syn display. See 'Alignment Data' under Further Reading below for more details of this loading format.
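The row layout described above can be sketched as a small parser. Beyond the stated split into 12 block fields followed by a series of numbers, everything here is an assumption: fields are taken to be whitespace-separated, the trailing numbers are read as consecutive (one species, other species) coordinate pairs, and the 12 block fields are kept as raw strings because their individual meanings are not spelled out in this section. The names are hypothetical; check 'Alignment Data' under Further Reading for the authoritative layout.

```python
from typing import List, NamedTuple, Tuple

N_BLOCK_FIELDS = 12  # per the text: the first 12 fields describe the synteny block


class SyntenyRow(NamedTuple):
    block_fields: List[str]                # raw block fields, meanings not decoded here
    ortholog_pairs: List[Tuple[int, int]]  # ASSUMED (species A, species B) coordinates


def parse_orthocluster_row(line: str) -> SyntenyRow:
    """Split a row into its 12 block fields and the trailing coordinate pairs."""
    fields = line.split()
    if len(fields) < N_BLOCK_FIELDS:
        raise ValueError(f"expected at least {N_BLOCK_FIELDS} fields, got {len(fields)}")
    block, coords = fields[:N_BLOCK_FIELDS], fields[N_BLOCK_FIELDS:]
    if len(coords) % 2:
        raise ValueError("trailing coordinates do not form complete pairs")
    numbers = [int(c) for c in coords]
    return SyntenyRow(block, list(zip(numbers[0::2], numbers[1::2])))


if __name__ == "__main__":
    # Hypothetical row: 12 placeholder block fields, then two coordinate pairs.
    demo = " ".join(["x"] * 12) + " 1000 2000 1500 2600"
    print(parse_orthocluster_row(demo).ortholog_pairs)  # [(1000, 2000), (1500, 2600)]
```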

3) Set the $TMP environment variable so that the database loading script knows where to put its temporary files.

$ export TMP=/tmp

4) Create and load a Bio::DB::GFF database for C. elegans (ele). Run the loading script inside screen: once the time-consuming load has started, press Ctrl-A D to detach the session so that it keeps running in the background while you move on to other steps.

8) Go back to your browser and reload the rice page. There should now be a second data source in a pull-down menu.

9) Select the other data source and start browsing!

Further Reading

A Note on Whole Genome Alignments

This section of the course focuses on handling alignment and synteny data with GBrowse_syn. However, generating whole-genome alignments, identifying orthologous regions and similar tasks are subjects of considerable interest in their own right, so some background reading is listed below: