This routine uses read-pair information to try to work out the left to right
order of sets of contigs.
It is invoked from the gap4 Edit menu.
At present it attempts to order all the contigs in
the database, and when finished it produces a listbox window containing
one or more sets of contigs (one set per line), listed by the names of their
leftmost readings. By clicking on these names in the listbox the user can
request that the "super contigs" be shown in the standard Template
display window
(see section Template Display).

Using the
tools available within this window the user can manually move or complement
any contigs which appear to have been misplaced. The combination
of automatic ordering and the facility to view the results by eye and manually
correct any errors makes this a powerful tool. The new contig order can
be saved to the database by selecting the "Update contig order" command from
the "Edit" menu of the Template display. Note, however, that unlike the
editing operations in the Contig editor, which are only committed to the disk
copy of the database at the user's request, all the complementing operations
in gap4 are always performed both in memory and on the disk. This means that
any complementing done as part of the contig ordering process will be
immediately committed to disk.

An example of the "Super contig" listbox is shown here.

The example seen in the figures shows a Template display before and
after the application of the algorithm.

Before ordering

After ordering

Notice how the operation has reduced the large number of dark yellow (inconsistent) templates by ordering and complementing the contigs so that they are now
consistent and show in bright yellow. The few remaining dark yellow templates
represent problems, possibly with misassembly or with misnaming of
readings. The reliability of these dark yellow templates is also
questionable given that one or other of their readings typically lies
in the middle of a large contig, so the template is unlikely to span
contigs. The gaps between the contigs, shown in the ruler
at the bottom of the template display, are estimates of the size of the
missing data, based on the expected lengths of the templates.
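
The arithmetic behind such a gap estimate can be illustrated with a minimal
Python sketch. This is not gap4's implementation: the function name, the
input layout and the use of a plain mean are all assumptions made for
illustration. Each spanning template is assumed to start at a known position
in the left contig and end at a known position in the right contig, so the
gap is the expected template length minus the parts covered by the two contigs.

```python
from statistics import mean

def estimate_gap(left_contig_length, spanning_templates):
    """Estimate the gap between two adjacent contigs (hypothetical helper).

    spanning_templates is a list of tuples
    (start_in_left, end_in_right, expected_length) where
      start_in_left   = position in the left contig where the template starts
      end_in_right    = position in the right contig where the template ends
      expected_length = expected length of the template
    """
    estimates = []
    for start_in_left, end_in_right, expected_length in spanning_templates:
        covered_left = left_contig_length - start_in_left  # bases inside left contig
        covered_right = end_in_right                       # bases inside right contig
        estimates.append(expected_length - covered_left - covered_right)
    # Assumption: combine the per-template estimates with a simple mean.
    return mean(estimates)

# Two templates spanning the gap after a 10000 base contig.
gap = estimate_gap(10000, [(9000, 500, 2000), (9400, 700, 1500)])
print(gap)  # (500 + 200) / 2 = 350
```

A negative result from such a calculation would suggest that the two contigs
overlap, or that the expected template length is wrong.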

The algorithm is based on ideas used to build cosmid contigs using
hybridisation data (Zhang, P., Schon, E.A., Fischer, S.G., Cayanis, E.,
Weiss, J., Kistler, S. and Bourne, P. (1994) "An algorithm based on graph
theory for the assembly of contigs in physical mapping of DNA", CABIOS
10, 309-317). A difficulty for algorithms of this type is dealing with
errors in the data, i.e. pairs of readings that have been incorrectly
assigned to the same template (often by simple typing errors made prior
to the creation of the experiment files). Our algorithm uses several
simple heuristics to deal with such problems, but one known limitation is
that it does not correctly handle cases where templates span non-adjacent
contigs, or where such contigs interleave.
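
The general idea of ordering contigs from read-pair links can be sketched in
Python as follows. This is a simplified illustration and not the algorithm
used by gap4: the function name, the min_support threshold and the chain
walking strategy are assumptions, and contig orientation (complementing) is
ignored entirely. Contigs are treated as graph nodes, each spanning template
contributes a link between two contigs, weakly supported links are discarded
as probable errors, and chains are walked from their ends.

```python
from collections import Counter, defaultdict

def order_contigs(template_links, min_support=2):
    """Group contigs into ordered chains (hypothetical helper).

    template_links: iterable of (contig_a, contig_b), one per spanning template.
    min_support: assumed heuristic; links seen fewer times are ignored.
    """
    # Count how many templates support each unordered pair of contigs.
    support = Counter(frozenset(p) for p in template_links if p[0] != p[1])

    neighbours = defaultdict(set)
    for pair, count in support.items():
        if count >= min_support:
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)

    chains, visited = [], set()
    # A contig with exactly one neighbour is the end of a chain.
    for start in sorted(neighbours):
        if start in visited or len(neighbours[start]) != 1:
            continue
        chain, current = [start], start
        visited.add(start)
        while True:
            unvisited = sorted(n for n in neighbours[current] if n not in visited)
            if len(unvisited) != 1:
                break  # end of chain, or an ambiguous branch
            current = unvisited[0]
            chain.append(current)
            visited.add(current)
        chains.append(chain)
    return chains

links = ([("c1", "c2")] * 3 + [("c2", "c3")] * 2
         + [("c1", "c3")]            # single link, treated as an error
         + [("c4", "c5")] * 2)
print(order_contigs(links))  # [['c1', 'c2', 'c3'], ['c4', 'c5']]
```

In this sketch the walk simply stops when a contig has more than one unvisited
neighbour. That is the situation which arises when well supported templates
span non-adjacent contigs, so the sketch shares the kind of limitation
described above.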

This page is maintained by
staden-package.
Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_unix_94.html