This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

A fundamental question in comparative genomics concerns the identification of mechanisms
that underpin chromosomal change. In an attempt to shed light on the dynamics of mammalian
genome evolution, we analyzed the distribution of syntenic blocks, evolutionary breakpoint
regions, and evolutionary breakpoints taken from public databases available for seven
eutherian species (mouse, rat, cattle, dog, pig, cat, and horse) and the chicken,
and examined these for correspondence with human fragile sites and tandem repeats.

Results

Our results confirm previous investigations that showed the presence of chromosomal
regions in the human genome that have been repeatedly used, as illustrated by a high
breakpoint accumulation in certain chromosomes and chromosomal bands. We show, in addition,
that there is a striking correspondence between fragile site location, the positions
of evolutionary breakpoints, and the distribution of tandem repeats throughout the
human genome, which similarly reflect a non-uniform pattern of occurrence.

Conclusion

These observations provide further evidence that certain chromosomal regions in the
human genome have been repeatedly used in the evolutionary process. As a consequence,
the genome is a composite of fragile regions prone to reorganization that have been
conserved in different lineages, and genomic tracts that do not exhibit the same levels
of evolutionary plasticity.

Background

Evolutionary biologists have long sought to explain the mechanisms of chromosomal
evolution in order to better understand the dynamics of mammalian genome organization.
Early work in this area led Nadeau and Taylor [1] to propose the 'random breakage model' of genomic evolution, based on linkage maps
of human and mouse. Their thesis relied on two assumptions: first, that many chromosomal
segments are expected to be conserved among species and, second, that chromosomal
rearrangements are randomly distributed within genomes. More than 20 years later,
in large part due to molecular cytogenetic studies, large-scale genome sequencing
efforts, and new mathematical algorithms developed for whole-genome analysis, the
first assumption has been confirmed. However, the second has been questioned by the
'fragile breakage model' [2], which considers that there are regions ('hotspots') throughout the mammalian genome
that are prone to breakage and reorganization [3,4].

Most recently, Murphy and colleagues [5] extended these analyses to include homologous synteny block (HSB) data from radiation
hybrid maps of dog, cat, pig, and horse. Their findings corroborate the 'hotspot'
theory and indicate that some chromosome regions are reused [2] during mammalian chromosomal evolution. Indeed, that about 20% of the evolutionary
breakpoint regions reported show reuse [5], particularly among the more rapidly evolving genomes (cattle, dog, and rodents),
led us [6] to question whether 'hotspots' identified in silico correspond to fragile sites that can be expressed in culture under specific conditions,
thus mirroring findings of a correlation between the location of fragile sites and
evolutionary breakpoints in primates, including human [7,8]. Our preliminary survey showed that at least 33 of the 88 cytogenetically defined
common human fragile sites contain evolutionary breakpoints in at least three of the
seven species analyzed by Murphy and colleagues [5].

But what are fragile sites? These are heritable loci located in specific regions of
chromosomes that are expressed as gaps or breaks when cells are exposed to specific
culture conditions or certain chemical agents such as inhibitors of DNA replication
or repair [9]. According to frequency of expression in the human population, and the mechanism
of their induction, fragile sites have been classically divided into two groups: common
and rare. Common fragile sites are considered part of the chromosome structure since
they have been described in different mammalian species (Rodentia [10], Carnivora [11,12], Perissodactyla [13], Cetartiodactyla [14] and Primates [7,15,16]), whereas rare fragile sites are found expressed in a small percentage of the human
population [17]. In total, 21 human fragile sites have been molecularly characterized: eight rare
fragile sites (FRAXA [18], FRAXE [19], FRAXF [20], FRA10A [21], FRA10B [22], FRA11B [23], FRA16B [24], and FRA16A [25]), and 13 common human fragile sites (FRA1E [26], FRA2G [27], FRA3B [28], FRA4F [29], FRA6E [30], FRA6F [31], FRA7E [32], FRA7G [33], FRA7H [34], FRA9E [35], FRA13A [36], FRA16D [37], and FRAXB [38]). Whereas the expression of rare fragile sites is known to be related to the amplification
of specific repeat motifs (CCG repeats and AT-rich regions), no simple repeat sequences
have been found to be responsible for the instability observed at common fragile sites.
Rather, they appear to have a high A/T content with fragility extending over large
regions (from 150 kilobases [kb] to 1 megabase [Mb]) in which the DNA can adopt structures
of high flexibility and low stability [39]. Clearly, resolution differences exist between cytogenetically defined fragile sites
in human chromosomes and the molecular delimitation of evolutionary breakpoints (themselves
fairly gross approximations given that radiation hybrid mapping data for five of the
eight species resulted in an average of 1.2 Mb for breakpoint regions [5]). Nonetheless, the fact that fragile sites represent large 'unstable' regions of
the genome [39] that in many instances span evolutionary breakpoints [7] is an observation that warrants further detailed analysis.

An intriguing aspect to emerge from comparative genomic studies performed largely
on primates and rodents is the finding that breakpoint regions are rich in repetitive
elements. In other words, there may be a causal link between the process of chromosome
rearrangement, segmental duplications [40-44], and some simple tandem repeats (for instance, the dinucleotide [TA]n [45] and [TCTG]n, [CT]n and [GTCTCT]n [46]). In addition, microsatellites have been implicated in the mechanism underlying the
chromosomal instability that characterizes some human fragile sites and constitutional
human chromosomal disorders. For example, some human rare and common fragile sites
have been found to be particularly rich in A/T minisatellites [39], and certain human chromosomal aberrations have been related to palindromic AT-rich
repeats [47,48], underscoring the presence of repetitive elements in regions of chromosomal instability.

With this as the background, we analyze the distribution of 1,638 syntenic blocks,
1,152 evolutionary breakpoint regions, and 2,304 evolutionary breakpoints taken from
public databases available for seven eutherian species (mouse, rat, cattle, dog, pig,
cat and horse) and chicken, and examine these for correspondence with fragile sites
and tandem repeat locations in the human genome. We show that evolutionary breakpoints
are not uniformly distributed and that there are certain human chromosomes and chromosomal
bands with high breakpoint accumulation. Additionally, there is a striking correspondence
between human fragile site location, the positions of evolutionary breakpoints, and
the distribution of tandem repeats throughout the human genome.

Results

Multispecies alignments

We analyzed homologous regions between the human genome and those of the rat, mouse,
cattle, pig, cat, horse, dog, and chicken. By using the HSBs described by Murphy and
coworkers [5] and adding data from the human/chicken and human/dog whole-genome sequence assemblies,
we were able to identify 1,638 syntenic blocks in the human genome (Additional data
file 4). (The dog radiation hybrid genome map data used by Murphy and coworkers [5] were replaced by the dog whole-genome assembly, which is now available.) The analysis
of the human/chicken and human/dog whole-genome sequence assemblies revealed a total
of 550 syntenic blocks among the three compared species (Additional data file 4).
The homologous chromosomal segments of the seven mammals and the chicken were plotted
against the 550 band human ideogram (Additional data file 1). We excluded the human
chromosome Y from our study of evolutionary breakpoint regions (see Materials and
methods, below).

In addition we identified the chromosomal position of 1,152 evolutionary breakpoint
regions of 4 Mb or less in size (Additional data file 5) in the human karyotype and
their corresponding evolutionary breakpoints (n = 2,304; Additional data files 1 and 5). The 2,304 evolutionary breakpoints grouped
within 352 evolutionary chromosomal bands, which together represent 67.77% of the human genome
(2,217.46 Mb of the 3,272.19 Mb of the total human genome, NCBI35; Additional data
file 5). See Figure 1 for a schematic representation of evolutionary breakpoint regions, evolutionary breakpoints
and evolutionary chromosomal bands, as well as the Materials and methods section (below)
for definitions of these terms. Approximately 45% (159 out of 352) of the evolutionary
chromosomal bands contain evolutionary breakpoints in three or more of the eight species
compared herein (Additional data file 6). These data clearly show that the distribution
of the evolutionary breakpoints and breakpoint regions is concentrated in specific
bands and/or chromosomes.

Figure 1. Schematic representation of evolutionary breakpoint regions, evolutionary breakpoints,
and evolutionary chromosomal bands. An evolutionary breakpoint region is defined as
the interval between two syntenic blocks 4 megabases (Mb) or less in size. This is
done in order to avoid problems of low comparative coverage. Evolutionary breakpoints
are defined by sequence coordinates in any of the seven mammalian species compared
with human plus the chicken, and serve to delimit the start and end of each breakpoint
region. Evolutionary chromosomal bands correspond to any band in the human ideogram
that contains at least one evolutionary breakpoint in any of the eight species compared
with the human genome.

Given the distribution of evolutionary breakpoints outlined above, we proceeded to
determine whether there is a significant correlation between the position of evolutionary
breakpoints and the known location of fragile sites. We mapped all fragile sites (both
rare and common) and evolutionary breakpoint regions (regions ≤ 4 Mb; Table 1 and Additional data file 1) to their location on the human ideogram at the 550 band
resolution. Our examination reveals that 147 chromosomal bands express fragile sites
(both common and rare). A contingency analysis shows that those bands that express
fragility (they contain either rare or common fragile sites) have a tendency, although
not significantly so (P = 0.09), to concentrate evolutionary breakpoints as compared with bands that do not
express fragile sites. In fact, we observed 104 bands that contain fragile sites (rare
and common) and evolutionary breakpoints, in contrast to the 95.4 bands expected if
the distribution were random. A more refined analysis was subsequently conducted in
which four categories of chromosomal bands (those that contain common fragile sites,
those with rare fragile sites, bands with both common and rare fragile sites, and
finally bands with no fragile sites) were examined using contingency analysis. There
is a significant tendency (P = 0.01) for bands with rare fragile sites to accumulate evolutionary breakpoints (22
of the 24 bands known to express rare fragile sites contain evolutionary breakpoints
versus the 15.6 bands expected if the distribution were random). The same tendency
does not hold in the case of common fragile sites, where 73 of 111 bands that express
common fragile sites contain evolutionary breakpoints (72.2 expected), or bands that
contain evolutionary breakpoints but no fragile sites (248 observed versus 256.3 expected).
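The expected band counts quoted above follow the usual contingency-table logic, in which each expected cell count is the row total multiplied by the column total and divided by the grand total. The following is a minimal sketch of that calculation, not the JMP procedure used in the study. The values 104, 147, and 248 are taken from the text; the number of bands without fragile sites (N_NO_FRAGILE) is not stated in the text and is a hypothetical placeholder, so the output should not be expected to reproduce the reported figures exactly.

```python
def expected_counts(table):
    """Expected cell counts under independence for a 2-D contingency table.

    `table` is a list of rows, each a list of observed counts.
    Expected count = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: bands with / without fragile sites.
# Columns: bands with / without evolutionary breakpoints.
# 104, 147 and 248 come from the text.
# N_NO_FRAGILE is a HYPOTHETICAL placeholder, not a value from the study.
N_NO_FRAGILE = 394
observed = [
    [104, 147 - 104],
    [248, N_NO_FRAGILE - 248],
]
expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
```

The same two functions apply unchanged to the four-category table (common, rare, both, none) if the rows are extended accordingly.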

Table 1. The human ideogram at the 550 band resolution showing the location of fragile sites
and evolutionary breakpoints

As stated above, resolution differences exist between cytogenetically defined fragile
sites in human chromosomes and the molecular delimitation of evolutionary breakpoints.
That differences in resolution may confound the association between them is clearly
of concern. However, of the 12 autosomal common fragile sites that have been characterized
at the molecular level (Additional data file 8), six (FRA4F, FRA6E, FRA7E, FRA7G,
FRA7H, and FRA9E) were shown to span evolutionary breakpoints in at least one of the
species analyzed with an additional two fragile sites (FRA3B and FRA16D) located within
1 Mb of evolutionary breakpoints (Additional data file 8). Importantly, of the four
autosomal common fragile sites with the highest expression frequencies (FRA3B [28], FRA6E [30], FRA7H [34], and FRA16D [37]), two (FRA6E and FRA7H) are localized within evolutionary breakpoints, and two (FRA3B
and FRA16D) lie within 1 Mb of breakpoint boundaries. With respect to the eight cloned
rare fragile sites [18-25], three (FRA10A, FRA16A, and FRA16B) are located in bands that contain evolutionary
breakpoints in at least one of the species analyzed by us.

Distribution of tandem repeats

The distribution of tandem repeats in human chromosomes was analyzed using 250,000
bp search windows in order to determine whether there is any correspondence between
tandem repeats, fragile sites (both rare and common), and the location of evolutionary
breakpoints (Additional data files 2 and 8). The tandem repeats range from microsatellites
(unit size 1 bp to 6 bp) to different types of minisatellites (from 7 bp to 300 bp).
We identified a high concentration of tandem repeats in the telomeres and the pericentromeric
regions of each chromosome (Additional data file 2), mirroring earlier findings (for
instance, see Näslund and coworkers [49]). The distribution of tandem repeats (1 to 300 bp) along human chromosomes showed
that on average 3,738.56 bp of the 250,000 bp of genomic sequence contained in each
window comprised tandem repeats (about 1.5%). Chromosome 19 is exceptional for the
high number of repeats found along its length [50], which is more than double (8,377.27 bp) the average for the whole genome (Table 2 and 3). Additionally, chromosome 19 has been shown to be exceptional in many other genomic
features, most of which (including the high number of repeats) may be due to the extremely
high GC content of this chromosome [51,52].
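The per-window repeat content described above can be illustrated with a short sketch. It assumes that tandem repeats are available as non-overlapping (start, end) intervals in 0-based, half-open coordinates; this layout and the function name are illustrative assumptions, not the format used in the study.

```python
WINDOW_BP = 250_000  # window size stated in the text


def repeat_bp_per_window(repeats, chrom_length, window=WINDOW_BP):
    """Return the number of tandem-repeat base pairs in each window.

    `repeats` is an iterable of (start, end) intervals, 0-based and
    half-open, assumed not to overlap one another and to lie within
    the chromosome (raw repeat-finder output may need merging first).
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    totals = [0] * n_windows
    for start, end in repeats:
        pos = start
        while pos < end:
            idx = pos // window
            # A repeat that crosses a window boundary is split between windows.
            chunk_end = min((idx + 1) * window, end)
            totals[idx] += chunk_end - pos
            pos = chunk_end
    return totals


def repeat_fraction(bp_in_window, window=WINDOW_BP):
    """Fraction of a window occupied by tandem repeats."""
    return bp_in_window / window
```

For the genome-wide average reported above, repeat_fraction(3738.56) gives roughly 0.015, which corresponds to the "about 1.5%" figure.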

Tandem repeats and evolutionary chromosomal bands

When analyzing the human genome in its entirety, but excluding the centromeric and
telomeric regions from the analysis, evolutionary chromosomal bands (E bands) tend
to contain significantly more (P < 0.05) tandem repeats than chromosomal bands not implicated in evolutionary change
(B bands; Table 2). It is noteworthy that in the case of human chromosomes 3, 15, 17, 18, and 21, E
bands contain significantly more tandem repeats than do the B bands (P < 0.05), whereas the converse holds for human chromosomes 8 and 16. In all other instances
no statistically supported differences were noted. Elimination of chromosome 19 from
the analysis, with its singularly high repeat content, reduces the difference between
E bands and B bands but not significantly so. In addition, we detected 256 human chromosomal
bands that contain regions with more than 6,000 bp of tandem repeats in the 250,000
bp of genomic sequence contained in each window. Of these high-density repeat loci,
76.95% (197 of 256) contain evolutionary breakpoints.

Tandem repeats and fragile sites

Overall, chromosomal bands that express fragile sites (rare and common combined) contain
significantly more tandem repeats (P < 0.05) than do bands that do not (Table 2 and Additional data file 9). There are, however, differences evident among chromosomes. In the case of human
chromosomes 1, 5, 7, 8, 11, 12, and 22, chromosomal bands that express fragile sites
contain more tandem repeats than do bands that do not show fragility (P < 0.05). The converse holds for chromosomes 10, 14, 17, and 20, where regions of fragility
are not characterized by elevated tandem repeat levels. In the remaining human chromosomes
(2, 3, 4, 6, 9, 13, 15, 16, 18, and 19), there is no statistical relationship between
those bands that express fragile sites and have high numbers of tandem repeats, and
bands that do not (Table 2). Moreover, the statistically significant differences detailed above hold irrespective
of whether chromosome 19 is omitted from the analysis or not. Interestingly, 62.6%
(92 out of 147; Table 1) of the human bands that contain human fragile sites are localized in regions that
contain high densities of repeats (for instance, regions containing >6,000 bp of tandem
repeats in the 250,000 bp of genomic sequence contained in each window; see above).
No fragile sites have been described in the literature for human chromosome 21.

We examined the repeat content of the four categories of chromosomal bands (those
that express common fragile sites, bands with rare fragile sites, bands with both
common and rare fragile sites, and finally bands that do not contain fragile sites;
Additional data file 9). Those containing rare fragile sites were shown to have significantly
(P < 0.05) greater numbers of tandem repeats (average of 4,852.53 bp per 250,000 bp of
genomic sequence contained in each window) than any other category (3,714.86 bp per
250,000 bp of genomic sequence contained in each window in the case of common fragile
sites, the category with the next highest repeat content).

Discussion

Evolutionary breakpoints

Evolutionary breakpoints can be defined by levels of resolution [53]. The holistic perspective of evolutionary breakpoints has traditionally been underpinned
by molecular cytogenetic studies that assign regions of chromosomal homology to species
of the same or different orders of mammals at the chromosomal band level. Investigations
using comparative chromosome painting (ZOO-fluorescence in situ hybridization [ZOO-FISH]) involving more than 80 different species from almost all
of the recognized eutherian orders have defined regions of the human genome that are
implicated in chromosomal evolution (for review, see Froenicke [54]). The integration of cross-species chromosome painting data published from 30 nonprimate
species [54], and even greater numbers of primate species [8], clearly demonstrates that evolutionary breakpoints are not uniformly distributed
along the length of human chromosomes, and in some cases they are conserved during chromosome
evolution.

The use of whole-genome comparisons (the reductionist view) allows for the delimitation
of evolutionary breakpoints at a finer level of resolution than can be obtained by
chromosome painting. By analyzing published data [5], and adding complementary information from the human/chicken and human/dog whole-genome
sequence assemblies, we were able to identify 1,152 evolutionary breakpoint regions
throughout the human genome at a resolution of 4 Mb or less, which contain 2,304 evolutionary
breakpoints. Plotting the evolutionary breakpoints included in our data onto the 550
chromosomal band human ideogram provided a means of combining the cytogenetic and
the sequence comparisons. This identified 352 human chromosomal bands that contain
evolutionary breakpoints and showed that the distribution of evolutionary breakpoints
is not uniform in the human genome. Quite clearly, there are evolutionary 'hot spots',
defined by chromosomal bands, which are coincidental with genomic reorganization characterizing
different lineages during the evolutionary process (breakpoint reuse [5]).

Evolutionary implications of fragile sites

Although the exact number of fragile sites described in the human genome is a matter
of interpretation, a recent revision lists 119 fragile sites, 88 of which are defined
as common and 31 as rare [39]. Our data show that human chromosomal bands that express fragile sites (both common
and rare combined) have a tendency to contain evolutionary breakpoints (Table 1), although the association is statistically supported only in the case of rare fragile
sites. This association suggests an important role for fragile sites in genome reorganization,
most likely by functioning as regions of chromosomal instability.

Although the mechanisms underlying the breakage at common fragile sites are still
poorly understood, rare fragile sites are associated with the amplification of repeat
motifs (CCG repeats and AT-rich regions). The molecular characterization of 13 common
fragile sites has revealed that there are no simple repeat sequences responsible for
their instability (for review, see Schwartz and coworkers [39] and Glover [55]). Rather, they are enriched in A/T content, have the potential to form secondary structures,
and contain clusters of flexible sequences (flexibility clusters). These are all features
that may affect DNA replication and chromatin condensation, suggesting a common basis
for fragility (presence of repeat sequences) that would characterize all fragile sites
(both common and rare).

Previously, evolutionary studies involving fragile sites have attempted to address
two important questions. First, because fragile sites are considered part of the chromosome
structure, are the characteristics underlying their susceptibility to breakage conserved
during evolution? Also, can fragile sites be considered 'targets' for evolutionary
reorganization? In terms of the first question recent studies have shown that some
human common fragile sites have been conserved in homologous regions in mouse and
some primate species [29,56,57], suggesting that the characteristics governing a chromatid's susceptibility to breakage
are conserved during evolution. The high degree of correspondence between the location
of fragile sites and evolutionary breakpoints shown by our study has a bearing on
the second question posed above, namely whether fragile sites are 'targets' for evolutionary
reorganization. Comparative cytogenetic studies performed in primate families such
as Hominidae, Cebidae, and Cercopithecidae [7,16,58-60] revealed that a high proportion of chromosomal bands implicated in evolutionary reorganization,
centromeric shifts, and delimiting heterochromatic regions also contain fragile sites
in the human genome. By increasing the number of species analyzed (mouse, rat, cattle,
dog, pig, cat, horse, and chicken), as well as improving the resolution of evolutionary
breakpoints using whole-genome comparisons, we have been able to draw more precise
conclusions on the distribution of evolutionary breakpoints and their correspondence
to human bands that are known to contain fragile sites. Our data show that fragile
sites appear to be conserved as 'fragile chromosomal bands', in which evolutionary
breakpoints accumulate in much the same way that human fragile sites may be considered
to signal regions of chromosomal instability observed in cancer cells [61].

Repetitive DNA, fragile sites and chromosomal evolution

Given the 'hot spot' theory, one may question whether repetitive elements are driving
chromosomal evolution by triggering reorganization in these regions (for instance,
see the reports by Armengol [42] and Cáceres [62] and their coworkers) or, alternatively, that the repeats accumulate preferentially
in these regions following reorganization. That our study shows that rare fragile
sites in particular have a highly significant association (P = 0.01) with both evolutionary breakpoints and tandem repeats has important implications
for the role of this particular type of fragile site in chromosomal instability, and
hence genome evolution. The molecular characterization of chromosomal regions implicated
in evolutionary breakpoints in human, mouse, and primate genomes has similarly shown
that large-scale reorganization tends to occur at, or close to, regions rich in segmental
duplications and some type of simple tandem repeat (for example, the dinucleotide
[TA]n) [41,63-65].

The analysis of the distribution of tandem repeats in human chromosomes and their
spatial relationship to evolutionary breakpoints presented here highlights two important
points. First, it emphasizes the high concentration of base pair repeats found at
the telomeres and the pericentromeric areas (which is in agreement with previous reports
on the distribution of duplicated regions [5] and of polymorphic minisatellites [49] throughout the human genome). The second, possibly more remarkable finding is the
concentration of tandem repeats at evolutionary chromosomal bands. Although this is
by no means ubiquitous, the correspondence is typified by human chromosome 3 (Table
2 and Additional data file 1). Bands with the greatest number of tandem repeats in this chromosome (3p25, 3p21.3,
3p12, 3q13.1, 3q21, and 3q29) are also chromosomal regions that have been implicated
in evolutionary rearrangements. It is noteworthy that the chromosomal bands 3p25,
3p21, 3p12, and 3q21 have previously been identified as breakpoints in primate evolution
[66], and that the evolutionary breakpoints at 3p25.1, 3p12.3, and 3q21.3 are associated
with duplications in hominid evolution [67-69].

In particular, human chromosome 7 (Figure 2a) is interesting both from the evolutionary as well as clinical perspective. Our analysis
shows that there are six bands on this chromosome that contain the greatest concentration
of tandem repeats in the human genome: 7p22, 7p13, 7p11, 7q11, 7q22, and 7q36. All
six bands incorporate fragile sites (FRA7B, FRA7D, FRA7A, FRA7J, FRA7F, and FRA7I)
and all but one of them (7p13) correspond to regions where evolutionary breakpoints
tend to concentrate, as indicated by comparisons of the human genome with those of
mouse, rat, cattle, pig, dog, cat, chicken (present study), and different primate
species [8]. Three of these chromosomal bands (7p22, 7q11, and 7q22) appear to be the boundaries
for mammalian ancestral chromosomes 7a and 7b (Figure 2a) and have been implicated in almost all mammalian species studied to date by comparative
chromosome painting using human painting probes [8,54]. A recent study of the evolutionary history of human chromosome 7 [70] demonstrated that this chromosome may be derived from the orangutan homolog by two
inversions (one paracentric and another pericentric) that involved three chromosomal
breakpoints that map to 7p22.1, 7q11.23, and 7q22.1 in human (one of these, 7q22.1,
is common to both rearrangements). All three bands have the greatest number of tandem
repeats (present study) and are particularly rich in segmental duplications [40]. Moreover, they are considered 'hot spots' for human diseases such as the Williams-Beuren
syndrome [71,72] and leukemias [73].

Figure 2. Multispecies alignments to human chromosomes. Shown are the multispecies alignments
of human chromosomes (a) 7 and (b) 12. Red bars represent the homologous synteny blocks (HSBs) in the seven eutherian species
that were compared in our analyses. Black bars represent the regions of homology to
the chicken genome [74]. Segments 7a (red), 7b (blue), 12a (green), and 12b (pink)
are portions of human chromosomes comprising the ancestral eutherian karyotype [8,22].
The location of human fragile sites (FRA) is given to the left of the ideogram of
each human chromosome: red for common fragile sites and blue for rare fragile sites.
The distribution of tandem repeats along these human chromosomes is shown on the extreme
right. The high concentrations of tandem repeats in the pericentromeric and telomeric
areas of each chromosome have been omitted to provide clearer resolution of the
distribution of repeats elsewhere on the chromosomes (see text for details).

Other notable associations between tandem repeats, fragile site location, and evolutionary
breakpoints include the greatest concentration of tandem repeats found in the human
genome - those in bands 12q13.1 and 12q24. The band 12q13.1 contains one fragile site
(FRA12A) and two evolutionary breakpoints, whereas 12q24 contains three fragile sites
(FRA12C, FRA12D, and FRA12E) and seven evolutionary breakpoints (Figure 2b). Human chromosome 12 is considered to be the result of the fusion of two ancestral
chromosomal segments, 12a and 12b (Figure 2b), an event that is thought to have occurred in the Simiiformes (Catarrhini and Platyrrhini)
ancestor. Chromosomal band 12q24 forms the boundary of these segments [8], once again highlighting a chromosomal region that is characterized both by its fragility
and involvement in evolutionary change.

Conclusion

Our results provide clear evidence of the existence of chromosomal regions in the
human genome that have been repeatedly used in the evolutionary process, thus confirming
and extending earlier observations [2,5,8]. As a consequence, the human genome can be considered a mosaic comprising regions
of fragility that are prone to reorganization that have been conserved in different
lineages during the evolutionary process, and regions that do not exhibit the same
levels of evolutionary plasticity. Although we cannot unequivocally suggest a mechanistic
role for tandem repeats and fragile sites in sculpting modern genomes, our data will
serve to focus further detailed investigations on this fundamental aspect of genome
evolution.

Materials and methods

Whole-genome comparisons and breakpoints analysis

The Ensembl genome browser of Sanger Center and EMBL [74] as well as published data [5] were used as sources for determining homologies between the human genome and those
of the mouse, rat, cattle, pig, dog, cat, horse, and chicken. We used the sequence
coordinates described by Murphy and coworkers [5] to delimit homologous synteny blocks (HSBs), where the data from cattle, pig, cat,
and horse are based on RH maps; the homologous regions between human, rat, and mouse
are based on whole-genome assemblies. To determine syntenic regions between the human
genome (NCBI Build 35) and that of the dog and chicken, we used the completed human/chicken
(WASHUC 1) and human/dog (CanFam 1.0) whole-genome sequence assemblies available from
the Ensembl genome browser. In the case of the dog and chicken we analyzed homologous
syntenic blocks that varied in size between 0.1 Mb and 84 Mb (Additional data file 4), according to the Ensembl genome browser.

For all species analyzed, we follow Murphy and coworkers [5] in viewing an 'evolutionary breakpoint region' as the interval between two syntenic
blocks. As did those authors, we use evolutionary breakpoint regions that are 4 Mb
in size or less in order to avoid problems of low comparative coverage. 'Evolutionary
breakpoints' are defined by sequence coordinates in any of the seven mammalian species
compared with human plus the chicken. They serve to delimit the start and end of each
breakpoint region. Likewise, the limits of each chromosomal band in the human karyotype
can be defined by sequence coordinates using the Ensembl database [74]. Following this procedure, evolutionary breakpoints of each homologous segment were
mapped to the human ideogram at the 550 band resolution, allowing us to identify 'evolutionary
chromosomal bands' (E bands), which are defined as any band in the human ideogram
that contains at least one evolutionary breakpoint in any of the eight species compared
with the human genome (Figure 1). We used the JMP software (version 5.1.2; SAS Institute Inc., Cary, NC, USA) to
investigate the distribution of evolutionary breakpoints.
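The definitions above can be sketched in a few lines of code. This is a simplified illustration under stated assumptions rather than the procedure actually used: syntenic blocks are taken to be (start, end) pairs in human coordinates for one chromosome and one species comparison, bands are taken to be (name, start, end) triples with half-open ends, and overlapping blocks are simply skipped. All function and variable names are hypothetical.

```python
MAX_REGION_BP = 4_000_000  # 4 Mb cut-off stated in the text


def breakpoint_regions(blocks, max_gap=MAX_REGION_BP):
    """Intervals between consecutive syntenic blocks on one chromosome.

    `blocks` is a list of (start, end) human coordinates. Gaps larger
    than `max_gap` are discarded (low comparative coverage); overlapping
    blocks (negative gap) are skipped, which is a simplifying assumption.
    """
    ordered = sorted(blocks)
    regions = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if 0 <= gap <= max_gap:
            regions.append((prev_end, next_start))
    return regions


def evolutionary_bands(regions, bands):
    """Names of bands that contain at least one evolutionary breakpoint.

    Each region contributes two breakpoints: its start and its end.
    `bands` is a list of (name, start, end) with half-open intervals.
    """
    hits = set()
    for region in regions:
        for point in region:
            for name, band_start, band_end in bands:
                if band_start <= point < band_end:
                    hits.add(name)
    return hits
```

Because each breakpoint region is delimited by a start and an end coordinate, the 1,152 regions reported here correspond to 2 x 1,152 = 2,304 evolutionary breakpoints.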

Fragile site analysis

The data reported by Schwartz and coworkers [39] were used as reference for the location, classification, and number of fragile sites
described in the human genome. Human fragile sites may be classified into two groups
based on frequency of occurrence and mechanisms of expression, and are generally referred
to as either common or rare fragile sites [17]. In this investigation we considered a total of 119 fragile sites [39], of which 88 are defined as common and 31 as rare fragile sites (Additional data
file 7). These were mapped to specific chromosomal bands on the human ideogram at
the 550 band resolution (Table 1). The evolutionary chromosome breakpoint boundaries, each identified by human reference
coordinates (see above), were similarly treated in order to determine whether these
fell within a specific chromosomal band region that is known to express fragility.
It is important to note that in some cases a chromosomal band described as containing
a fragile site in the literature can, at higher resolution (for example, the 550 band
ideogram), be shown to comprise several sub-bands. For example, the common fragile
site FRA7J is mapped to 7q11, which corresponds to four sub-bands in the 550 band ideogram
(7q11.1, 7q11.21, 7q11.22, and 7q11.23).
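
The 7q11 example can be expressed as a simple prefix match. This is only a sketch under the assumption that band names are written as chromosome, arm, and band (for example, "7q11.21"); the function name `expand_band` and the short band list in the usage note are illustrative.

```python
def expand_band(reported, ideogram_bands):
    """Return the 550 band level names equal to, or nested under, a reported band.

    Because names begin with the chromosome and arm, a plain prefix test is
    enough: '7q11' matches '7q11.21' but not '17q11.2'.
    """
    return [band for band in ideogram_bands if band.startswith(reported)]
```

Applied to a list containing 7p11.2, 7q11.1, 7q11.21, 7q11.22, 7q11.23, 7q21.11, and 17q11.2, the call `expand_band("7q11", ...)` returns the four 7q11 sub-bands.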

We defined the chromosomal location of 12 autosomal common fragile sites that have
been characterized at the molecular level by the position provided by the Ensembl
[74] and NIH databases [75] for the molecular markers and/or the BAC clones described in the original papers
(Additional data file 8). These fragile sites were examined to determine whether any evolutionary breakpoint
spanned these regions in at least one of the species compared herein.
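
A comparison of this kind reduces to an interval overlap test. The sketch below is one possible reading, assuming that each molecularly characterized fragile site and each breakpoint region is given as a closed interval in human coordinates; the names `overlaps` and `sites_with_breakpoints` and all example values are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def sites_with_breakpoints(sites, regions):
    """Map each fragile site to the species with an overlapping breakpoint region.

    sites:   {site_name: (chrom, start, end)}
    regions: [(species, chrom, start, end), ...]
    Sites without any overlapping region are omitted from the result.
    """
    result = {}
    for name, (chrom, start, end) in sites.items():
        species = sorted(
            {sp for sp, c, r_start, r_end in regions
             if c == chrom and overlaps(start, end, r_start, r_end)}
        )
        if species:
            result[name] = species
    return result
```

A site is reported as soon as one species has a breakpoint region intersecting it, which mirrors the "at least one of the species" criterion used in the text.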

Tandem repeat analysis

We analyzed the distribution of tandem repeats in the human genome sequence (NCBI
Build 35) using the 'Tandem Repeats Finder' (TRF) algorithm (version 3.21 [76]) in all human chromosomes (HSA) except HSA X and HSA Y. The complete sequences of
each chromosome were scanned for tandem repeats using the program TRF with the parameters
established by default (+2 -7 -7 0.80 0.10 50 500).
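
For readers wishing to repeat the scan, the parameter set can be assembled into a command line as sketched below. This assumes the commonly documented TRF invocation (file, Match, Mismatch, Delta, PM, PI, Minscore, MaxPeriod), in which the match and indel probabilities quoted above as 0.80 and 0.10 are passed as the whole-number percentages 80 and 10. The `-d` (write a data file) and `-h` (suppress HTML output) options, the executable name, and the input file name are assumptions that should be checked against the installed version.

```python
# Match, Mismatch, Delta, PM, PI, Minscore, MaxPeriod (values quoted in the text)
TRF_PARAMS = (2, 7, 7, 80, 10, 50, 500)


def trf_command(fasta_path, executable="trf"):
    """Build an argument list for one chromosome; suitable for subprocess.run()."""
    return [executable, fasta_path, *(str(p) for p in TRF_PARAMS), "-d", "-h"]
```

The function only builds the argument list and does not run the program, so the command can be inspected before it is executed on each autosome.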

We scrutinized each chromosome's complete sequence using moving non-overlapping windows
of 0.250 Mb in order to analyze the density and distribution of tandem repeats in
the human genome. Given the high incidence of repeats at the telomeric/subtelomeric
and the centromeric/pericentromeric areas [49] (confirmed by our study; Additional data file 9), we excluded a 3 Mb section at each
of these localities, which are referred to herein as the T (telomeric) and C (centromeric)
regions. A further classification involves chromosomal bands that contain evolutionary
breakpoints in at least one of the eight species compared with the human genome (E
bands); all remaining bands were designated as B bands (that is, non-evolutionary
chromosomal bands). Additionally, the presence/absence of a fragile site (rare or
common) was recorded for each chromosomal band based on their published location [39], as defined in the human ideogram at the 550 band resolution (Additional data file
9). Tukey-Kramer tests were used (JMP package version 5.1.2; SAS Institute Inc.) to
evaluate whether tandem repeats concentrate significantly (P ≤ 0.05) in evolutionary chromosomal bands (E bands) and/or fragile sites (FS bands).
In both cases, the centromeric and telomeric regions were excluded before statistical
analysis because they had much higher repeat values overall.
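
The windowing step can be sketched as follows. The code is illustrative only: it assumes that repeats are given as non-overlapping, half-open (start, end) intervals lying within the chromosome, and that the excluded T and C regions are supplied explicitly by the caller, because the exact placement of the 3 Mb sections is not reproduced here. The names `repeat_bp_per_window` and `window_label` are hypothetical, and the Tukey-Kramer comparison itself is not shown.

```python
WINDOW_BP = 250_000  # non-overlapping window size (0.250 Mb)


def repeat_bp_per_window(chrom_length, repeats, window=WINDOW_BP):
    """Return the number of base pairs covered by tandem repeats in each window."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for start, end in repeats:
        pos = start
        while pos < end:  # split repeats that cross a window boundary
            idx = pos // window
            piece_end = min((idx + 1) * window, end)
            counts[idx] += piece_end - pos
            pos = piece_end
    return counts


def window_label(win_start, win_end, excluded):
    """Return 'T' or 'C' if the window overlaps an excluded interval, else 'kept'.

    excluded: [(label, (start, end)), ...] with half-open intervals.
    """
    for label, (start, end) in excluded:
        if win_start < end and start < win_end:
            return label
    return "kept"
```

Windows labeled 'kept' would then be assigned to E or B bands, and to bands with or without fragile sites, before the group means are compared.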

Additional data files

The following additional data are available with the online version of this paper.
Additional data file 1 is a figure showing the multispecies alignments of all human chromosomes. Additional
data file 2 is a figure showing the distribution of base pair tandem repeats along all human
chromosomes represented as windows of 250,000 bp each. Additional data file 3 is a figure showing base pairs implicated in tandem repeats per chromosome. Additional
data file 4 is a table listing all of the homologous syntenic blocks (HSB) detected. Additional
data file 5 is a table listing evolutionary breakpoint regions (EBR) less than 4 Mb and their
chromosomal positions in the human genome. Additional data file 6 is a table listing the evolutionary chromosomal bands detected. Additional data file
7 is a table listing all human fragile sites described in the literature. Additional
data file 8 is a table listing common human fragile sites that have been cloned and analyzed
at the molecular level. Additional data file 9 is a table showing the human genome divided into windows of 0.250 Mb.

Acknowledgements

Financial support to TJR (GUN 2053812) from the National Research Foundation, South
Africa is gratefully acknowledged. ARH is a postdoctoral fellow in the Evolutionary
Genomics Group, and is supported by grants from the University of Stellenbosch and
the Spanish Ministry of Education and Science (MEC). We thank Drs L Froenicke and
M Garcia Caldès, and two anonymous reviewers for providing insightful comments on
an earlier version of this paper.