The big picture

We can now synthesize relatively long stretches of DNA

There are techniques for large-scale re-arrangement of existing chromosomes

This makes it possible to consider synthesizing chromosomes from scratch, or building new chromosomes by rearranging existing ones. For example, chromosomes I, III and VI in S. cerevisiae are < 350kb long, and are candidates for being synthesized from scratch.

This leads to the $64,000 question: If you were to rebuild a yeast chromosome, what changes would you make to it? Or: if you were to build a new yeast chromosome, what would you put on it?

If you're interested in understanding large-scale chromosome structure and its effects, there are (at least) a couple of different overarching goals that can drive the answer to this question:

The Science goal: Investigate how chromosomes are currently organized, and the importance of various elements of their organization.

The Engineering goal: Investigate how to build a chromosome with a particular set of capabilities independent of the actual genes on the chromosome, like a low overall recombination rate.

Gene order and distribution

Genes involved in the same metabolic pathway (as defined by KEGG) tend to "cluster" on chromosomes, where "cluster" means "large region of chromosome with high concentration of pathway members, although non-members may also be present". 20% of metabolic pathways in S. cerevisiae exhibit this kind of clustering after controlling for tandem duplicates (10% show clustering in random data; the percentage for S. cerevisiae is the lowest of all organisms analyzed). (Lee and Sonnhammer, '03; see also the erratum correcting a major error in the S. cerevisiae data.)

Genes that are controlled by the same sequence-specific transcription factor tend to be regularly spaced along chromosome arms. Different periods are observed for different chromosome arms. Regularities are consistent with a genome-wide loop model of chromosomes, in which co-regulated genes dynamically co-localize in 3D. (Kepes, '03)
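One way to look for this kind of regular spacing is to fold gene positions onto a circle of a candidate period and measure how tightly the resulting phases cluster. The sketch below is only an illustration of that idea, not the method used by Kepes; the function names, the positions and the range of candidate periods are all made up.

```python
import cmath
import math

def phase_coherence(positions, period):
    """Mean resultant length of positions folded onto a circle of the given period.

    Close to 1 when positions are spaced at multiples of the period,
    close to 0 when they are spread evenly around the circle.
    """
    vectors = [cmath.exp(2j * math.pi * x / period) for x in positions]
    return abs(sum(vectors)) / len(vectors)

def best_period(positions, candidates):
    """Return the candidate period with the highest phase coherence."""
    return max(candidates, key=lambda p: phase_coherence(positions, p))

# Hypothetical positions (in kb) of genes controlled by one TF on one arm.
positions_kb = [5, 21, 35, 50, 66, 80, 95]
candidates = range(8, 31)  # hypothetical search range, in kb

period = best_period(positions_kb, candidates)
print(period, round(phase_coherence(positions_kb, period), 3))
```

Integer fractions of the true period also score highly, so the search range needs a sensible lower bound, and a real analysis would compare the score against shuffled gene positions.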

Adjacent pairs of genes show correlated expression independent of their orientation. Correlated triplets, but not quadruplets, were also found more often than expected by chance. Correlation maps also revealed regularly-spaced groups of correlated genes along chromosomes that might be indicative of higher-order chromosome structure. (Cohen et al, '00)

A statistically significant fraction of genes coding for subunits of stable complexes are located within 10-30kb of each other. This clustering may ensure better coregulation and maintain the right stoichiometry of complexes upon duplication of chromosomal segments. (Teichmann and Veitia, '04)

Gene orientation (ie whether a gene is on the plus or minus strand) can be modeled by a first-order Markov model, ie the orientation of a gene depends on the orientation of the gene that precedes it. (Note: Transition probabilities for yeast are pretty close to 0.5, ie close to a random coin-flipping model, but the authors claim that the coin-flip model is statistically improbable; I can’t really judge their statistics, but I still don’t put much trust in this model.) (Simons and Morton, '03)
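To make the first-order Markov idea concrete, here is a minimal sketch that estimates the four transition probabilities from a string of strand labels. It is not the authors' code; the function name and the example orientation string are hypothetical.

```python
from collections import Counter

def orientation_transitions(strands):
    """Estimate P(next orientation | previous orientation) from a string like '++-+'."""
    counts = Counter(zip(strands, strands[1:]))
    probs = {}
    for prev in "+-":
        total = counts[(prev, "+")] + counts[(prev, "-")]
        for nxt in "+-":
            # NaN marks a previous state that never occurs in the data.
            probs[(prev, nxt)] = counts[(prev, nxt)] / total if total else float("nan")
    return probs

# Hypothetical orientations of consecutive genes along one chromosome.
example = "++-+--++-+"
probs = orientation_transitions(example)
for (prev, nxt), p in sorted(probs.items()):
    print(f"P({nxt} | {prev}) = {p:.2f}")
```

Under the coin-flip model all four probabilities would be 0.5, so the question is whether the estimated values differ from 0.5 by more than sampling noise.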

Essential genes in yeast are clustered, independent of co-expression and tandem duplication. Clusters of essential genes are in regions of low recombination and larger clusters have lower recombination rates. (Pal and Hurst, '03)

There is a negative correlation between chromosome length and G+C content at (silent) third codon positions (GC3s) of ORFs. Chromosome III is abnormal in that it has strong clustering of GC3s; this could be because it contains the mating-type loci, so there’s selective pressure to keep mating-type switching an intrachromosomal reaction and thus to keep most of the chromosome (between HML and HMR) intact, leading to less structural disruption than on other chromosomes (which preserves existing clusters?). (Bradnam et al, '99)
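For reference, a simplified version of the third-position G+C measure can be computed as below. This is a sketch under a stated simplification: it counts every third codon position, whereas GC3s proper only counts positions where the change is silent. The function name and the example ORF are hypothetical.

```python
def gc3(orf):
    """Fraction of G or C at third codon positions of an in-frame ORF.

    Simplification: all third positions are counted, including those of
    Met, Trp and stop codons.
    """
    orf = orf.upper()
    if not orf or len(orf) % 3 != 0:
        raise ValueError("ORF must be non-empty and a multiple of 3 in length")
    thirds = orf[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in thirds) / len(thirds)

# Hypothetical ORF: ATG GCC AAA TAA
print(gc3("ATGGCCAAATAA"))
```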

Mismatch repair

Efficiency of DNA mismatch repair of frameshift mutations in microsatellite repeats varies depending on genomic position of microsatellite repeat. There doesn't seem to be any correlation between repair efficiency and position with respect to replication origin, replication timing, or G:C content of nearby sequence. Authors suggest that context-dependence of repair efficiency reflects some aspect of chromatin structure. (Hawk et al, '05)

Not clear how applicable their findings are to non-repeating sequence, like protein-coding regions.

Might be interesting to take the map of chromatin structure produced by Pokholok et al, '05 and see whether there is correlation between chromatin structure and efficiency of mismatch repair, as proposed.

Recombination frequency

There are hot- and coldspots of meiotic recombination in S.cerevisiae. Each chromosome has hotspots & coldspots; hotspots tend to cluster around regions with high G+C content whereas coldspots are nonrandomly associated with centromeres and telomeres. Hotspots are also enriched near genes involved in metabolic pathways and ionic homeostasis; coldspots were over-represented near ORFs involved in transport facilitation and intracellular transport. Some types of hotspots require transcription factor binding in order to become active. Hotspots tend to be in intergenic regions. (Gerton et al, '00)

Transcription factor binding sites

There are lots of high-scoring transcription factor binding sites in ORFs, some of which are actually bound in vivo (but with lower average binding strength than sites in intergenic regions). (My 7.90 class project)

Chromosome replication

Autonomously replicating sequences (ARSs) are about 200 bp long and contain an ARS consensus sequence (ACS) that's ~11bp long. Sequence flanking the ACS is essential, but there are no obvious sequence similarities between flanking sequences in different ARSs.

There are 200-400 ARSs in the yeast genome (ie they occur every 30-40 kbp), but not all function as origins of replication.

For given cell type, under given growth condition, each part of the genome replicates at a characteristic time within S phase.

Activation timing of each origin is related to its chromosomal position; origins near centromeres are activated earlier, origins near telomeres are activated later than other origins.

No correlation between steady-state transcription level of a gene and establishment/activation of an origin near the gene.

Pre-RC (complex that assembles at origins before replication) primarily assembles at pro-ARSs (ie possible origins of replication) in intergenic regions. However, significantly fewer pro-ARSs occur in intergenic sequences flanked by diverging transcripts than would be expected.

Bottom line:

Still can't predict which ARSs will actually function as origins of replication, or when they will be activated.

Mechanism responsible for establishing the conserved, characteristic pattern of replication across the genome is still unknown.

Promoter regions and transcriptional start sites of active genes are enriched in acetylated histones. Active genes are also enriched for methylated histones, both at their beginning and further downstream. (Pokholok et al, '05)

Some chromatin background

Nucleosome remodeling complexes can be targeted to DNA by interaction with DNA-bound transcription factors. Alternatively, binding of some TFs to DNA is incompatible with association of the same DNA with a histone octamer. Since a nucleosome requires at least 147bp of DNA to form, if two such TFs bind < 147bp apart, the DNA between them can't assemble into a nucleosome.
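The exclusion argument can be turned into a simple scan over bound TF sites. The sketch below assumes half-open (start, end) coordinates in bp and uses 147bp as the footprint threshold; the function name and the site coordinates are hypothetical.

```python
NUCLEOSOME_FOOTPRINT_BP = 147  # threshold taken from the note above

def nucleosome_free_gaps(tf_sites, footprint=NUCLEOSOME_FOOTPRINT_BP):
    """Return gaps between neighbouring TF sites that are too short for a nucleosome.

    tf_sites: iterable of (start, end) pairs in bp, end exclusive.
    Returns a list of (gap_start, gap_end) pairs.
    """
    sites = sorted(tf_sites)
    gaps = []
    for (_, left_end), (right_start, _) in zip(sites, sites[1:]):
        gap = right_start - left_end
        if 0 <= gap < footprint:  # overlapping sites (negative gap) are skipped
            gaps.append((left_end, right_start))
    return gaps

# Hypothetical bound sites for nucleosome-incompatible TFs.
sites = [(100, 110), (200, 212), (500, 508)]
print(nucleosome_free_gaps(sites))
```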

Modification of N-terminal tails of histones alters chromatin accessibility. Acetylated nucleosomes are typically associated with transcriptionally active chromatin, deacetylated nucleosomes with transcriptionally inactive chromatin. Methylation can have either effect, depending on the particular amino acid that is methylated. There is one known demethylase [Shi,Y et al '04].

Proteins with bromodomains interact with acetylated histone tails, proteins with chromodomains with methylated histone tails. Bromo/chromodomain-containing proteins are often associated with acetylases/methylases and can thus participate in a positive feedback loop.

During DNA replication, H3:H4 tetramers are either transferred wholesale to the new strand or retained on the old strand. H2A:H2B dimers are released into soluble pool and then reassociate with the old and new strands. Nucleosome assembly requires chaperones.

Initial ideas

"Science" ideas

Build rearranged/jumbled chromosomes: preserve functional elements (eg gene + associated promoter), but change gene order, orientation & strand. Could be done via some combination of the fragmentation, rearrangement and fusion techniques described in the papers below. Profile gene expression, histone location, replication origin activity etc, and use the data as input to a data mining algorithm that tries to find correlations between various aspects of chromosome structure and whatever was profiled. The idea is that rearranged chromosomes give you a larger data set to mine and allow you to draw stronger conclusions than data from just the WT chromosomes.

Generating specific rearrangements: there is an algorithm for calculating minimal sequence of inversions, translocations etc needed to transform one permutation (ie ordering) of genes into another, by Pavel Pevzner's group at UCSD. Extending this algorithm to take into account practical issues that would arise when trying to rearrange a chromosome, like having to make sure that you don't remove any essential genes during a rearrangement step, might be an interesting engineering problem.
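To get a feel for the problem, here is a much simpler stand-in for the algorithm mentioned above: it counts breakpoints in an unsigned gene order (each inversion can remove at most two, which gives a lower bound on the number of inversions) and sorts the order with a naive greedy sequence of inversions (which gives an upper bound). It handles inversions only, ignores gene orientation and essential genes, and the gene order is made up.

```python
import math

def breakpoints(perm):
    """Number of adjacent pairs in 0, perm, n+1 that are not consecutive integers."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(abs(a - b) != 1 for a, b in zip(framed, framed[1:]))

def greedy_reversal_sort(perm):
    """Sort by inversions, putting one gene in place per step.

    Returns the inverted intervals as (first_index, last_index) pairs.
    This is a naive upper bound, not a minimal sequence.
    """
    perm = list(perm)
    steps = []
    for i in range(len(perm)):
        j = perm.index(i + 1)  # current position of the gene that belongs at i
        if j != i:
            perm[i:j + 1] = reversed(perm[i:j + 1])
            steps.append((i, j))
    return steps

# Hypothetical gene order, with the target order being 1..5.
order = [3, 1, 2, 5, 4]
lower_bound = math.ceil(breakpoints(order) / 2)
steps = greedy_reversal_sort(order)
print(breakpoints(order), lower_bound, steps)
```

A practical version would add a check after each step, eg rejecting any intermediate arrangement that disrupts an essential gene.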

Disrupt all occurrences of TF binding sequences that occur in coding sequence (by disrupting the motif but keeping the same amino acid sequence) and then profile gene expression patterns. Would help to determine whether in-gene binding sites are biologically relevant, eg by acting as “titrating” sites (along TK’s theory).

Specifically: pick a transcription factor that has a well-known, unique overexpression phenotype and disrupt all of its intragenic binding sites. If these binding sites act to titrate the TF away from the “real” binding sites, then you should see the same phenotype as when the TF is overexpressed.
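Disrupting a motif while keeping the amino acid sequence amounts to searching over synonymous codon swaps. The sketch below does this greedily for an exact-match motif on one strand only, using the standard genetic code; a real design would score sites with a weight matrix and check the reverse complement too. The function names, the example ORF and the motif are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(orf):
    """Translate an in-frame ORF into one-letter amino acid codes."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))

def count_hits(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq)))

def disrupt_motif(orf, motif):
    """Remove exact motif matches from an ORF using synonymous codon swaps only."""
    seq = orf.upper()
    while count_hits(seq, motif) > 0:
        hit = seq.find(motif)
        first_codon, last_codon = hit // 3, (hit + len(motif) - 1) // 3
        for codon_index in range(first_codon, last_codon + 1):
            start = codon_index * 3
            current = seq[start:start + 3]
            synonyms = [c for c, aa in CODON_TABLE.items()
                        if aa == CODON_TABLE[current] and c != current]
            trials = [seq[:start] + c + seq[start + 3:] for c in synonyms]
            # Accept only swaps that reduce the total number of motif hits.
            improved = [t for t in trials if count_hits(t, motif) < count_hits(seq, motif)]
            if improved:
                seq = improved[0]
                break
        else:
            raise ValueError(f"no synonymous swap removes the hit at position {hit}")
    return seq

# Hypothetical ORF containing one copy of an illustrative motif.
orf = "ATGACTGACTCAGGTTAA"
motif = "TGACTCA"
recoded = disrupt_motif(orf, motif)
print(recoded, translate(recoded))
```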

Remove all “inert” DNA ie non-coding, not promoters etc; see whether yeast is still alive.

Remove all ORFs of unknown/duplicated function, see whether yeast is alive or not.

Remove all introns

Not sure what this would really tell us. Would it make yeast easier to manipulate?

Change all codon usage to be “optimal” (if there are some non-optimal codons) & see whether fitness (by some measure of fitness) improves

Problem is that you’d (probably) have to do this across all chromosomes, not just a single chromosome, in order to see an effect on fitness

Put all genes in pheromone response pathway on their own chromosome & remove the endogenous copies, to test current model of pathway

Problem is that there are >50 genes involved in the pathway and removing all the endogenous copies would be a lot of work, and it's not even clear whether the yeast would still be alive.

But, might be able to use an orthogonal transcription factor (leaving native copies under Ste12 control, in a Ste12 del strain)

Rebuild chromosome by moving promoters + ORFs associated with recombination hotspots around, re-profile recombination hotspots and see whether they’ve moved with the promoters/ORFs.

Engineer photosynthesis into yeast.

Not clear what the point of doing so would be, other than “because we can”.

Explore yeast mating-type switching, since all genes involved are on chromosome III

Not appealing because lots of experiments with chromosome architecture and location of the loci have already been done; also, behavior doesn’t seem to be sequence-specific, with exception of the RE element. See also Galgoczy et al, which seems like a pretty thorough, low-level dissection of mating-type switching.

"Engineering" ideas

Design chromosome that:

Is resistant to disruption by Ty1: try to design “Super Ty1” transposon (similar to what Han and Boeke did with human LINE1 transposons) that disrupts WT chromosomes a lot and then design chromosomes that are resistant to being invaded by this Super Ty1.

Undergoes meiotic recombination only rarely

Gets replicated very quickly/slowly; not sure why you'd want that, though, other than as a way of being able to shorten/lengthen S phase.

Has custom chromatin structure eg

doesn’t have any closed regions of chromatin

has chromatin structure that varies with, say, cell cycle

has uniform chromatin structure, so that differences in gene expression are determined only by promoter sequence & levels of TF and thus (in theory, at least) easier to model.

has nucleosomes made up only of custom histones that don’t respond in a standard way to the usual acetylation/methylation events, or have a custom histone code to allow extended programming of chromatin structure.

Caveat: unclear to what extent you can really control chromatin structure via DNA sequence.