Abstract

Genome instability is associated with mitotic errors and cancer. This phenomenon can lead to deleterious rearrangements, but also to genetic novelty, and many questions regarding its genesis, fate and evolutionary role remain unanswered. Here, we describe extreme chromosomal restructuring during genome elimination, a process resulting from hybridization of Arabidopsis plants expressing different centromeric histone H3 (CENH3) variants. Shattered chromosomes are formed from the genome of the haploid inducer, consistent with genomic catastrophes affecting a single lagging chromosome compartmentalized within a micronucleus. Analysis of breakpoint junctions implicates breaks followed by repair through non-homologous end joining (NHEJ) or stalled fork repair. Furthermore, mutation of the NHEJ factor DNA Ligase 4 results in enhanced haploid recovery. Lastly, the heritability and stability of a rearranged chromosome suggest a potential for enduring genomic novelty. These findings provide a tractable, natural system for investigating the causes and mechanisms of complex genomic rearrangements similar to those associated with several human disorders.

eLife digest

The genome of an individual organism contains all the instructions needed to build and maintain that individual. Any changes to the DNA in the genome can alter the instructions that are given to cells, which can lead to cancer and other diseases. However, changes to the genome can sometimes be beneficial as they can introduce more variety into the instructions carried by different individuals, which increases their potential to adapt to changes in their environment.

In plants and animals, DNA is arranged into structures called chromosomes. Generally, an individual's genome contains two copies of each chromosome; one inherited from their mother and one from their father. However, occasionally during reproduction, all the chromosomes from one of the parents are left out from the cells of the offspring in a process called ‘genome elimination’. This makes individuals that carry only half the normal number of chromosomes, known as haploids. Sometimes the process of genome elimination is disrupted, which leads to individuals that have incomplete genomes or chromosomes that carry big rearrangements of the DNA, as if they had been shattered and put back together incorrectly.

In a small plant known as Arabidopsis thaliana, genome elimination frequently happens in the offspring of two individuals that carry different versions of a gene called centromeric histone H3 (CENH3). However, it is not clear how this works, or what roles genome elimination plays in evolution and disease.

Here, Tan et al. studied genome elimination by cross-breeding Arabidopsis plants that carried a mutant form of CENH3 with plants that have a normal version of the protein. The experiments found that many of the offspring were haploid. Some of the others carried an extra copy of an entire chromosome or a section of a chromosome. A third group had an extra copy of a chromosome that was missing some sections or had been rearranged. These ‘shattered’ chromosomes were always formed from chromosomes that came from the parent plant with a mutant form of CENH3.

Tan et al. also found that a protein called DNA Ligase 4, which helps reconnect broken DNA strands, is involved in repairing the breaks in these shattered chromosomes. Some of the genetic rearrangements documented in the experiments were passed on to subsequent generations of plants, which suggests that these genomic changes can be stable enough to be inherited.

The genomic rearrangements observed in the Arabidopsis plants are similar to those seen in patients with cancer and other genetic diseases. The findings of Tan et al. show that Arabidopsis plants provide a useful system for studying these genome rearrangements, which may inform efforts to treat these human diseases.

Results

We used the GFP-tailswap haploid inducer (Ravi and Chan, 2010; Ravi et al., 2014) in the experimental setup illustrated in Figure 1. This strain is in the Col-0 background and carries a homozygous CENH3 null mutation whose function is partially complemented by a chimeric CENH3 in which an N-terminal GFP fused to the H3.3-like N-terminal tail replaces the native CENH3 N-terminal tail. We crossed this strain to the polymorphic accession Ler gl1-1 to track haplotypes in the F1 progeny and obtained the expected haploid induction frequency (Ravi and Chan, 2010; Ravi et al., 2014) (Figure 1). The recessive gl1-1 mutation confers trichomeless leaves in paternal Ler gl1-1 haploids, whereas it is masked in Col/Ler diploid hybrids. We sequenced 10 phenotypically diploid Col/Ler individuals with a wild-type phenotype, performed dosage plot and single nucleotide polymorphism (SNP) analysis, and found that all 10 were diploid, with Col and Ler each contributing 50% of the genome (Figure 1—figure supplement 1). Plants from the aneuploid class exhibited multiple pleiotropic and morphological defects and had trichomes, except in the rare cases in which the GL1 locus was lost. All five recognizable primary trisomic (2n + 1) phenotypes were represented (Steinitz-Sears, 1963; Koornneef and Vanderveen, 1983): Chromosome 1 (Chr1) trisomics have dark green, serrated leaves and are dwarfed; Chr2 trisomics exhibit round leaves and are late flowering; Chr3 trisomics have narrow, yellow-green leaves; Chr4 trisomics display narrow, smaller, flat leaves; and Chr5 trisomics display light green, narrow leaves. However, aneuploid plants with more severe or unusual phenotypes were also observed, suggesting other chromosomal combinations or more serious chromosomal aberrations. Chromosome dosage analysis based on whole genome sequencing (Henry et al., 2010) (Supplementary file 1) distinguished three chromosomal alteration types among aneuploids (Figure 2).
Similar outcomes were obtained using independently derived haploid inducers expressing either GFP-tailswap (Figure 2E and Figure 2—figure supplement 1) or CENH3 from other plant species (Maheshwari et al., 2015). The most common type, numerical aneuploids, display whole-chromosome aneuploidy, as in the classical primary trisomics (Figure 2B shows an example of a numerical Chr3). In our dataset, single primary trisomics (2n + 1) account for 75% of the numerical class. Other individuals from the numerical class carried two or more extra whole chromosomes: 16% were double primary trisomics (2n + 1 + 1), 2% triple primary trisomics (2n + 1 + 1 + 1) and 3% quadruple primary trisomics (2n + 1 + 1 + 1 + 1). Additionally, we obtained disomic Chr4 haploids (n + 1, a type of numerical aneuploidy that was not included in this analysis) as well as Chr2 or Chr3 monosomic diploids (2n − 1) at 4% frequency (Figure 2—figure supplement 2). Monosomic diploids have not previously been described in Arabidopsis, possibly because, if they arose from meiotic defects, they would require nullisomic gametes, which are not viable (Henry et al., 2009). Aneuploids resulting from mitotic failure are not subject to this constraint.

The altered CENH3 ‘GFP-tailswap’ strain was hybridized to the recessive glabrous1-1 mutant. Mean percentages of haploid, diploid and aneuploid progeny obtained from crosses to three independent GFP-tailswap lines are indicated, as determined after phenotypic characterization. Individuals belonging to the aneuploid class were sequenced and subjected to chromosome dosage and single nucleotide polymorphism (SNP) analysis as indicated by the arrow.

Characterization of the three distinct aneuploid types from GFP-tailswap haploid induction crosses.

(A–D) Dosage plots for all five Arabidopsis chromosomes in consecutive non-overlapping 100 kbp bins, with the corresponding SNP plots showing the percentage of haploid inducer (Col-0) genome present in each sample. A diploid Col/Ler hybrid (A), an individual with primary Chr3 trisomy from the numerical aneuploid class (B), an individual with a truncated trisomic Chr3 (C) and an individual with shattered Chr3 (D) are shown. Centromere positions are indicated by red diamonds. (E) Percentages of the different aneuploid types obtained from three different GFP-tailswap haploid inducer lines. (F) For each chromosome, the percentage of aneuploid individuals exhibiting altered dosage for that chromosome is plotted. All aneuploids characterized in this study are included. Chr4 is overrepresented (**Student's t-test, p < 0.01), whereas Chr5 is underrepresented (*Student's t-test, p < 0.05).

The second alteration type is defined by simple truncations and the repair of at most two double-stranded DNA breaks per chromosome (Figure 2C shows an example of a truncated Chr3). This truncated class occurred in 22% of the aneuploid population. In the third class, a single chromosome exhibited many oscillations in copy number state, as if shattered and subsequently rearranged (Figure 2D shows an example of a shattered Chr3). This shattered class occurred in 11% of the aneuploid population. Additionally, some aneuploids exhibited a combination of numerical, truncated and shattered chromosome types (Figure 2E). Copy number alterations of Chr1, 2 and 3 occur at frequencies similar to the average across all five chromosomes, whereas alterations of Chr4 and Chr5 are over- and under-represented, respectively (Figure 2F). This may be explained by an uneven distribution among chromosomes of a few highly dosage-sensitive genes. According to this hypothesis, Chr4 would be selectively depleted of such genes.
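The three alteration types can be illustrated with a toy classifier. The sketch below is not the authors' procedure (no automated classifier is described here); it assumes that integer copy-number calls per bin are already available for one chromosome, uses hypothetical function names, and takes the "at most two breaks" threshold for the truncated class from the description above. Real dosage data are noisy, so bins would need smoothing before state changes are counted.

```python
from typing import Sequence


def count_state_changes(copy_numbers: Sequence[int]) -> int:
    """Count positions where the copy number differs between adjacent bins."""
    return sum(1 for left, right in zip(copy_numbers, copy_numbers[1:]) if left != right)


def classify_chromosome(copy_numbers: Sequence[int], baseline: int = 2,
                        max_truncation_changes: int = 2) -> str:
    """Hypothetical classifier for one chromosome's per-bin copy-number calls."""
    if not copy_numbers:
        raise ValueError("no bins supplied")
    changes = count_state_changes(copy_numbers)
    if changes == 0:
        # Uniform dosage: either normal or a whole-chromosome (numerical) change.
        return "euploid" if copy_numbers[0] == baseline else "numerical"
    if changes <= max_truncation_changes:
        # One or two breakpoints, e.g. a terminal or interstitial segmental change.
        return "truncated"
    # Many oscillations in copy number state along the chromosome.
    return "shattered"


print(classify_chromosome([3] * 10))                        # numerical
print(classify_chromosome([3] * 6 + [2] * 4))               # truncated
print(classify_chromosome([2, 3, 2, 3, 4, 3, 1, 2, 3, 2]))  # shattered
```

Counting state changes rather than segments keeps the rule independent of which copy number a segment has, so gains and losses are treated alike.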

Chromosomal truncations have been reported in the progeny of a selfed trisomic (Huettel et al., 2008). To assess whether truncated and shattered aneuploid types could be produced by meiotic missegregation, we sequenced 96 individuals produced by a selfed Col-0 triploid. Because meiosis in triploids is irregular, most gametes they produce are aneuploid (Henry et al., 2007). Dosage analysis revealed that all were numerical aneuploids (Supplementary file 2 and Figure 2—figure supplement 3). To assess whether truncation and shattering could result from meiotic defects in the GFP-tailswap line, we sequenced 96 individuals from selfed GFP-tailswap plants and observed that 98% (n = 94) of the progeny were diploid, while two individuals carried single primary trisomies of Chr2 and Chr3, respectively, representing only the numerical class of aneuploids (Supplementary file 3 and Figure 2—figure supplement 4). Based on these results, we infer that the truncated and shattered aneuploid classes from our crosses reflect genomic instability associated with mitotic errors in the early embryo.

Shattered chromosomes can be recovered from all five A. thaliana chromosomes (Figure 3A). In some cases, shattering appears to extend to two chromosomes (top panel of Figure 3A) only because the haploid inducer used carries a reciprocal Chr1/Chr4 translocation originating from the integration of GFP-tailswap T-DNAs. SNP analysis demonstrates that all duplicated (copy number 3) and triplicated (copy number 4) regions originated from the haploid inducer (Figure 3B). Single-copy regions displaying loss of heterozygosity carry Ler alleles (i.e., wild-type), consistent with the loss of the haploid inducer haplotype.

Shattered chromosomes are confined to a single chromosome originating from the haploid inducer.

(A) Chromosome dosage plots based on non-overlapping 25 kbp bins across each chromosome for five aneuploid individuals with shattered chromosomes. The GFP-tailswap transgene insertion event that resulted in a reciprocal translocation between Chr1 and Chr4 in one of the haploid inducer parents (GFP-tailswap #11) is indicated with black arrowheads. The translocation is only visible in individuals for which chromosomes 1 and 4 are not balanced with each other. Duplications (copy number 3), triplications (copy number 4) and deletions accompanied by loss of heterozygosity (copy number 1) were observed in the dosage plots. (B) Box plots of the percentage of haploid inducer genome present at each copy number state, as determined by SNP analysis. Means and standard errors are shown.

Although aneuploids from the shattered class were often sterile, line FRAG00062 was partially fertile, allowing us to investigate the inheritance and stability of the variant DNA. We sequenced 16 F2 progeny from FRAG00062 and obtained two individuals with precisely the same shattered pattern as the F1 parent and 14 that appeared diploid (Figure 4A). Meiotic co-inheritance of all dosage-variant segments is consistent with a single, stable chromosomal unit formed after a catastrophe. To test this hypothesis, we used DNA fluorescence in situ hybridization to visualize the FRAG00062 chromosomes with Col-0-derived BAC painting probes specific for Chr1 and Chr4 (Figure 4B). Mitotic cells contained 11 chromosomes (Figure 4C). FRAG00062 came from a cross using GFP-tailswap line #11, which carried a reciprocal Chr1/4 translocation (Figure 2—figure supplement 1). This allowed us to distinguish the haploid inducer Chr1, the Ler Chr1, and a third Chr1 with rearranged signals, which we interpret as the shattered supernumerary chromosome (Figure 4C and Figure 4—figure supplement 1). At meiotic Metaphase I (Figure 4D) and at other meiotic stages observed in male meiocytes (Figure 4—figure supplement 2), the shattered chromosome does not pair with the parental Chr1s.

Stable inheritance and chromosome painting of a shattered aneuploid chromosome.

(A) Dosage analysis from 16 F2 individuals from a selfed FRAG00062 individual. Progeny individuals either inherited the shattered chromosome intact (n = 2) or appeared diploid (n = 14). (B) Cartoon of the different versions of chromosomes 1 and 4 expected to be present in FRAG00062. Chromosome painting probes and corresponding chromosome positions used for (C) and (D) are shown. Black triangles indicate the position of the reciprocal Chr1/Chr4 translocation present in the haploid inducer line, whereas black circles indicate centromere positions. (C) A mitotic cell from FRAG00062 with 11 chromosomes, including four painted chromosomes. Scale bar = 5 μm. (D) The shattered Chr1 from FRAG00062 remains unpaired at meiosis as shown here at Metaphase I. Enlargements of the shattered Chr1 and paired Chr1 are shown on the right. Scale bar = 5 μm. (E) Nuclei from a two-cell stage embryo from a wild-type cross (left panel) and from an embryo undergoing uniparental genome elimination (right panel). Nuclei are visualized using CFP-tagged histone H2B from the pollen parent superimposed with an image of the embryo visualized under light microscopy. Note the presence of micronuclei from the embryo undergoing genome elimination (right panel). Scale bar = 5 μm. (F) Percentage of micronuclei observed in wild-type crosses and genome elimination crosses. The different percentages of micronuclei per cell are indicated.

Next, we sought to investigate why shattering is restricted to a single chromosome. Micronuclei are commonly observed during genome elimination crosses in other plant species (Subrahmanyam and Kasha, 1973; Gernand et al., 2005). We dissected embryos from a genome elimination cross and observed one to four micronuclei per cell (Figure 4E, F) in 81% of the embryos (n = 110), but none in embryos from control crosses (n = 21, p < 0.001). The presence of micronuclei suggests that lagging chromosomes sequestered in micronuclei can be shattered by double-stranded DNA breaks, reassembled haphazardly by non-homologous end joining (NHEJ), and finally restituted into the main nucleus (Crasta et al., 2012).
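The statistical test behind the p < 0.001 comparison is not named here. Purely as an illustration, the sketch below applies a two-sided Fisher's exact test, written with the standard library only, to a 2 × 2 table of embryos with and without micronuclei. The count of 89 affected embryos is an assumption (81% of 110, rounded), since only the percentage is reported.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))


# Rows: genome elimination cross, control cross.
# Columns: embryos with micronuclei, embryos without micronuclei.
# 89 is an assumed count (81% of 110, rounded).
p = fisher_exact_two_sided(89, 110 - 89, 0, 21)
print(f"p = {p:.2e}")
```

With these assumed counts the p-value falls far below 0.001, so the conclusion does not depend on how the percentage is rounded.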

To reconstruct breakpoint junctions, we sequenced FRAG00062 to 100× coverage, extracted read pairs from the ends of duplicated and triplicated blocks, and performed de novo assembly. Thirty-eight such junctions were assembled (Supplementary file 4), and all 12 junctions in a randomly selected subset were confirmed by PCR (Figure 5—figure supplement 1) followed by Sanger sequencing, supporting the accuracy of the de novo assembly. All reconstructed junctions were consistent with NHEJ, showing either microhomology (2–15 bp of sequence overlap; Hastings et al., 2009), blunt fusions, or insertions of unidentified sequence (Figure 5B). We also observed inversions (fragments joined in head-to-head or tail-to-tail orientation) in 47% of breakpoint junctions (Supplementary file 4). The size distributions of microhomology tracts and insertions are shown in Figure 5—figure supplement 2.
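The junction categories above can be read off from how the two halves of an assembled contig align to the reference. The sketch below is a simplified, hypothetical classifier, not the authors' script. It assumes the two alignments are given in contig coordinates (1-based, inclusive), ordered along the contig, together with the reference strand of each. Overlap between the two alignments on the contig is scored as microhomology, exact abutment as a blunt fusion, and a gap as an inserted sequence. Opposite strands indicate a head-to-head or tail-to-tail (inverted) join. Mismatches within a microhomology tract and secondary alignments are ignored.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One local alignment of a contig to the reference (1-based, inclusive contig coordinates)."""
    contig_start: int
    contig_end: int
    strand: str  # "+" or "-" relative to the reference


def classify_junction(left: Hit, right: Hit) -> dict:
    """Classify a junction from two split alignments ordered along the contig."""
    # Bases explained by both alignments form the microhomology tract.
    overlap = left.contig_end - right.contig_start + 1
    if overlap > 0:
        kind, size = "microhomology", overlap
    elif overlap == 0:
        kind, size = "blunt", 0
    else:
        # Contig bases matched by neither alignment form an insertion.
        kind, size = "insertion", -overlap
    # Opposite reference strands mean an inverted (head-to-head or tail-to-tail) join.
    return {"type": kind, "size": size, "inverted": left.strand != right.strand}


print(classify_junction(Hit(1, 60, "+"), Hit(55, 120, "-")))
# {'type': 'microhomology', 'size': 6, 'inverted': True}
```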

Overall, triplicated blocks in FRAG00062 were significantly smaller than duplicated blocks (n = 23 in both cases, p < 0.001, Figure 5C), and these triplications cannot easily be explained by a missegregated chromosome. Duplicated and triplicated blocks could therefore have different origins. To address this question, we asked whether breakpoint junctions of the two copy number states are differentially associated with various genomic and chromatin features, such as genes and repeated elements (Lamesch et al., 2012), DNA replication origins (Costas et al., 2011), DNase I hypersensitive sites (DHS) (Zhang et al., 2012) and nine non-overlapping chromatin states that partition the Arabidopsis genome (Sequeira-Mendes et al., 2014) (Supplementary file 5). When analyzing 1000 bp windows centered on the breakpoints of duplicated blocks, we observed an enrichment in genic DNA (from a 53% background level to 70%, p < 0.01, Figure 5D,F). A subtler, but still significant, increase was observed with larger windows (10,000 bp, from a 53% background level to 62%, p < 0.01, Figure 5F). Consistent with this, 42% of breakpoint junctions from FRAG00062 are predicted to generate chimeric gene products (Supplementary file 4). In the same analysis, we noted that the breakpoint regions of duplicated and triplicated blocks differed in the frequency of some genomic features. In particular, replication origins, which occupy less than 1% of 10,000 bp windows around the borders of duplicated blocks, occupy almost 8% of those around the borders of triplicated blocks (compared with a genome average of 3.5%, p < 0.05, Figure 5E,G). The association of breakpoints flanking duplicated DNA with genic DNA, and of those flanking triplicated DNA with replication origins, suggests that two distinct mechanisms contribute to restructuring of the same chromosome (Figure 6). The first is chromothripsis, which acts through breakage and ligation (Stephens et al., 2011; Korbel and Campbell, 2013); the second is chromoanasynthesis, which acts through replication fork collapse and template switching (Hastings et al., 2009; Liu et al., 2011; Kloosterman and Cuppen, 2013).

The process of genome elimination and connected models for chromosomal rearrangements.

Genome elimination ensues when a haploid inducer expressing a variant CENH3 protein is crossed with the wild type. In many cases, the chromosomes marked by the variant CENH3 missegregate in the embryo and are compartmentalized in micronuclei. DNA damage, NHEJ repair and restitution of the micronucleus to the euploid pole nucleus can result in aneuploidy or diploidy. Alternatively, shattered chromosomes result from chromothripsis and chromoanasynthesis. The former involves fragmentation and random ligation; the latter involves replication fork collapse and microhomology-mediated strand switching. As a consequence, the pulverized and reassembled chromosome forms a single unit and can be meiotically inherited. The schematics for chromothripsis and chromoanasynthesis are shown sequentially for convenience, but their order has not been determined. In addition, our results obtained using lig4-2 (DNA Ligase IV) mutants suggest that the NHEJ pathway plays an important role in repairing the haploid inducer chromosomes that contribute to diploid and aneuploid progeny, such that haploid induction frequency increases when NHEJ is inhibited.

Our in silico reconstruction suggests that NHEJ is involved in repairing breaks that occurred on the shattered chromosomes. To test this hypothesis, we created a haploid inducer carrying a homozygous null mutation in LIG4 (DNA Ligase IV), a conserved component of the canonical NHEJ pathway. Pollinating it with wild-type LIG4/LIG4 pollen (from Ler gl1-1) resulted in normal haploid induction frequencies. However, when mutant lig4-2/lig4-2 pollen was used, the frequency of haploids doubled at the expense of aneuploids and diploids (Table 1 and Figure 7). This effect was still observed when the seed parent carried the wild-type allele (Table 1). One possibility is that this parent-specific haploinsufficiency results from early loss of the wild-type LIG4 allele located on the chromosome targeted for elimination, which in this case is the maternal chromosome. This result indicates that NHEJ contributes to the formation or persistence of aneuploid and diploid progeny, and that unrepaired double-stranded DNA breaks increase elimination of the haploid inducer genome, similar to observations of genome elimination in mouse–human hybrids (Wang et al., 2014). We hypothesize that missegregated chromosomes enter a degradative pathway initiated by endonucleolytic breaks. Occasionally, such chromosomes are rescued (i.e., restituted to a haploid or diploid nucleus) through a pathway requiring NHEJ, resulting in aneuploidy. Therefore, more haploids are produced when the NHEJ pathway is impaired (Figure 6).

Dosage plots for lig4-2 haploids isolated from a haploid induction cross using diploid lig4-2 as the male donor.

Dosage plots of lig4-2 haploids based on 150 kbp non-overlapping bins across all five Arabidopsis chromosomes. Euploid chromosomes in these haploids appear at a copy number of 2 only because euploid dosage was normalized to a value of 2 in this analysis. Centromere positions are indicated by red diamonds.

Discussion

Taken together, our results provide evidence for chromosome restructuring (Cai et al., 2014; Morrison et al., 2014) when diverged individuals hybridize, identifying a centromere-based mechanism for genomic instability. The phenomenon studied here depends on a chimeric CENH3, but a similar effect was observed when the haploid inducer strain expresses CENH3 from a closely related species (Maheshwari et al., 2015), indicating that both natural and artificial CENH3 variation can trigger it. While the genesis and fate of restructured chromosomes are difficult to study in humans, their formation, effects, and even transmission in Arabidopsis are within experimental reach, as demonstrated by the enhancing effect of NHEJ mutants on haploid induction. The range of phenotypes, the formation of copy number variants and of chimeric genes at junctions, and the occasional meiotic transmission of restructured chromosomes suggest that catastrophic chromosomal restructuring could contribute to heritable genetic variation.

Materials and methods

Plant material and growth conditions

All plants were grown in Sunshine Professional Mix Peat-Lite Mix 4 (SunGro Horticulture, Agawam, MA) under a 16 h/8 h light/dark photoperiod in a growth room set at 21°C. F1 seeds from GFP-tailswap crosses were germinated on MS agar plates, and 2-week-old seedlings were transplanted into soil. The lig4-2 (SAIL_597_D10) line used is in the Col-0 background. The genotyping primers (5′ to 3′) used are lig4-2/LP2: GATATGACAAGCCTTGGCATGAATGT, lig4-2/RP: AAAGTGGATGACATCTCGCTG and LB1: GCCTTTTCAGAAATGGATAAATAGCCTTGCTTCC for the left border of the SAIL T-DNA insertion.

Genomic DNA preparation, sequencing and read processing

All DNA samples were extracted from leaves using Nucleon Phytopure kits (GE Healthcare, Pittsburgh, PA). For each sample, 1.5 μg of DNA was used for library preparation with NEBNext DNA Library reagents and Nextflex-96 indexes (Bioo Scientific, Austin, TX), following a PCR-free protocol. Two microliters of each of the 96 barcoded libraries were pooled and sequenced using the 50 bp protocol on a single lane of a HiSeq 2000 at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley. Demultiplexing was performed by the same facility, and the resulting raw reads were processed with a custom Python script (Filter_N_Adapter_Trim_Batchmode.py – available from GitHub repository: https://github.com/KorfLab/FRAG_project) that removes reads flagged as filtered by CASAVA 1.8, removes adapter sequences and reads that contain Ns, and trims reads for quality.
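The read-cleaning steps listed above can be sketched as follows. This is not the contents of Filter_N_Adapter_Trim_Batchmode.py. The adapter prefix, the Phred+33 quality threshold of 20 and the 30 bp minimum length are assumed values chosen for illustration. Adapter matching is exact and full-length only, and the only header layout handled is the CASAVA 1.8 form, where the field after the space reads `<read>:<is filtered>:<control>:<index>`.

```python
from typing import Optional, Tuple

ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix; use the one matching the library kit


def is_casava_filtered(header: str) -> bool:
    """True if a CASAVA 1.8 header marks the read as filtered ('Y' in the second field after the space)."""
    parts = header.split(" ", 1)
    if len(parts) < 2:
        return False
    fields = parts[1].split(":")
    return len(fields) > 1 and fields[1] == "Y"


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER) -> Tuple[str, str]:
    """Cut the read at the first exact, full-length adapter match."""
    idx = seq.find(adapter)
    return (seq[:idx], qual[:idx]) if idx != -1 else (seq, qual)


def trim_quality(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Remove 3' bases whose Phred score (Phred+33 encoding) is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def process_read(header: str, seq: str, qual: str, min_len: int = 30) -> Optional[Tuple[str, str]]:
    """Return the cleaned (sequence, quality) pair, or None if the read is discarded."""
    if is_casava_filtered(header):
        return None
    seq, qual = trim_adapter(seq, qual)
    if "N" in seq:
        return None
    seq, qual = trim_quality(seq, qual)
    return (seq, qual) if len(seq) >= min_len else None
```

Checking for Ns after adapter removal means an N that falls inside adapter sequence does not discard an otherwise usable read; the order of these steps in the original script is not stated.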

Chromosome dosage analysis

For dosage plot analyses, 50 bp single-end reads were mapped to the TAIR10 A. thaliana reference genome sequence using BWA (Li and Durbin, 2009) with default parameters. Dosage variation was detected as previously described (Henry et al., 2010) and is described in detail at Bio-protocol (Tan et al., 2016). The reference chromosomes were partitioned into consecutive non-overlapping 100,000 bp bins, and the percentage of each sample's reads mapping to each bin was recorded. Relative coverage was calculated by dividing the percentage obtained for each bin by either the corresponding mean percentage across all individuals or the corresponding percentage for the control individual. Relative coverage was then scaled so that the diploid background corresponds to a copy number of 2.
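The binning and normalization described above can be sketched as follows. This is an illustration, not the published pipeline (see Henry et al., 2010 and Tan et al., 2016 for that). It assumes that read alignments have already been reduced to (chromosome, position) pairs, and the function names are hypothetical. The reference passed to `relative_dosage` can be either a control individual's percentages or the per-bin mean across all individuals.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Bin = Tuple[str, int]  # (chromosome, bin index)


def bin_percentages(read_starts: Iterable[Tuple[str, int]], bin_size: int = 100_000) -> Dict[Bin, float]:
    """Percentage of a sample's mapped reads falling in each non-overlapping bin."""
    counts = Counter((chrom, pos // bin_size) for chrom, pos in read_starts)
    total = sum(counts.values())
    return {b: 100.0 * n / total for b, n in counts.items()}


def mean_percentages(samples: List[Dict[Bin, float]]) -> Dict[Bin, float]:
    """Per-bin mean percentage across all individuals."""
    bins = set().union(*samples)
    return {b: sum(s.get(b, 0.0) for s in samples) / len(samples) for b in bins}


def relative_dosage(sample: Dict[Bin, float], reference: Dict[Bin, float],
                    ploidy: int = 2) -> Dict[Bin, float]:
    """Sample percentage divided by reference percentage, scaled so the background equals `ploidy`."""
    # Bins with no reference reads are skipped to avoid dividing by zero.
    return {b: ploidy * sample.get(b, 0.0) / ref for b, ref in reference.items() if ref > 0}
```

Because each bin is expressed as a percentage of the sample's total reads, a gain in one region slightly lowers the apparent dosage elsewhere; with many bins genome-wide this effect is small.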

SNP analysis

Positions polymorphic between Col-0 and Ler were identified with custom Python scripts, using sequencing reads from a diploid Col/Ler hybrid control, a Ler plant and a Col-0 plant. Specifically, candidate polymorphic positions were first identified if they were covered at least 25 times in the hybrid reads and contained two alleles, each representing at least 40% of the allelic calls. Reads from the Col-0 and Ler parents were then used to assign alleles to the two parents. Positions were retained only if they were homozygous in both parents (a single allele representing at least 97% of the allelic calls) and covered at least 6 times in the Col-0 library and at least once in the Ler library. This process identified 107,640 SNP positions (Supplementary file 6). Next, reads from each sample were mined for allele calls at these positions, and each read was assigned to one or the other parent based on the parental information. If a read did not match either allele, the genotype was reported as ‘na’. Finally, genotype information was pooled in consecutive, non-overlapping 1 Mb bins to derive a percentage of Ler alleles per bin for each sample (the complement being the percentage of Col-0 alleles). Using this measure, the Col-0/Ler diploid hybrid is expected to show 50% Col-0 across the genome.
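The SNP filters and per-bin summary above can be sketched as follows, using the thresholds stated in the text. This is not the authors' script. Allele counts per position are assumed to be supplied as `Counter` objects, the requirement that the two parental alleles match the two hybrid alleles is an added assumption, and the per-bin value is reported as % Col-0 (haploid inducer), as in the figures; % Ler is 100 minus this value.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Position = Tuple[str, int]


def _homozygous_call(counts: Counter, min_cov: int, min_frac: float = 0.97) -> Optional[str]:
    """Return the parental allele if coverage and homozygosity thresholds are met."""
    total = sum(counts.values())
    if total < min_cov:
        return None
    allele, n = counts.most_common(1)[0]
    return allele if n / total >= min_frac else None


def select_snp(hybrid: Counter, col: Counter, ler: Counter) -> Optional[Tuple[str, str]]:
    """Return (Col-0 allele, Ler allele) if a position passes the filters, else None."""
    h_total = sum(hybrid.values())
    if h_total < 25:
        return None
    # Two alleles, each at >= 40% of the hybrid's allele calls.
    major = {a for a, n in hybrid.items() if n / h_total >= 0.40}
    if len(major) != 2:
        return None
    col_allele = _homozygous_call(col, min_cov=6)
    ler_allele = _homozygous_call(ler, min_cov=1)
    if col_allele is None or ler_allele is None:
        return None
    # Assumption: the two parental alleles must be exactly the two hybrid alleles.
    if {col_allele, ler_allele} != major:
        return None
    return col_allele, ler_allele


def percent_col_per_bin(calls: Iterable[Tuple[str, int, str]],
                        snps: Dict[Position, Tuple[str, str]],
                        bin_size: int = 1_000_000) -> Dict[Position, float]:
    """Percentage of Col-0 allele calls per (chromosome, bin index)."""
    col_hits: Counter = Counter()
    assigned: Counter = Counter()
    for chrom, pos, base in calls:
        alleles = snps.get((chrom, pos))
        if alleles is None:
            continue  # not a retained SNP position
        key = (chrom, pos // bin_size)
        if base == alleles[0]:
            col_hits[key] += 1
            assigned[key] += 1
        elif base == alleles[1]:
            assigned[key] += 1
        # Bases matching neither parent correspond to 'na' calls and are skipped.
    return {key: 100.0 * col_hits[key] / n for key, n in assigned.items()}
```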

Cytogenetic analysis

All analyses were carried out on chromosome spreads from young anthers. BAC contigs specific for A. thaliana chromosomes 1 and 4 were used as painting probes. BAC DNA was labeled with biotin-, digoxigenin- or Cy3-dUTP by nick translation as previously described (Lysak and Mandáková, 2013). Labeled DNA probes were pooled, hybridized to suitable chromosome spreads and visualized by fluorescence microscopy. See Supplementary file 7 for the list of BAC clones used as painting probes.

Breakpoint assembly

Breakpoints in FRAG00062 were identified from a high-density dosage plot with 500 bp bins, produced using 50 bp reads extracted from 100 bp paired-end sequencing reads of the FRAG00062 library obtained on an Illumina HiSeq 2000 instrument. Blocks of duplicated or triplicated dosage were defined by eye. A custom script (batch-specific-junction-search.py – available from GitHub repository: https://github.com/KorfLab/FRAG_project) was used to extract the sequencing reads mapping within a 2000 bp region around each breakpoint. These sequences were then assembled with the PRICE genome assembler using the standard paired-end assembly setting (Ruby et al., 2013). The resulting contigs were aligned to the Arabidopsis reference genome with NCBI BLASTN, and characteristic breakpoint junctions were identified where the two halves of a contig mapped discordantly to the reference genome. Primers flanking 12 randomly selected breakpoint junctions were designed with Primer3 (Li and Durbin, 2009) based on their respective de novo assembled contigs. Standard PCR procedures were used for amplification with oligo pairs (Supplementary file 8) and GoTaq Green Mastermix (Promega Corporation, Madison, WI) on 1 ng of DNA from FRAG00062 and FRAG00080 (a diploid sibling control), followed by Sanger sequencing.

Breakpoint analysis

The A. thaliana TAIR10 genome annotation includes genomic locations for various features in Generic Feature Format Version 3 (GFF3). Files specifying genes, transposons, satellite repeats, and replication origins were downloaded from the TAIR FTP site (ftp://ftp.arabidopsis.org//Maps/gbrowse_data/TAIR10/). The GFF file containing the locations of mapped replication origins was available from a study by Costas et al. (2011). These GFF files were combined with mapped DHS data (Zhang et al., 2012) and with data from the work of Sequeira-Mendes et al. (2014), which combined various published epigenomic studies to partition the entire genome into nine chromatin states. Perl scripts were used to convert the DHS and chromatin state information into GFF format; these scripts, along with the resulting combined GFF file, are available from a GitHub repository: https://github.com/KorfLab/FRAG_project.
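The conversion itself was done with Perl scripts; the Python sketch below only illustrates the coordinate bookkeeping involved. It assumes that the DHS or chromatin-state calls arrive as tab-separated BED-like records (chromosome, 0-based start, exclusive end, optional name), and the source and type strings are placeholders. The key step is shifting the BED start by one, because GFF3 coordinates are 1-based and end-inclusive.

```python
from typing import Iterable, List


def bed_to_gff3(bed_lines: Iterable[str], source: str, feature_type: str) -> List[str]:
    """Convert BED-like records (0-based, end-exclusive) into GFF3 lines (1-based, inclusive)."""
    out = ["##gff-version 3"]
    for i, line in enumerate(bed_lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 and fields[3] else f"{feature_type}_{i}"
        # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes.
        out.append("\t".join([chrom, source, feature_type, str(start + 1), str(end),
                              ".", ".", ".", f"ID={name}"]))
    return out
```

Attribute values are not escaped here, so names containing characters such as `;` or `=` would need percent-encoding before the output is a valid GFF3 file.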

The set of genome features in the combined GFF file was compared with the annotated set of duplicated and triplicated blocks. Various Perl scripts available from the above GitHub repository, together with a GFF representation of all blocks, were used to assess the enrichment of genomic features at the breakpoint regions of duplicated and triplicated blocks. Specifically, windows of either 1000 or 10,000 bp were centered on each breakpoint coordinate, and the number of bp contributed by each feature of interest was summed across all windows. We also calculated the number of bp contributed by each feature outside those windows.

Enrichment ratios were then calculated by comparing the percentage of bases occupied by each feature across all windows at breakpoints with the percentage of bases occupied by the same feature in the remaining fraction of the aneuploid chromosome. The p-values were determined by shuffling experiments in which the locations of the breakpoints were randomized 1000 times, and the resulting shuffled ratios were compared with the ratios observed in the real data (Supplementary file 5).
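The window-based enrichment ratio and shuffling test described above can be sketched as follows. This is an illustrative Python version, not the authors' Perl scripts. It assumes that features are given as 0-based, half-open intervals on one chromosome, that shuffled breakpoints are drawn uniformly along that chromosome, and that the p-value is one-sided with an add-one correction; the exact randomization and p-value conventions used in the study are not specified here.

```python
import random
from typing import List, Sequence, Tuple


def feature_mask(length: int, intervals: Sequence[Tuple[int, int]]) -> bytearray:
    """Per-base 0/1 mask of feature coverage (0-based, half-open intervals)."""
    mask = bytearray(length)
    for start, end in intervals:
        s, e = max(0, start), min(length, end)
        if e > s:
            mask[s:e] = b"\x01" * (e - s)
    return mask


def merge_intervals(intervals: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Merge overlapping or touching half-open intervals."""
    merged: List[List[int]] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return merged


def enrichment_ratio(mask: bytearray, breakpoints: Sequence[int], window: int) -> float:
    """Feature density in windows centered on breakpoints divided by density elsewhere."""
    half = window // 2
    windows = [(max(0, b - half), min(len(mask), b + half)) for b in breakpoints]
    # Feature bases are summed window by window, as described in the text.
    in_feat = sum(sum(mask[s:e]) for s, e in windows)
    in_len = sum(e - s for s, e in windows)
    # "Outside" means outside the union of all windows.
    union = merge_intervals(windows)
    out_feat = sum(mask) - sum(sum(mask[s:e]) for s, e in union)
    out_len = len(mask) - sum(e - s for s, e in union)
    if in_len == 0 or out_len == 0:
        raise ValueError("windows must cover part, but not all, of the chromosome")
    if out_feat == 0:
        return float("inf")
    return (in_feat / in_len) / (out_feat / out_len)


def permutation_test(mask, breakpoints, window, n_shuffles=1000, seed=0):
    """Observed ratio and a one-sided empirical p-value from randomized breakpoints."""
    rng = random.Random(seed)
    observed = enrichment_ratio(mask, breakpoints, window)
    as_extreme = 0
    for _ in range(n_shuffles):
        shuffled = [rng.randrange(len(mask)) for _ in breakpoints]
        if enrichment_ratio(mask, shuffled, window) >= observed:
            as_extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (as_extreme + 1) / (n_shuffles + 1)
```

A per-base mask keeps the overlap arithmetic simple; for whole Arabidopsis chromosomes (roughly 20 to 30 Mb) it remains feasible in memory, though an interval-tree approach would be faster for 1000 shuffles.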

Decision letter

Bernard de Massy

Reviewing Editor; Institute of Human Genetics, CNRS UPR 1142, France

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for sending your work entitled “Catastrophic chromosomal restructuring during genome elimination in plants” for consideration at eLife. Your article has been favorably evaluated by Detlef Weigel (Senior editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.

The following individuals responsible for the peer review of your submission have agreed to reveal their identity: Bernard de Massy (Reviewing editor and peer reviewer) and Jim Haber (peer reviewer). A further reviewer remains anonymous.

The Reviewing editor and the other reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.

This manuscript presents an analysis of chromosome reorganization and elimination in crosses of Arabidopsis plants with different CENH3 variants. Although this particular study principally involves GFP-tailswap haploid-inducer lines, it has clear relevance to general biological phenomena resulting from hybridization of related species and aneuploidy. The authors have used cytogenetics complemented by an NGS approach to carry out a clear and coherent analysis of chromosome elimination and reorganization at the genome-wide level in these plants. This is a most interesting paper that follows up on a recent PLoS Genetics paper showing that crossing plants with differing CENH3 N-terminal tails frequently results in genome instability. The present paper extends these studies and definitively analyzes the nature of the chromosome shattering, showing that the instability most likely arises in post-meiotic mitoses and that two different processes are likely involved.

The data appear to be solid, are presented clearly and support the authors' conclusions. Overall the text is concise, clearly written and referenced. However, several aspects need to be clarified, in particular the analyses of the genomic features at breakpoints and the role of lig4.

Major comments:

1) Presentation of the events: It is quite difficult to get a clear picture of the relationship between the different classes of aneuploidy and the genome rearrangements. An improved presentation of the data is necessary. A more precise description of the numerical class should be provided, in particular the frequencies of trisomies and monosomies and the phenotypes. The category of diploids (25%) appears to be based only on the wild-type phenotype? A diploid sample should be analyzed for copy number variation and rearrangements as a control.

2) The experiment performed to address the meiotic origin is not convincing: The key experiment is to test progeny of GFP-tailswap. Ravi et al., 2010 indicate that this line is not fully fertile suggesting a meiotic defect. In addition, aneuploids are detected in the progeny of selfed plants. With the current information, the authors cannot exclude a meiotic origin of the truncated and shattered chromosomes identified in aneuploids.

3) Genomic features at breakpoints: The analysis of genomic features is very weak and not convincing (Figure 4D-G). This analysis should be removed or validated by statistical tests. The abstract should be modified accordingly.

4) The effect of Lig4 should be clarified: why is there such an effect when heterozygous in the Ler ecotype? What is the genotype/genomic structure of the haploids? If the absence of NHEJ leads to loss of recovery of shattered chromosomes, why is the proportion of haploids (relative to diploids) increased? How this fits with the model presented in Figure 5 is unclear.

Author response

1) Presentation of the events: It is quite difficult to get a clear picture of the relationship between the different classes of aneuploidy and the genome rearrangements. An improved presentation of the data is necessary. A more precise description of the numerical class should be provided, in particular the frequencies of trisomies and monosomies and the phenotypes.

We have improved the presentation of the different classes of aneuploidy by including an example of each aneuploid type in a main figure (Figure 2), accompanied by a description in the text. The frequencies of the various primary trisomies and monosomies are now included in the text as well.

The category of diploids (25%) appears to be based only on the wild-type phenotype? A diploid sample should be analyzed for copy number variation and rearrangements as a control.

We have now included dosage analysis of 10 individuals from a haploid induction cross that were visually identified as diploid. Consistent with their phenotype, they do not display any copy number variation or rearrangement (Figure 1–figure supplement 1).

2) The experiment performed to address the meiotic origin is not convincing: The key experiment is to test progeny of GFP-tailswap. Ravi et al., 2010 indicate that this line is not fully fertile, suggesting a meiotic defect. In addition, aneuploids are detected in the progeny of selfed plants. With the current information, the authors cannot exclude a meiotic origin of the truncated and shattered chromosomes identified in aneuploids.

We agree that this is a possibility. Therefore, we sequenced 96 progeny individuals from a selfed GFP-tailswap plant. Of the 96 sequenced individuals, 94 were diploid and 2 were numerical aneuploids (Figure 2–figure supplement 4). No truncated or shattered aneuploids were observed. This is consistent with our results from the selfed triploid population and with the hypothesis that chromosomal breaks originate from mitotic defects post-fertilization.

3) Genomic features at breakpoints: The analysis of genomic features is very weak and not convincing (Figure 4D-G). This analysis should be removed or validated by statistical tests. The abstract should be modified accordingly.

We have repeated the analysis and validated it by statistical testing. We provide the corresponding data in the revised manuscript (Figure 5F-G and Supplementary file 5).

4) The effect of Lig4 should be clarified: why is there such an effect when heterozygous in the Ler ecotype?

We expanded the discussion of the lig4-2 mutation effect including a hypothesis on the potential mechanism: “This effect was still observed when the seed parent carried the wild-type allele (Table 1). It is possible that parental-specific haploinsufficiency results from early loss of the wild-type LIG4 allele located on the chromosome targeted for elimination, which in this case is the maternal chromosome.”

What is the genotype/genomic structure of the haploids?

Because the null lig4-2 mutant used in the cross is in the Col-0 background, we could not use SNPs to determine the parental origin of the genome of the visually identified lig4-2 haploids. Instead, the genotype of the lig4-2 haploids was assessed by PCR using primers (now included in Materials and methods) that distinguish the T-DNA-tagged lig4-2 knock-out allele from the wild-type LIG4 allele. This confirmed the absence of the wild-type allele, consistent with genome elimination having occurred as expected. We next performed dosage analysis on these individuals and observed that all those tested were euploid (Figure 7).

If the absence of NHEJ leads to loss of recovery of shattered chromosomes, why is the proportion of haploids (relative to diploids) increased? How this fits with the model presented in Figure 5 is unclear.

We have expanded the explanation of this observation: “We hypothesize that missegregated chromosomes enter a degradative pathway initiated by endonucleolytic breaks. Occasionally, such chromosomes are rescued (i.e. restituted to a haploid or diploid nucleus) through a pathway requiring NHEJ, resulting in aneuploidy. Therefore, more haploids are produced when the NHEJ pathway is impaired.”


Simon WL Chan (deceased)

Department of Plant Biology, University of California, Davis, Davis, United States

Gordon and Betty Moore Foundation, Howard Hughes Medical Institute, University of California, Davis, Davis, United States

Contribution

SWLC, Conception and design, Analysis and interpretation of data

Competing interests

The authors declare that no competing interests exist.

Funding

Gordon and Betty Moore Foundation (GBMF3068)

Isabelle M Henry

Maruthachalam Ravi

Keith R Bradnam

Mohan PA Marimuthu

Ian Korf

Luca Comai

Howard Hughes Medical Institute (HHMI) (GBMF3068)

Isabelle M Henry

Maruthachalam Ravi

Keith R Bradnam

Mohan PA Marimuthu

Ian Korf

Luca Comai

Czech Science Foundation (P501/12/G090)

Terezie Mandakova

Martin A Lysak

European Social Fund (CZ.1.07/2.3.00/30.0037)

Terezie Mandakova

Martin A Lysak

Department of Biotechnology, Ministry of Science and Technology (Ramalingaswami fellowship)

Maruthachalam Ravi

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Howard Hughes Medical Institute (HHMI) and Neil Hunter for their one-year support for EHT after the passing of SWLC. We would also like to thank Anne B. Britt for the lig4-2 line, Meric C Lieberman for customized scripts and Kathie J Ngo for assistance with data management. This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 Instrumentation Grants S10RR029668 and S10RR027303. This work also used the Caliper Sciclone NGS, which was purchased with support from the NIH Shared Instrumentation Grant 1S10OD010786-01 awarded to LC. This work was funded by the HHMI and the Gordon and Betty Moore Foundation (GBMF) through grant GBMF3068 (to LC). Work by TM and MAL is funded by a research grant from the Czech Science Foundation (P501/12/G090) and by the European Social Fund (CZ.1.07/2.3.00/30.0037). MR is supported by a DBT-Ramalingaswami fellowship. SWLC was an HHMI-GBMF Investigator who pioneered the field of centromere-mediated genome elimination until his untimely passing in 2012.
