Background

Peach (Prunus persica) is considered one of the best genetically characterized species in the Rosaceae, and it has distinct advantages that make it suitable as a model genome species for Prunus as well as for other species in the Rosaceae. While some Prunus species, such as cultivated plums and sour cherries, are polyploid, peach is a diploid with n = 8 and has a comparatively small genome, currently estimated at ~220-230 Mbp based on the peach v1.0 assembly. Peach has a relatively short juvenility period of 2-3 years, compared with the 6-10 years required by most other fruit tree species. In addition, a number of genes for fundamentally important traits have been genetically described in peach, including genes controlling flower and fruit development, tree growth habit, dormancy, cold hardiness, and disease and pest resistance.

Genome facts and statistics

Peach v1.0 was generated from DNA from the doubled haploid cultivar ‘Lovell’, which means that the genes and intervening DNA are “fixed”, or identical, for all alleles and both chromosomal copies of the genome. This doubled haploid nature was confirmed by the evaluation of >200 SSRs, and has facilitated a highly accurate and consistent assembly of the peach genome.

Peach v1.0 currently consists of 8 pseudomolecules (scaffolds) representing the 8 chromosomes of peach, which are numbered according to their corresponding linkage groups. The genome was sequenced to approximately 7.7-fold coverage by whole genome shotgun sequencing using the accurate Sanger methodology, and was assembled using Arachne. The assembled peach scaffolds cover nearly 99% of the peach genome, with over 92% having confirmed orientation. To further validate the quality of the assembly, 74,757 Prunus ESTs were queried against the genome at 90% identity and 85% coverage, and only ~2% were missing. Together these results indicate a high-quality genome assembly. Gene prediction and annotation is an ongoing process that may take years to complete, but current estimates indicate that peach has 28,689 transcripts and 27,852 genes.

Links to the peach genome browsers housed at JGI, the Genome Database for Rosaceae (GDR), and the Italian version housed at the Istituto di Genomica Applicata (IGA), along with links to the raw data are provided below. Also provided are resources and links to help you navigate and utilize the peach genome to further your research.

All assembly and annotation files are available for download by selecting the desired data type in the left-hand side bar. Each data type page provides a description of the available files and links to download them. Alternatively, you can browse all available files on the FTP repository.

RosBREED Resequencing Alignments

A total of 23 different peach accessions were resequenced using Illumina short-read technology. The reads were trimmed and aligned to the Peach v1.0 genome and are available in BAM alignment files.

It is not necessary to download the BAM alignment files. Some are very large and multiple downloads may oversubscribe the network bandwidth. Rather, use the following instructions for viewing the alignments.

To view the peach resequencing alignments please follow these instructions:

First, download the Prunus_persica_v1.0.zip file. This file contains the reference sequence and gene models. After downloading, unzip this file in your working directory.

After IGV starts, load the genome file downloaded in the first step by clicking the menu item Genomes → Load Genome From File. Navigate to the folder where you unpacked the zip file from step 1 and select the file named Prunus_persica_v1.0.genome.

Select an alignment file you wish to view by right-clicking on a file with a .bam extension and selecting the option Copy link location (in Chrome and Firefox), Copy shortcut (in Internet Explorer) or Copy link (in Safari).

Add the alignment as a track in IGV by clicking the menu item File → Load from URL. Paste the URL copied in the previous step into the box.

You may load as many alignment files as you want.
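When many alignment tracks need to be loaded, the menu steps above can also be scripted. The following is a minimal sketch, assuming IGV's batch commands (new, genome, load, goto) and a placeholder BAM URL that is not a real download link. The function name build_igv_batch and the example locus are illustrative only.

```python
def build_igv_batch(genome_file, bam_urls, locus=None):
    """Return the text of an IGV batch script that loads a genome and BAM tracks.

    genome_file: path to the .genome file unpacked from Prunus_persica_v1.0.zip
    bam_urls:    list of copied .bam link locations
    locus:       optional region to jump to, e.g. "scaffold_1:1-100000"
    """
    lines = ["new", f"genome {genome_file}"]
    # One "load" command per alignment file; IGV adds each as a separate track.
    lines += [f"load {url}" for url in bam_urls]
    if locus:
        lines.append(f"goto {locus}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_igv_batch(
        "Prunus_persica_v1.0.genome",
        ["https://example.org/alignments/accession_01.bam"],  # hypothetical URL
        locus="scaffold_1:1-100000",
    )
    print(script)
```

The printed text can be saved to a file and run from within IGV using Tools → Run Batch Script. Loading a BAM file by URL generally requires its index file to be available at the same location.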

Repeats

The Prunus persica v1.0 genome repeat files are available in GFF3 format. Repeats were predicted using the Repbase database and the LTR Finder and ReAS prediction tools. A consensus file contains repeats from all three methods.
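As an illustration of working with a GFF3 repeat file, the sketch below tallies features by the second GFF3 column (the source field), which often records the prediction method. It assumes the method is recorded in that column. The sample lines and source labels are invented for the example and are not taken from the actual peach files.

```python
from collections import Counter


def count_repeats_by_source(gff3_lines):
    """Count GFF3 feature lines per value of the 'source' column (column 2)."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:  # a valid GFF3 feature line has exactly 9 columns
            continue
        counts[fields[1]] += 1
    return counts


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        "##gff-version 3",
        "scaffold_1\tLTR_FINDER\trepeat_region\t100\t900\t.\t+\t.\tID=r1",
        "scaffold_1\tRepbase\trepeat_region\t1500\t1800\t.\t-\t.\tID=r2",
    ]
    for source, n in count_repeats_by_source(sample).items():
        print(source, n)
```

To run this on a downloaded file, pass an open file handle in place of the sample list, since the function only iterates over lines.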

The IRSC (International Rosaceae Sequencing Consortium) has mined SNPs for peach which are included on Illumina Infinium arrays. These SNPs are provided in Excel format below. In addition, all candidate SNPs are available, as are cherry SNPs that have been mapped to the peach genome.

Using RNA-seq data from cherry cultivars, SNPs were identified in the 3'UTR regions of transcripts and then mapped to the peach genome. Those alignments (and all 3'UTR cherry SNPs) are provided below in Excel and GFF formats.

The Prunus persica v1.0 genome homology files are available for download in Excel format, with links to GBrowse and to external databases for matched homologs. All homology data were determined using the predicted peach gene transcripts (28,692 sequences) and NCBI blastx against various protein databases. An expectation value cutoff of 1e-6 was used. For EST alignments, the NCBI Rosaceae and Genera EST databases were downloaded and filtered for quality before blasting.
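To make the expectation value cutoff concrete, the following sketch applies a 1e-6 threshold to BLAST hits in the standard 12-column tabular format, where the e-value is the 11th column. This is an assumption for illustration: the homology files here are distributed in Excel format, and whether the original cutoff was inclusive is not stated. The sample hits are invented.

```python
EVALUE_COLUMN = 10  # 0-based index of the e-value in 12-column tabular BLAST output


def filter_hits(tabular_lines, max_evalue=1e-6):
    """Keep hits whose e-value is at or below max_evalue (inclusive, by assumption)."""
    kept = []
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:  # not a standard tabular hit line
            continue
        if float(fields[EVALUE_COLUMN]) <= max_evalue:
            kept.append(fields)
    return kept


if __name__ == "__main__":
    # Invented hits: query, subject, %identity, length, mismatches, gap opens,
    # query start/end, subject start/end, e-value, bit score.
    sample = [
        "t1\tprotA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-30\t350",
        "t2\tprotB\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.001\t35",
    ]
    for hit in filter_hits(sample):
        print(hit[0], hit[1], hit[EVALUE_COLUMN])
```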

After the initial assembly of the peach v1.0 genome, some large scaffolds lacked the markers needed for their proper orientation and placement within the pseudomolecules. Further analysis was performed to locate markers on 10 scaffolds greater than 300 kb in order to place and orient them within the assembly. A refined assembly was then generated. This refined assembly will be included in a future release of the peach genome, and is reported in the upcoming peach genome publication. For reference, the following JBrowse viewer is available to visualize changes to the assembly.

The Prunus persica v1.0 genome assembly files are available in FASTA and GFF3 formats. There are a total of 202 scaffolds in this assembly of peach. The pseudomolecules corresponding to the eight chromosomes of peach are the first eight scaffolds of the assembly. In future releases these pseudomolecules will most likely be renamed, but for now they are named scaffold_1, scaffold_2, scaffold_3, etc.
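Because the pseudomolecules are identified only by name, a short script can separate them from the remaining scaffolds. The sketch below reads FASTA records, sums sequence lengths, and selects scaffold_1 through scaffold_8. It assumes that the first word of each FASTA header is the scaffold name. The toy sequences are invented for illustration.

```python
def scaffold_lengths(fasta_lines):
    """Return {scaffold_name: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited word of the header as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def pseudomolecules(lengths, n_chromosomes=8):
    """Select scaffold_1 .. scaffold_N, the names used for the chromosomes."""
    wanted = [f"scaffold_{i}" for i in range(1, n_chromosomes + 1)]
    return {name: lengths[name] for name in wanted if name in lengths}


if __name__ == "__main__":
    # Toy records, for illustration only.
    toy = [">scaffold_1", "ACGTACGT", "ACGT", ">scaffold_9", "ACG"]
    print(pseudomolecules(scaffold_lengths(toy)))
```

For a real assembly file, pass an open file handle to scaffold_lengths so that the sequence is streamed line by line rather than read into memory at once.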