About

Saccharomyces Genome Resequencing

SGRP, the Saccharomyces Genome Resequencing Project, is a collaboration
between the Sanger Institute and Professor
Ed Louis' group at the Institute of Genetics, University of Nottingham. Our goal is
to advance understanding of genomic variation and evolution by analysing sequences from
multiple strains of the two Saccharomyces species, S cerevisiae and S paradoxus.

We have completed ABI sequencing of haploids of 37 cerevisiae strains and 27
paradoxus strains to a depth of between 1x and 3x, yielding a total of 1.42
million reads (1,292 megabases); and Illumina GA (Solexa) sequencing of
four of the 37 cerevisiae strains and an additional 10 paradoxus strains.

The sequence data has been aligned to the respective reference
genome sequences using SsahaSNP (for ABI) and Maq (for Illumina) followed by the application of
heuristics to select the most plausible alignments.
The SNPs (single-nucleotide polymorphisms) implied by these alignments have been extracted.
We have also developed methods, based on ancestral recombination graphs, for
imputing nucleotide values at positions in the genome where some
strains have no evidence, or only poor-quality evidence, while other,
closely related strains are better represented.

Links

Browse the data. (This is no longer actively maintained and so may at times be temporarily unavailable.)

Download the reads, alignments and provisional assemblies of each
strain. This is what you need if you are interested in carrying out genome-wide analyses. You will also need:

Instructions

(substituting the strain name of your choice for W303) and click "Submit".

However, you need to be aware that because of some plate-handling
errors, the names of some of the reads in the Trace Archive need to be corrected. These corrections have already been applied
in the SGRP browser and the FTP download data,
which you should use unless you specifically need NCBI format. Also, quality clipping has been applied to the
FTP download data, but not to the versions in the Trace Archive.

The full list of corrections is available on the FTP site.
In that file, a single name on a line by itself means that the corresponding read in the Trace Archive should be ignored, while
two names mean that the read with the first name should be given the second name so that the p1k and q1k
reads are correctly paired. The strains in question, and the number of reads affected, are as follows.

S cerevisiae    Reads affected   S paradoxus     Reads affected
BC187           619              A4              -
DBVPG1373       85               CBS5829         651
DBVPG6044       1161             DBVPG4650       180
DBVPG6765       1128             DBVPG6304       1981
L_1374          96               N_17            78
SK1             19594            N_43            530
Y55             647              N_44            2273
YGPM            1343             N_45            720
YPS128          16347            Q59_1           871
YPS606          1151             T21_4           389
273614N         194              UFRJ50816       354
NCYC361         188              YPS138          201
UWOPS03_461_4   2822             UWOPS91_917_1   471
W303            -                -               -
YJM975          24               -               -
YJM978          1114             -               -
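For readers who need to apply the corrections to reads taken directly from the Trace Archive, the file format described above can be handled with a short script. The following is only a minimal sketch: it assumes that the names on each line are separated by whitespace, and the function names and the read names in the usage note are hypothetical, not taken from the SGRP files.

```python
def parse_corrections(lines):
    """Split correction lines into reads to ignore and reads to rename.

    Assumption: names on a line are whitespace-separated.
    One name  -> the read should be ignored.
    Two names -> the read with the first name should get the second name.
    """
    ignore = set()
    rename = {}
    for line in lines:
        names = line.split()
        if not names:
            continue  # skip blank lines
        if len(names) == 1:
            ignore.add(names[0])
        elif len(names) == 2:
            rename[names[0]] = names[1]
        else:
            raise ValueError(f"unexpected corrections line: {line!r}")
    return ignore, rename


def apply_corrections(read_names, ignore, rename):
    """Return corrected read names, dropping reads marked to be ignored.

    Lookups are made against the original names only, so a pair of
    reads whose names are swapped is handled correctly.
    """
    return [rename.get(name, name) for name in read_names if name not in ignore]
```

As a usage example, given the hypothetical lines `["readA.p1k", "readB.q1k readC.q1k"]`, `parse_corrections` returns an ignore set containing `readA.p1k` and a rename map from `readB.q1k` to `readC.q1k`; passing those to `apply_corrections` along with a list of read names drops the first read and renames the second.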

Data Release Policy

The release of pre-publication data from large resource-generating
scientific projects was the subject of a meeting held in January 2003,
the "Fort
Lauderdale meeting", sponsored by the Wellcome Trust, one of the
Project funders. The report from that meeting can be viewed here.

The recommendations of the Fort Lauderdale meeting address the roles
and responsibilities of data producers, data users, and funders of
"community resource projects", with the aim of establishing and
maintaining an appropriate balance between the interests of data users
in rapid access to data and the needs of data producers to receive
recognition for their work. The conclusion of the attendees at the
meeting was that responsible use of the data is necessary to ensure
that first-rate data producers will continue to participate in such
projects and produce and quickly release valuable large-scale data
sets. "Responsible use" was defined as allowing the data producers to
have the opportunity to publish the initial global analyses of the
data, as articulated at the outset of the project. Doing so also will
ensure that the data generated are fully described.