MLST was developed for Streptococcus pyogenes (group
A streptococci, GAS) by Mark Enright in the laboratory of
Brian Spratt together with the laboratory of Debra Bessen
as a resource for researchers worldwide. The internet-accessible
database, funded by the Wellcome Trust and hosted at Imperial
College London, allows unambiguous comparison of data between
different laboratories.

The S. pyogenes MLST database currently contains information on over
1000 isolates, obtained from cases of serious invasive disease, upper respiratory
tract infection and impetigo, as well as macrolide-resistant isolates. Many
of the > 150 recognised emm-types are represented in this set.

Investigators carrying out MLST on this species are encouraged to submit
their data to the curator so that allelic profiles and strain details can be
added to the database. In this way the MLST database becomes an increasingly
useful resource for the S. pyogenes community.

Please acknowledge the use of this site in your publications
as follows: 'We acknowledge the use of the Streptococcus
pyogenes MLST database which is located at Imperial
College London and is funded by the Wellcome Trust'.

The S. pyogenes MLST scheme uses internal fragments
of seven housekeeping genes amplified by PCR using the following
primer pairs:

Gene  Function                             Primer   Sequence (5'-3')                    Size of amplicon used for assigning alleles (bp)

gki   glucose kinase                       gki-up   GGCATTGGAATGGGATCACC                498
                                           gki-dn   TCTCCTGCTGCTGACAC
gtr   glutamine transporter protein        gtr-up   GAGGTTGTGGTGATTATTGG                450
                                           gtr-dn   GCAAAGCCCATTTCATGAGTC
murI  glutamate racemase                   murI-up  TGCTGACTCAAAATGTTAAAATGATTG         438
                                           murI-dn  GATGATAATTCACCGTTAATGTCAAAATAG
mutS  DNA mismatch repair protein          mutS-up  GAAGAGTCATCTAGTTTAGAATACGAT         405
                                           mutS-dn  AGAGAGTTGTCACTTGCGCGTTTGATTGCT
recP  transketolase                        recP-up  GCAAATTCTGGACACCCAGG                459
                                           recP-dn  CTTTCACAAGGATATGTTGCC
xpt   xanthine phosphoribosyl transferase  xpt-up   TTACTTGAAGAACGCATCTTA               450
                                           xpt-dn   ATGAGGTCACTTCAATGCCC
yqiL  acetyl-CoA acetyltransferase         yqiL-up  TGCAACAGTATGGACTGACCAGAGAACAAGATGC  434
                                           yqiL-dn  CAAGGTCTCGTGAAACCGCTAAAGCCTGAG

PCR conditions

The PCR reactions are performed in volumes of 50 µL, with an initial denaturation
at 95 °C for 5 min, followed by 28 cycles of 95 °C for 1 min, 55 °C for 1 min
and 72 °C for 1 min. The amplified DNA fragments are purified either by precipitation
with polyethylene glycol or using a commercial PCR purification kit. The sequence
of each fragment is obtained on both strands using the same primers as those
in the initial PCR amplifications.

As the same primers are used for amplification and sequencing, it is important
that only a single DNA fragment is amplified in the initial PCR. This may involve
some optimisation of the annealing temperature and other PCR conditions in individual
laboratories.

Obtaining an allelic profile and comparing your strains with those in our database

The allelic profile of a strain is based
on the sequence of internal fragments of
the seven housekeeping genes. The
sequences have to be trimmed so that they
correspond exactly to the region that we
use to define the alleles. The sequences
of the seven loci from a typical GAS can
be obtained below and can be used to ensure
that your sequences have been trimmed correctly. The
sequences must be obtained on both strands,
and they must be 100% accurate, since even
a single error may convert a known allele
into a novel allele.
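
The requirement that both strands agree can be checked before a sequence is submitted. The following is a minimal Python sketch, not part of the database software: it assumes that both reads have already been trimmed to the allele region and contain only A, C, G and T, and the function names are hypothetical.

```python
# Complement table for unambiguous DNA bases (assumption: no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_mismatches(forward, reverse):
    """List 0-based positions where the two strands disagree.

    `forward` is the trimmed forward read; `reverse` is the trimmed read
    from the opposite strand, as sequenced (5'-3').
    """
    fwd = forward.upper()
    rev = reverse_complement(reverse)
    if len(fwd) != len(rev):
        raise ValueError("reads are not trimmed to the same length")
    return [i for i, (a, b) in enumerate(zip(fwd, rev)) if a != b]


def strands_agree(forward, reverse):
    """True only if the two strands give exactly the same sequence."""
    return not strand_mismatches(forward, reverse)
```

Any position reported by strand_mismatches should be resolved against the trace files, since a single unresolved base is enough to turn a known allele into an apparently novel one.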

Click the name below to obtain
a correctly trimmed sequence for that locus

The locus query options allow you to obtain an allele number
for each of your sequences. You can assign your alleles one locus at a
time by selecting the single locus option or, by using the multiple
locus option, you can cut and paste the correctly trimmed sequence for
all seven loci of a query strain into the corresponding boxes.

The software will check that the sequences are the correct length and that they
do not contain any unrecognised characters. A check is also made to see
if the submitted sequence is at least 70% similar to another allele at that locus
(in case you have cut and pasted a sequence into the wrong box, or selected the
wrong locus from the drop down menu). If the sequence corresponds to a
known allele, the allele number will be returned. If the sequence appears
to be a new allele it should be compared with the sequence of the most similar
allele for that locus to check that any nucleotide differences are real. If you
are convinced you have a new allele, you should submit the sequence traces to
the database curator (karen.mcgregor@tvu.ac.uk), who will check your data,
provide you with a new allele number and add your new allele to the database.
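
The checks described above can be approximated locally before using the web form. The sketch below is an illustration under stated assumptions and is not the code that runs on the database: the function names and return values are hypothetical, similarity is taken to be position-by-position identity between sequences of equal length, and the expected lengths are the amplicon sizes listed in the primer table.

```python
# Expected trimmed lengths (bp), taken from the primer table on this page.
EXPECTED_LENGTH = {
    "gki": 498, "gtr": 450, "murI": 438, "mutS": 405,
    "recP": 459, "xpt": 450, "yqiL": 434,
}


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def query_locus(locus, seq, known_alleles, expected_length=None, min_identity=0.70):
    """Mimic the locus query checks (hypothetical helper).

    known_alleles maps allele number -> trimmed sequence of standard length.
    Returns ("known", n), ("new", most_similar_n) or ("error", reason).
    """
    seq = seq.upper()
    length = EXPECTED_LENGTH[locus] if expected_length is None else expected_length
    if set(seq) - set("ACGT"):
        return ("error", "unrecognised characters")
    if len(seq) != length:
        return ("error", "wrong length")
    for number, allele in known_alleles.items():
        if allele == seq:
            return ("known", number)
    if not known_alleles:
        return ("new", None)
    # Most similar known allele, to compare against when checking differences.
    best = max(known_alleles, key=lambda n: identity(seq, known_alleles[n]))
    if identity(seq, known_alleles[best]) < min_identity:
        return ("error", "less than 70% similar to any allele at this locus")
    return ("new", best)
```

A result of ("new", n) only means that the sequence is a candidate new allele; it still needs to be compared with allele n and confirmed by the curator from the trace files.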

The profile query options allow you to search the database for
allelic profiles matching your own and to obtain information on strains with
that allelic profile. After you have obtained the allele numbers
at each locus for your query strain, you can select allelic profile query and
enter the seven integers. If the allelic profile is in the database, the
sequence type assigned to this allelic profile will be returned along with details
of any S. pyogenes isolates that are identical to the one you submitted. You
can also search for isolates that have allelic profiles that are similar to yours
(e.g. isolates that have at least 4/7, 5/7 or 6/7 matches to the submitted allelic
profile) and show relationships between your query strain and these strains by
using the tree button.
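
The idea of exact and partial profile matches can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the order of loci is an assumption, and the profiles used to try it out should be treated as made-up values rather than real sequence types.

```python
# Assumed locus order for a profile of seven allele numbers.
LOCI = ("gki", "gtr", "murI", "mutS", "recP", "xpt", "yqiL")


def shared_loci(profile_a, profile_b):
    """Number of loci at which two allelic profiles carry the same allele."""
    return sum(a == b for a, b in zip(profile_a, profile_b))


def find_similar(query, profiles, min_matches=7):
    """Return (ST, matches) pairs with at least `min_matches` shared loci.

    profiles maps ST number -> tuple of seven allele numbers.
    min_matches=7 gives exact matches; 4, 5 or 6 gives similar profiles.
    Results are sorted by number of matches (descending), then by ST.
    """
    if len(query) != len(LOCI):
        raise ValueError("an allelic profile needs seven allele numbers")
    hits = [(st, shared_loci(query, prof)) for st, prof in profiles.items()]
    hits = [(st, n) for st, n in hits if n >= min_matches]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

For example, find_similar(query, profiles, min_matches=5) corresponds to asking for isolates with at least 5/7 matches to the submitted profile.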

Further details about strains that are identical, or similar, to the query strain
can be obtained by clicking on the strain names.

There is also an option to perform a database query (e.g. to
look at the details of all strains of a particular emm-type) or for
more advanced querying.

If you have sequenced a large number of strains, options are available in the batch
query menu to allow data from multiple strains to be entered simultaneously.

For many of these pages, help boxes (?) are available with further
details on how to enter and retrieve data.

Strains have been identified which do
not produce housekeeping gene fragments
of the described size (e.g. yqiL allele 48
has a 3-nt deletion). To assign
an allele number to a sequence of non-standard
length you should use the single
locus option from the locus
query menu. The sequence
will be identified as being a non-standard
length and a further search option lets
you query a database of such alleles. An
allele number will be returned if the sequence
corresponds to a known allele.

Strains have been identified that lack the yqiL locus. Due to the way the
software supporting the MLST database works, an allele number must be entered
for all seven loci to obtain an ST assignment for a query allelic profile. To
obtain an ST for a query strain lacking yqiL, the yqiL allele should be entered
as ‘67’ in the allelic profile query page.
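
If profiles are assembled in a script before being typed into the query page, the same convention can be applied there. The following is a minimal sketch with hypothetical names; the only value taken from this page is the allele number 67 used for strains lacking yqiL, and the locus order is an assumption.

```python
# Assumed locus order for a profile of seven allele numbers.
LOCI = ("gki", "gtr", "murI", "mutS", "recP", "xpt", "yqiL")

# Allele number entered for strains lacking yqiL (from the text above).
MISSING_YQIL_ALLELE = 67


def complete_profile(alleles):
    """Build a seven-integer profile from a dict of locus -> allele number.

    A missing (or None) yqiL entry is filled with 67. Any other missing
    locus is treated as an error, since only yqiL has such a convention here.
    """
    profile = []
    for locus in LOCI:
        value = alleles.get(locus)
        if value is None:
            if locus != "yqiL":
                raise ValueError("no allele number for " + locus)
            value = MISSING_YQIL_ALLELE
        profile.append(int(value))
    return tuple(profile)
```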

Submitting your data to the MLST database

Each database is maintained by a curator
and data can only be entered into a database
by a curator. The curator of the S.
pyogenes database is Karen
McGregor.

Submitting a new allele

Please send (preferably by email) two sequence trace files (one in each direction;
note: these do not have to be edited) for the new allele to the database curator,
along with the trimmed sequence (in a text file or within the body of the email)
of the proposed new allele.

Upon visual inspection of the trace files the curator will assign an allele number
and enter the sequence of the new allele into the database. If the curator
feels the trace files do not clearly show the identity of the unique nucleotide(s)
a number will not be assigned. The curator will contact you explaining
the reasons why this allele was not accepted and give you the opportunity to
submit another trace file for this allele.

Submitting a new allelic profile

To be assigned a new ST designation you should submit the allelic profile
and information on a representative strain with epidemiological data to the database
curator, who will enter it in the MLST database and assign an ST number. If
the new allelic profile contains a new allele, sequence trace files need to be
sent to the curator as described above.

It should be noted that submission of a new ST which is a novel combination of
known alleles does not require the submission of sequence trace files. There
is, of course, the potential that one of these alleles has been sequenced incorrectly
and the onus is on the submitter to ensure that the allelic profile is correct. It
is strongly recommended that if a new ST is identified that varies at only a
single locus from a previously identified ST, sequencing of this variant locus
is repeated.
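
Before submitting a new ST made up of known alleles, it can help to list the known STs that differ from it at exactly one locus, since these are the cases where re-sequencing is recommended. The sketch below is illustrative only: the function name and locus order are assumptions, and the known profiles would have to be obtained from the database by the user.

```python
# Assumed locus order for a profile of seven allele numbers.
LOCI = ("gki", "gtr", "murI", "mutS", "recP", "xpt", "yqiL")


def single_locus_variants(new_profile, known_profiles):
    """Return (ST, locus) pairs for known STs differing at exactly one locus.

    known_profiles maps ST number -> tuple of seven allele numbers.
    The locus named in each pair is the one worth sequencing again.
    """
    variants = []
    for st, profile in sorted(known_profiles.items()):
        differing = [i for i, (a, b) in enumerate(zip(new_profile, profile)) if a != b]
        if len(differing) == 1:
            variants.append((st, LOCI[differing[0]]))
    return variants
```

An empty result does not show that the profile is correct; it only means that no single-locus neighbour was found among the profiles supplied.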

If you are submitting information on a number of strains at one time, a template
Excel form is available which can be used for submissions. The template
can be obtained from the database curator.

Submitting strain information

Investigators are strongly encouraged to submit ST and strain information on
all their isolates, not just ones with new STs. The database will be of
most use to researchers if as much information as possible on as many isolates
as possible is included.

To submit information on isolates with previously reported STs, a template Excel
form can be used. This form can be obtained from the database curator.

Group C and G streptococci carry genes
that are highly homologous to the seven
housekeeping loci used in the S. pyogenes MLST
scheme and, in many cases, fragments of these
loci can be amplified from them using the S.
pyogenes MLST primers.

A protocol which produces comparable MLST information for group C and G streptococci
is currently being developed. In the future this database will contain
MLST information on all three Lancefield groups together with appropriate guidance
on the use and interpretation of these data. Information on these pages
will be updated accordingly.