Definition of a panel of Salmonella strains, closely related bacteria, and genome markers for rapid, specific and sensitive detection, typing, and control of this foodborne pathogen.

Rationale

There are three main reasons for the acute need to sequence Salmonella:

Current Salmonella sequences in public databases do not reflect the diversity of the species

Whole genome sequences of Salmonella from animal sources are over-represented in public sequence databases

There is a need for high-quality sequence data to conduct the analysis needed during this project

A funnel-type high-throughput strategy will be used to initially screen the 4,500 Salmonella strains using in vitro mammalian and whole cell invertebrate models. Based on the criteria of high, medium, low and no levels of virulence, a restricted number of Salmonella isolates representing the repertoire of the most common serovars will be further characterized for virulence expression using an animal model and a gastrointestinal assay.

Sub-objectives

The following sub-objectives will be instrumental to reach the main goal:

The SalFoS database describes all strains, phenotypes, genotypes and metadata. When available, SalFoS will also include the following additional information: isolate identification, host, researcher, date of isolation, geographical origin, phenotypic data, DNA extraction details, NGS information and genome assembly. SalFoS will serve as a repertoire of available Salmonella serovars and will be publicly available as a web-based display of Salmonella genome sequences.
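As an illustration of the kind of record described above, the following is a minimal sketch of one isolate entry. The class name, field names and example values are hypothetical and are not taken from the actual SalFoS schema; the serovar field is an added assumption, and the other fields mirror the list of additional information given above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class IsolateRecord:
    """One isolate entry. Field names are hypothetical, not the SalFoS schema."""
    isolate_id: str                            # isolate identification
    serovar: Optional[str] = None              # assumption: not in the listed fields
    host: Optional[str] = None
    researcher: Optional[str] = None
    isolation_date: Optional[date] = None
    geographical_origin: Optional[str] = None
    phenotypic_data: dict = field(default_factory=dict)
    dna_extraction: Optional[str] = None       # DNA extraction details
    ngs_info: Optional[str] = None             # NGS information
    genome_assembly: Optional[str] = None

    def missing_fields(self) -> list:
        """Return the names of metadata fields that are still empty."""
        return [name for name, value in vars(self).items()
                if value is None or value == {}]


# Placeholder values for illustration only.
record = IsolateRecord(isolate_id="EXAMPLE-0001", host="poultry")
print(record.missing_fields())
```

Keeping every field except the identifier optional reflects the "when available" wording: a record can be created on reception of an isolate and completed later.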

We will sequence, assemble and annotate the genomes of 2,000 primarily Canadian isolates of foodborne Salmonella enterica, including an average of 10 strains from each of the 100 serovars that represent 98% of all isolates reported to the CDC (1,000 genomes). Another 1,000 genomes of environmental isolates (water, soil, other sources) from the remaining 2,400 serovars, which are not frequently associated with human illness, will also be sequenced.

We will analyze the 2,500 genomes provided by the 100K Genome Project (American isolates).

We will also sequence and analyze the genomes of 100 isolates each of Proteus spp., Citrobacter spp., and Hafnia spp., which are commonly misidentified as Salmonella during rapid microbiological testing of food samples. We will also sequence and analyze 100 Salmonella-specific lytic bacteriophages.
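The planned numbers in the sub-objectives above can be tallied as a quick consistency check. The sketch below uses only the counts stated in this document; the variable names are illustrative.

```python
# Planned counts, taken from the sub-objectives above.
COMMON_SEROVARS = 100
STRAINS_PER_SEROVAR = 10          # stated as an average
ENVIRONMENTAL_GENOMES = 1_000
GENOMES_100K_PROJECT = 2_500
RELATED_GENERA = ("Proteus", "Citrobacter", "Hafnia")
ISOLATES_PER_GENUS = 100
PHAGES = 100

common = COMMON_SEROVARS * STRAINS_PER_SEROVAR       # common-serovar genomes
newly_sequenced = common + ENVIRONMENTAL_GENOMES     # Salmonella sequenced here
salmonella_total = newly_sequenced + GENOMES_100K_PROJECT
related = len(RELATED_GENERA) * ISOLATES_PER_GENUS   # closely related bacteria

print(common)            # 1000
print(newly_sequenced)   # 2000
print(salmonella_total)  # 4500
print(related)           # 300
```

The Salmonella total of 4,500 matches the number of strains entering the funnel-type screening strategy.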

Obj. 1.3 Selection of Salmonella genomes representing the most common 100 serovars and creation of a reference panel of high-quality genomes

We will analyze the phylogeny of our dataset of 2,000 Salmonella genomes, the 2,500 genomes from the 100K Genome Project and 300 isolates of closely related bacteria known to cause false positives in Salmonella rapid diagnostics. SNPs in the core genome will be identified using Panseq and Harvest. A final assembly will be produced by combining the MiSeq data with the longer PacBio reads.
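To make the notion of a core-genome SNP concrete, here is a toy sketch that scans an existing multiple alignment for columns that are present in every genome and vary between genomes. It is a simplified illustration only and is not the method implemented by the tools named above; the function name and the example sequences are hypothetical.

```python
def core_snp_positions(alignment):
    """Return 0-based alignment columns that are core (an A, C, G or T in
    every genome) and polymorphic (more than one base observed)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must have equal length")

    positions = []
    for i in range(length):
        column = {alignment[name][i].upper() for name in names}
        # Skip columns containing gaps or ambiguous bases: not core.
        if column <= set("ACGT") and len(column) > 1:
            positions.append(i)
    return positions


# Made-up aligned sequences for illustration.
toy_alignment = {
    "isolate_a": "ACGTACGT",
    "isolate_b": "ACGAACGT",
    "isolate_c": "AC-AACTT",
}
print(core_snp_positions(toy_alignment))  # [3, 6]
```

Column 2 is excluded because one genome has a gap there, so it does not belong to the core; columns 3 and 6 are retained because every genome carries a base and the bases differ.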

Obj. 1.4 High-throughput mammalian and protozoan whole cell screening

A high-throughput (HT) gentamicin protection screening assay will be used to quantify Salmonella interaction with human intestinal cells. The HT assay will allow initial classification of the isolates into low, medium and high pathogenicity, based on their ability to attach to and invade human cell lines. Human intestinal INT-407 (Henle) cells (ATCC CCL-6) will be used.
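The classification step could be sketched as follows, assuming invasion is expressed as the percentage of the inoculum recovered as gentamicin-protected (intracellular) colony-forming units. The cutoff values and function names are hypothetical placeholders; the project description does not specify thresholds, and real cutoffs would have to be calibrated against reference strains.

```python
def percent_invasion(recovered_cfu, inoculum_cfu):
    """Percentage of the inoculum recovered after gentamicin treatment."""
    if inoculum_cfu <= 0:
        raise ValueError("inoculum_cfu must be positive")
    return 100.0 * recovered_cfu / inoculum_cfu


def classify_isolate(percent, low_cutoff=0.1, high_cutoff=1.0):
    """Bin an isolate as low, medium or high.

    The default cutoffs are arbitrary illustrative values, not project values.
    """
    if percent < low_cutoff:
        return "low"
    if percent < high_cutoff:
        return "medium"
    return "high"


# Made-up counts for illustration.
value = percent_invasion(recovered_cfu=5e4, inoculum_cfu=1e7)
print(value, classify_isolate(value))  # 0.5 medium
```

Passing the cutoffs as parameters keeps the binning rule separate from the measurement, so the same function can be reused once thresholds have been chosen.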

Two protozoan models, an amoeba and a ciliate, will be used for high-throughput screening of the virulence of Salmonella strains.

Obj. 1.5 Refined analysis of virulence in an animal model

To complement the in vitro and in cellulo models, the virulence of the prioritized Salmonella strains will be tested in mouse disease models.

Using a novel model of the human gastrointestinal tract (GIT), we will determine the effect of gastrointestinal conditions on survival, pathogenicity and virulence gene expression in Salmonella strains that vary in their capacity to cause disease and in their persistence in the environment.

Milestones

1) Database containing all metadata, recorded upon reception of the bacterial isolates.