3.3.3. Single nucleotide polymorphisms (SNPs)

Single nucleotide polymorphism (SNP) markers are the
most recent addition to the molecular toolkit for honey bee genetic analysis
(see also the section on SNPs in the BEEBOOK paper on molecular methods;
Evans et al., 2013). A SNP is a change of a single base, usually to just one
alternative nucleotide, at a given position of a DNA sequence. For example, on
chromosome 5 of A. m. mellifera, a DNA sequence in the gene that codes for
subunit 1 of replication factor C occurs in two alternative forms, either
…AACTTATCAAA… or …AACTTGTCAAA… (Pinto, unpublished data). In this case
there are two alleles, A and G, created by a transition mutation. Although any
of the four nucleotides could in principle occur at a given position, SNPs are
usually bi-allelic: the mutation rate is low, about 10⁻⁸ to 10⁻⁹ changes per
nucleotide per generation (Brumfield et al., 2003), so a second mutation at
the same site is unlikely.
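The replication factor C example can be checked mechanically. The minimal Python sketch below compares two aligned sequences of equal length, reports each differing position (1-based), and classifies the change as a transition (A↔G or C↔T, purine to purine or pyrimidine to pyrimidine) or a transversion. The function name find_snps is hypothetical, and the sketch assumes the sequences are already aligned without gaps.

```python
# Transitions swap a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def find_snps(seq1, seq2):
    """Return (position, base1, base2, kind) for every mismatch between two aligned sequences.

    Hypothetical helper: assumes gap-free sequences of equal length; positions are 1-based.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    for pos, (a, b) in enumerate(zip(seq1.upper(), seq2.upper()), start=1):
        if a != b:
            kind = "transition" if frozenset((a, b)) in TRANSITIONS else "transversion"
            snps.append((pos, a, b, kind))
    return snps

print(find_snps("AACTTATCAAA", "AACTTGTCAAA"))  # [(6, 'A', 'G', 'transition')]
```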

Although in honey bees SNPs have so far been used only in evolutionary
studies (Whitfield et al., 2006; Zayed and Whitfield, 2008) and a QTL study
(Spötter et al., 2012), they have great potential for subspecies
identification, for several reasons. At the analytical level, their genome-wide
coverage (coding and non-coding regions), ubiquity, codominance, and conformity
to the infinite sites model of evolution (Vignal et al., 2002) allow the use of
more powerful and robust approaches, potentially leading to more reliable
identification and more accurate estimates of introgression levels. At the
technical level, new high-throughput genotyping technologies, high data quality,
and easy calibration among laboratories facilitate the screening of large
samples (of both loci and individuals), data exchange among laboratories, and
the development of public databases.
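To illustrate why codominant, bi-allelic SNPs lend themselves to estimating introgression, the minimal Python sketch below estimates the admixture proportion of a single diploid worker by maximum likelihood. It assumes that the allele frequencies of the two parental subspecies are already known and that loci are unlinked. Because heterozygotes are distinguishable, each genotype can be coded as the number (0, 1 or 2) of copies of a reference allele. The function names, the grid search, and the example frequencies are hypothetical illustrations, not the algorithm used by Admixture or Structure.

```python
import math

def log_likelihood(alpha, genotypes, freq_a, freq_b):
    """Binomial log-likelihood of one diploid individual's genotypes.

    alpha: proportion of ancestry from parental population A (assumption: fixed, known frequencies).
    genotypes: copies (0, 1 or 2) of the reference allele at each unlinked locus.
    The binomial coefficient is omitted because it does not depend on alpha.
    """
    ll = 0.0
    for g, pa, pb in zip(genotypes, freq_a, freq_b):
        f = alpha * pa + (1 - alpha) * pb   # expected reference-allele frequency in this individual
        f = min(max(f, 1e-9), 1 - 1e-9)     # guard against log(0)
        ll += g * math.log(f) + (2 - g) * math.log(1 - f)
    return ll

def estimate_admixture(genotypes, freq_a, freq_b, steps=1000):
    """Maximum-likelihood alpha found by a simple grid search over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: log_likelihood(a, genotypes, freq_a, freq_b))

# Hypothetical data: six loci with made-up parental allele frequencies.
genotypes = [2, 1, 2, 0, 1, 2]
freq_a = [0.9, 0.8, 0.7, 0.2, 0.6, 0.9]
freq_b = [0.1, 0.3, 0.2, 0.1, 0.5, 0.2]
print(estimate_admixture(genotypes, freq_a, freq_b))
```

With thousands of loci the likelihood surface becomes much sharper, which is one way to see why dense SNP panels can give more precise introgression estimates than a handful of markers.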

Using SNPs for subspecies identification with high-throughput technologies
requires a SNP assay, which can be purchased if one is commercially available.
Otherwise, it must be developed (as in Whitfield et al., 2006 and Spötter et
al., 2012), an expensive and time-consuming endeavour requiring high-tech
equipment and expertise often available only in a core laboratory facility (see
development details in Spötter et al., 2012). Unlike humans and other model
organisms, for which many commercial assays exist, honey bees have only one
commercial SNP assay, available from AROS Applied Biotechnology AS (Denmark).
This assay was designed by Spötter et al. (2012) to detect Varroa tolerance in
A. m. carnica and allows 44,000 loci to be screened. Because its loci were
selected for a different purpose and within a single subspecies, it may not be
appropriate for honey bee subspecies identification.

Genotyping is also costly, although it will likely become increasingly
affordable. For example, AROS Applied Biotechnology AS charges 261€ (2012
price) per individual honey bee (with a minimum of 95 individuals analysed) for
screening the 44,000 loci, which is inexpensive when considered per locus.
While contracting the services of a private company is expensive, purchasing
the equipment and software for SNP genotyping is generally beyond the means of
an academic laboratory performing medium-scale studies, unlike the standard
equipment needed for mtDNA and microsatellite analysis (Table 6).
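The per-locus argument can be made concrete with the 2012 figures quoted above. The short Python sketch below computes the minimum order cost and the cost per genotype call. It assumes a flat price per individual with no other fees or volume discounts; the function and constant names are hypothetical.

```python
PRICE_PER_INDIVIDUAL_EUR = 261   # 2012 price quoted in the text
LOCI_PER_ASSAY = 44_000          # loci screened by the assay
MIN_INDIVIDUALS = 95             # minimum number of individuals per order

def genotyping_cost(n_individuals):
    """Return (total cost in EUR, cost per genotype call in EUR).

    Assumption: flat per-individual price, no extra fees or discounts.
    """
    if n_individuals < MIN_INDIVIDUALS:
        raise ValueError(f"at least {MIN_INDIVIDUALS} individuals are required")
    total = n_individuals * PRICE_PER_INDIVIDUAL_EUR
    per_call = PRICE_PER_INDIVIDUAL_EUR / LOCI_PER_ASSAY
    return total, per_call

total, per_call = genotyping_cost(95)
print(f"minimum order: {total} EUR; per genotype call: {per_call * 100:.2f} euro cents")
# minimum order: 24795 EUR; per genotype call: 0.59 euro cents
```

A minimum order therefore costs about 24,800€, even though each genotype call costs well under one euro cent.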

Other obstacles to working with thousands of SNP loci relate to the size of
the datasets, which require more powerful computers, especially for
computationally intensive analyses such as Bayesian Markov chain Monte Carlo
(MCMC) methods (used, for example, by the popular software Structure). In
addition, the software packages must be able to handle large input files.
However, for many standard analyses this is no longer a problem, as packages
have been modified to deal with large datasets.
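To give a sense of scale, the Python sketch below computes rough storage needs for a genotype matrix of 95 individuals and 44,000 loci (the figures used earlier in this section), together with the number of locus pairs that an all-against-all linkage check would involve. It assumes one byte per genotype call for a simple coding, or 2 bits per call for a compact binary coding (enough for four states: the three genotypes plus missing). The helper name is hypothetical, and real file sizes depend on the format used.

```python
def dataset_footprint(n_individuals, n_loci):
    """Rough size of a bi-allelic SNP genotype matrix (hypothetical helper)."""
    calls = n_individuals * n_loci              # one genotype call per individual per locus
    bytes_one_per_call = calls                  # e.g. 0/1/2 stored as one byte each
    bytes_two_bits = (calls * 2 + 7) // 8       # 2 bits per call, rounded up to whole bytes
    locus_pairs = n_loci * (n_loci - 1) // 2    # pairs for an all-against-all linkage check
    return {
        "genotype_calls": calls,
        "bytes_1_per_call": bytes_one_per_call,
        "bytes_2_bits": bytes_two_bits,
        "locus_pairs": locus_pairs,
    }

print(dataset_footprint(95, 44_000))
# {'genotype_calls': 4180000, 'bytes_1_per_call': 4180000, 'bytes_2_bits': 1045000, 'locus_pairs': 967978000}
```

Storage is modest at this sample size. The pairwise comparisons (almost a billion) and the many MCMC iterations that clustering programs perform over every genotype are what make the computations demanding.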

As for the other markers described in this chapter, the sampling scheme in SNP
surveys (e.g. number of individuals per colony, and number of colonies per
apiary and per population) will depend on the research question (e.g. a
population genetics question requires only one individual per colony). While
the numbers of individuals used for the other markers could be adopted for
SNPs, saturating the genome with thousands of SNP loci may violate the
assumption of independent (unlinked) loci made by many analytical approaches.
Therefore, either linked loci are removed or other analytical methods
(haplotype-based, for example) are employed. Most software packages used for
microsatellites, such as Structure (Pritchard et al., 2000), Arlequin
(Excoffier et al., 2005), NewHybrids (Anderson and Thompson, 2002), GenAlEx
(Peakall and Smouse, 2006), Genepop (Raymond and Rousset, 1995), GeneClass
(Cornuet et al., 1999) and FSTAT (Goudet, 1995), can also be applied to SNPs.
However, for clustering analysis the recently developed software Admixture
(Alexander et al., 2009) is much faster than Structure (Pritchard et al., 2000)
and therefore better suited to large datasets.
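The removal of linked loci can be sketched as a simple greedy filter. The Python sketch below walks along the loci in chromosome order and keeps a locus only if its squared genotype correlation (r², a common proxy for linkage disequilibrium when haplotypes are unknown) with every already-kept locus within a window is at or below a threshold. The window size (counted in loci), the 0.5 threshold, and the function names are hypothetical choices, and this is a simplified illustration rather than the exact procedure of any particular software package. Genotypes are coded as 0/1/2 allele copies in diploid individuals.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic locus: r is undefined, treated here as unlinked (assumption)
    return sxy * sxy / (sxx * syy)

def prune_linked(genotypes, window=50, threshold=0.5):
    """Greedy pruning of linked loci (hypothetical parameters).

    genotypes: list of loci sorted by chromosome position; each locus is a list of
    0/1/2 dosages, one per individual. Returns indices of the loci that are kept.
    """
    kept = []
    for i, locus in enumerate(genotypes):
        # Compare only with previously kept loci no more than `window` positions upstream.
        if all(r_squared(locus, genotypes[j]) <= threshold
               for j in kept if i - j <= window):
            kept.append(i)
    return kept

loci = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 0, 1],  # identical to locus 0: r2 = 1, pruned
    [2, 1, 0, 2, 1],  # mirror image of locus 0: r2 = 1, pruned
    [0, 0, 1, 2, 2],  # uncorrelated with locus 0: kept
]
print(prune_linked(loci))  # [0, 3]
```

In a real dataset the window would normally be defined per chromosome in physical or genetic distance rather than in array positions.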

In spite of the promising power of SNPs for subspecies identification,
the costs of developing a SNP assay and of genotyping will probably preclude
widespread adoption of this cutting-edge tool in the near future.