Do such genetic markers exist? Markers that are present in one population (at least to a certain percentage) but not in another? Many human geneticists are eager to emphasize that genetic diversity within a population is far greater than between populations. This view is also reflected in a 2001 background paper prepared by the British Government for the last Review Conference of the BWC. It states that “there is as yet no indication of differences that could be used as the basis for ‘genetic weapons’ which would target particular ethnic groups”.

99.9% of the genetic sequence of any two human individuals is said to be identical, but the 0.1% that is different accounts for a total of 3 million “letters” of the human genome. There are thought to be several tens of thousands of coding genes in the human genome, so it is possible that every single coding gene could differ slightly (or greatly) between one individual and another, even if there is 99.9% identity in overall genetic sequence. Some of this huge genetic diversity shows up as differences between populations. These genetic populations (using the term in its biological sense) often appear to correspond with (culturally-determined) ethnic groups (for a detailed discussion on human genetics and the pitfalls of racial genetic profiling in general see Sankar & Cho 2002, Aldhous 2002, Schwartz 2001, Wood 2001).

From a biological weapons perspective, population specificity would mean more than just a small variation in allele frequencies in different ethnic groups: no effective weapon can be designed that targets a genetic constitution that is also present to a significant extent in the population of the aggressor. From a military perspective, population specificity would mean that these genetic sequences are absent, or present only to a very limited extent, in one (the aggressor’s) population, while the same sequences are present in a significant percentage of an opposing population.

While it would certainly be desirable to have a very high percentage (up to 100%) of the target population bearing the target genetic marker, this is by no means a prerequisite for a militarily useful weapon. If as little as 10% or 20% of a target population were affected, this would wreak havoc among enemy soldiers on a battlefield or in an enemy society as a whole. Thus, in discussing genetic markers for ethnically-specific weapons, sequences would be needed that have a frequency close to 0% in one population while having a significant frequency in another. For the purpose of this paper, we assume that a frequency of 20% or higher may be enough from a military perspective.

Cytochrome P450 genes

The many genes in the cytochrome P450 system have been suggested as possible targets for ethnic specific weapons, for two reasons. They show high ethnic diversity, and they are involved in the detoxification of toxic substances. The notion is that ethnic groups with specific polymorphisms in a cytochrome P450 gene may be less able to detoxify a specifically designed biological or chemical weapon and thus be more susceptible to its action.

In our view, these genes are probably useless as a basis for ethnic weapons, as diversity in most of these cases relates to different percentages of certain alleles in different populations, not situations in which one population has a certain allele while the other does not. Hence, a significant part of the aggressor’s population would be potentially vulnerable. In addition, the P450 system comprises many dozens of enzymes with overlapping activities. Targeting a chemical or biological compound to one specific P450 enzyme would be very challenging.

A systematic search in two databases revealed that genetic sequences that fulfill these specifications not only exist, but they do so in unexpectedly high numbers. Our analysis focussed on so-called single nucleotide polymorphisms (SNPs), which are by far the most common source of genetic variation. SNPs are basically single-letter variations in the human DNA sequence. In the past years, several million SNPs have been identified by private and public entities. The SNP Consortium (TSC), representing a group of large pharmaceutical companies and not-for-profit organisations, keeps a public database on many SNPs. Another SNP database, the SNP500Cancer database, is maintained by the Cancer Genome Anatomy Project of the US National Institutes of Health.

For at least some of the SNPs they describe, both databases provide data on allelic frequencies in different populations. We analysed a total of nearly 300 SNPs, all in coding regions or genes, from both databases. An unexpectedly high number of these SNPs are indeed population specific: 6.7% of the SNPs in one database (see table 1 below) and 1.6% of the SNPs in the other include one allele that is not present at all in one population while it has a frequency of more than 20% in another population.

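| Chrom. # | No. of SNPs with TSC-ID and frequency data on 2 or more populations | 0 : ≥ 1% (n) | 0 : ≥ 10% (n) | 0 : ≥ 20% (n) (pop:pop) | TSC-ID |
|---|---|---|---|---|---|
| 1 | 17 | 5 | 2 | 1 (A:C) | 1166809 |
| 2 | 18 | 4 | 2 | 1 (A:AA), 1 (A:AA) | 0493622, 0231219 |
| 3 | 8 | 1 | 1 | 1 (A:AA) | 0207612 |
| 4 | 12 | 1 | 1 | | |
| 5 | 9 | 1 | 1 | | |
| 6 | 9 | 3 | 3 | 1 (C:AA) | 1104025 |
| 7 | 8 | 2 | 1 | | |
| 8 | 11 | 2 | 2 | 1 (C,A:AA) | 0668661 |
| 9 | 7 | 2 | 1 | 1 (A:AA) | 0815601 |
| 10 | 6 | 0 | | | |
| Total (n) (%) | 105 (100%) | 21 (20%) | 14 (13.3%) | 7 (6.7%) | |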

Table 1: Ethnic-specific SNPs in the TSC database

From the database of The SNP Consortium (TSC), SNPs were analysed for an ethnic specific allele distribution. The TSC database distinguishes between Caucasian, Asian and African-American samples. From 105 randomly selected SNPs in coding regions of the human genome, 21 had an allele frequency of 0% in one population but were present in at least one other population, 14 of these with a frequency ≥ 10% and 7 of these with a frequency ≥ 20%.

pop – population; A – Asian; C – Caucasian, AA – African American (e.g. A:C means that the minor allele is not present in the Asian population and has its highest frequency in Caucasians).

This finding is consistent with results from Stephens et al. (2001), who identified 1,452 of 3,899 SNPs (37.2%) as population specific, although the majority of these were rare SNPs. However, Stephens et al. (2001) also noted that "not all population-specific alleles were observed at a low frequency. In the African-American and Asian samples, some population-specific alleles were found at frequencies >25%."

In some cases, the frequency differences can be very high. For example, in our analysis of 105 SNPs from the TSC database, one SNP (TSC0493622) has a 0% : 94% ratio between major populations (see diagram 1 below). The G-allele of this SNP was present in 94% of the African-Americans and in 0% of the Asians sampled. The nature and function of the gene located in this genetic region is still unknown. Another example of a relatively high frequency difference is a polymorphism at the human melanocortin 1 receptor locus (MC1R), which encodes a receptor involved in skin color formation. In a study by Rana et al. (1999) one allele was not identifiable in any Africans, but showed a frequency of 70% in East and Southeast Asians.

Diagram 1: Frequency of the minor allele of the 21 ethnic specific SNPs in the TSC database

The majority of the population specific SNPs had a rather low frequency for the minor allele of less than 20%, but some SNPs with higher frequencies were also identified. 14 SNPs had a minor allele frequency of 19% or less, while only 7 SNPs had a minor allele frequency of 20% or higher. For SNPs with minor alleles in 2 populations, the higher minor frequency value was chosen for this diagram.

Some caution should be applied not to overestimate or overinterpret our results. Both datasets, as well as the work of Stephens et al. (2001), are based on a limited number of individuals for each population group. Hence, alleles with a very low frequency in any one population may have been missed. It is therefore likely that some of the alleles that were not identified in one population group are in fact present at low frequencies in that group. Many of the SNPs that were included in our analysis because they showed a 0% frequency for the minor allele might then have to be excluded, as their real frequency may be higher than 0%. This would not be surprising considering migration and intermarriage over millennia.

It is safe to assume, however, that a certain percentage of the SNPs included in our analysis will prove to be population specific even if larger numbers of individuals are screened. There are examples of unsuccessful searches for alleles in large populations. Thiopurine methyltransferase (TPMT) is an enzyme involved in the metabolism of certain pharmaceuticals. Allele *3A, which is the predominant mutant TPMT allele in individuals of European heritage, has not been identified in East Asian populations despite the analysis of a total of 1068 individuals in 5 independent studies (see van Aken et al. 2003 for review).
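How strongly a null result constrains the true allele frequency depends on the sample size, and a short calculation makes the caveat about limited samples concrete. The sketch below is an illustration and not part of the original analysis. It assumes independently sampled chromosomes, two per individual for an autosomal locus, and uses the exact binomial bound: if an allele is seen 0 times among n chromosomes, the one-sided 95% upper limit on its frequency p solves (1 - p)^n = 0.05. The function name and the 24-person sample are hypothetical; only the figure of 1068 individuals comes from the text.

```python
import math


def upper_bound_zero_observed(n_chromosomes: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the true allele frequency
    when the allele was observed 0 times in n sampled chromosomes.

    Solves (1 - p)^n = 1 - confidence for p (exact binomial bound).
    """
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_chromosomes)


# Pooled sample size quoted in the text: 1068 individuals, assumed diploid.
tpmt_bound = upper_bound_zero_observed(2 * 1068)

# Hypothetical small panel of 24 individuals, for comparison only.
small_panel_bound = upper_bound_zero_observed(2 * 24)

print(f"0 of {2 * 1068} chromosomes: true frequency below {tpmt_bound:.2%}")
print(f"0 of {2 * 24} chromosomes: true frequency below {small_panel_bound:.2%}")
```

Under these assumptions, a null result in 1068 individuals is compatible only with a true frequency well below 1%, whereas the same null result in a panel of a few dozen individuals cannot exclude frequencies of several percent. An observed frequency of 0% in a small sample therefore says little about whether an allele is truly absent from a population.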

To summarize, it can be estimated that a considerable number of ethnic specific SNPs do exist. Recent numbers suggest that SNPs occur with a frequency of about every 200 base pairs in the coding sequences of human genes (Schneider et al. 2003). Given the total number of about 3 billion base pairs, some 15 million SNPs may exist in the human genome. If in a conservative estimate only 0.1% (as compared to the 6.7% and 1.6% determined in our analysis of the two datasets) of these occur with population specific frequencies (here defined as 0% in one population and > 20% in another), some 15,000 possible target sequences may exist for future bioweaponeers.

It should be noted that some of the ethnic specific SNPs we identified in our analysis have a known function and are indeed readily expressed in human tissue. For example, the SNP rs2894804 from the SNP500Cancer database is located in a gene called GSTA1, coding for glutathione S-transferase. This enzyme functions in the detoxification of xenobiotics, including carcinogens, therapeutic drugs and environmental toxins. The minor allele of this SNP was present in the African-American population with a frequency of 23% while it was not identified in any of the other three populations.