ABSTRACT

The recent advances in next-generation sequencing technologies provide a new and effective way of tracking malaria drug-resistant parasites. To take advantage of this technology, an end-to-end Illumina targeted amplicon deep sequencing (TADS) and bioinformatics pipeline for molecular surveillance of drug resistance in P. falciparum, called malaria resistance surveillance (MaRS), was developed. TADS relies on PCR enriching genomic regions, specifically target genes of interest, prior to deep sequencing. MaRS enables researchers to simultaneously collect data on allele frequencies of multiple full-length P. falciparum drug resistance genes (crt, mdr1, k13, dhfr, dhps, and the cytochrome b gene), as well as the mitochondrial genome. Information is captured at the individual patient level for both known and potential new single nucleotide polymorphisms associated with drug resistance. The MaRS pipeline was validated using 245 imported malaria cases that were reported to the Centers for Disease Control and Prevention (CDC). The chloroquine resistance crt CVIET genotype (mutations underlined) was observed in 42% of samples, the highly pyrimethamine-resistant dhpsIRN triple mutant in 92% of samples, and the sulfadoxine resistance dhps mutation SGEAA in 26% of samples. The mdr1 NFSND genotype was found in 40% of samples. With the exception of two cases imported from Cambodia, no artemisinin resistance k13 alleles were identified, and 99% of patients carried parasites susceptible to atovaquone-proguanil. Our goal is to implement MaRS at the CDC for routine surveillance of imported malaria cases in the United States and to aid in the adoption of this system at participating state public health laboratories, as well as by global partners.

INTRODUCTION

Molecular surveillance of imported malaria in travelers is an important public health activity that can aid in the detection and tracking of drug-resistant Plasmodium falciparum parasites (1). In 2012, the CDC Malaria Branch adopted Sanger sequencing for molecular surveillance of malaria drug resistance markers. These data are used to identify the emergence and spread of antimalarial drug resistance in different parts of the world (2).

While Sanger sequencing allows the identification of major resistance alleles (an allele with ≥50% frequency), it lacks the sensitivity to accurately identify minor alleles (an allele with less than 50% frequency) and mixed-infection P. falciparum genotypes. This is particularly important in the context of P. falciparum mixed infections, where an antimalarial drug can exert selection pressure on minor populations of drug-resistant parasites (3). Furthermore, conventional sequencing methods are not scalable for high throughput, require large amounts of template DNA, are labor-intensive, and sequence a single DNA strand at a time. In contrast, recent advances in next-generation sequencing (NGS) and bioinformatics provide a high-throughput, scalable, and cost-effective approach for identifying and tracking molecular markers of resistance.

NGS has enabled massively parallel sequencing, allowing more than 300 gigabytes of DNA to be read on a single flow cell. NGS requires significantly less template DNA, has improved accuracy compared to conventional sequencing methods, and allows multiplexing of hundreds of samples and markers in a single run, all of which provide a significantly less expensive and faster approach for molecular surveillance of drug resistance. The utility of using a targeted amplicon deep sequencing (TADS) approach for P. falciparum drug resistance surveillance has recently been demonstrated by a few studies (3–6).

Here, we describe a fully integrated end-to-end Illumina-based TADS and bioinformatics pipeline for molecular surveillance, malaria resistance surveillance (MaRS). The resulting protocol will allow researchers to amplify, sequence (up to 384 samples in a single run), and analyze all polymorphisms, known minor and major resistance alleles, and novel alleles across six full-length P. falciparum drug resistance genes (crt, mdr1, k13, dhps, dhfr, and the cytochrome b gene) and the full mitochondrial genome.

Read depth for single nucleotide polymorphisms (SNPs) associated with malaria drug resistance.The read depth for each of the drug resistance-associated polymorphisms in six genes (crt, mdr1, k13, dhfr, dhps, and the cytochrome b gene) was as follows: median, 253; mean, 375; 1st quartile, 84; 3rd quartile, 532; minimum, 5; and maximum, 4,208. The read depth was not uniform across the full-length genes and mitochondrial genome (see Fig. S3 in the supplemental material). The read depths for crt and mdr1 polymorphisms were lower than those for k13, dhps, dhfr, and the cytochrome b gene (Fig. 2).

Malaria drug resistance-associated genotype frequencies.Across all 245 samples, of the 49 drug resistance-related SNPs surveyed, 42% (21/49) were found as minor alleles and 57% (28/49) as major alleles (see Table S1 in the supplemental material). In the crt gene, associated with chloroquine resistance, C72S was found in 1 of 243 (0.4%) samples, N74I and N75E in 101 of 243 (41%), K76T in 105 of 243 (43%), A220S in 71 of 188 (38%), Q271E in 107 of 245 (44%), I326S in 3 of 223 (1%), I356T in 65 of 243 (27%), and R371I in 83 of 219 (38%) samples (Fig. 3; see Table S1 in the supplemental material). Minor alleles (<50% allele frequency) accounted for 10% (24/243) of all crt polymorphisms identified. C72S and N326S were found only as major alleles (≥50% allele frequency). The C350S polymorphism, previously reported to be associated with decreased chloroquine resistance in French Guiana (7), was not found in any of 243 samples with sufficient read depth at this position.

Associated nonsynonymous SNPs in malaria drug resistance genes. The graph depicts the frequencies of the wild type- and minor and major allele-associated and/or confirmed resistance SNPs. Allele frequencies are represented as percentages on the x axis. SNPs are listed as gene name:wild-type amino acid–codon position–mutant amino acid (left) and total number of samples (right) on the y axes. Major mutant alleles were classified as having ≥50% allele frequency and minor alleles as having ≤49% allele frequency. The k13 A578S polymorphism has recently been confirmed not to be associated with a resistance phenotype.

In the mdr1 gene, the N86Y polymorphism was found in 43 of 245 (17%), Y184F in 157 of 245 (64%), S1034C in 1 of 245 (0.4%), N1042D in 2 of 245 (0.8%), and D1246Y in 14 of 245 (6%) samples. N86Y and Y184F were found in 6 (2%) and 17 (7%), samples, respectively, as minor alleles (Fig. 3; see Table S1 in the supplemental material). The S1034C and N1042D polymorphisms were found as major alleles only. In the artemisinin resistance-associated gene k13, the C580Y polymorphism was found at an allele frequency of 100% in 2 of 245 (0.8%) samples (Fig. 3; see Table S1 in the supplemental material). Notably, these two samples were from malaria cases imported to the United States from Cambodia (see Table S1 in the supplemental material). The A675V polymorphism was found at an allele frequency of 57% in a single sample (1 of 241; 0.4%), which was imported to the United States from Ghana. The pyrimethamine resistance-associated N51I, C59R, and S108N polymorphisms in dhfr were found at 90 to 97% in 245 samples (Fig. 3; see Table S1 in the supplemental material). N51I, C59R, and S108N were found as minor alleles in 5 (2%), 6 (2%), and 2 (1%) samples, respectively. In comparison, in the sulfadoxine resistance-associated dhps gene, the S436A polymorphism was found in 107 of 245 (43%) samples, S436F in 2 of 245 (0.8%), A437G in 222 of 245 (90%), K540E in 63 of 243 (26%), A581G in 32 of 244 (13%), and A613S in 50 of 244 (20%) samples. Minor alleles accounted for 1 to 10% of these polymorphisms, with A613S having the highest prevalence of minor alleles (10%). The A613T polymorphism was found in 0 of 244 samples. In the atovaquone-proguanil (AP) (Malarone) resistance-associated cytochrome b gene, 1 of 240 (0.3%) samples was found to have the Y268S polymorphism (Fig. 3; see Table S1 in the supplemental material). I258M and Y268S were found in 0 0f 240 samples.

Malaria drug resistance-associated haplotype frequencies.The crt wild-type CVMNK and triple-mutant (mutations underlined) CVIET haplotypes were found in 58% (136/236) and 42% (100/236) of samples (Fig. 4). No double-mutant SVMNT or CVIEK haplotypes were found in this sample set. In comparison, the mdr1 wild-type NYSND and mutant NFSND haplotypes were found in 40% (76/191) and 47% (90/191) of samples, whereas the single mutant YFSND and double mutant YYSNY were found in 10% (19/191) and 3% (6/191) of samples. In dhfr, the triple-mutant IRN haplotype was identified in 92% (210/228) of samples. The wild-type NCS and mutant NCN, ICN, NRN, NRC, and IGN haplotypes were present in the range of 0.2% to 4% in 228 samples (Fig. 4). In dhps, the wild-type SAKAA haplotype was present in 4% (7/192) of samples. The single mutants AAKAA and SGKAA were found in 6% (11/192) and 34% (65/192) of samples, respectively. In comparison, the double mutants SGEAA, AGKAA, and AAKAS were identified in 26% (49/192), 12% (24/192), and 0.5% (1/192) of samples, respectively. The triple mutants AGKAS, SGKGS, and SGEGA were found in 5% (10/192), 0.5% (1/192), and 1% (2/192) of samples. Last, the quadruple-mutant AGKGS haplotype was identified in 11% (22/192) of samples (Fig. 4).

Haplotype distributions for crt, mdr1, dhps, and dhfr. The distributions of certain drug resistance-associated haplotypes identified in crt, mdr1, dhps, and dhfr are represented as percentages. Mixed infections were excluded from the analysis.

Other nonsynonymous SNPs in drug resistance-associated genes and the mitochondrial genome.In the mitochondrial genome, seven nonsynonymous polymorphisms were identified in the cytochrome b gene in 2 or more samples. They were I31V, V44I, T108S, A152T, I223L, V268I, and G451V. The G451E polymorphism was found in 95 samples (Fig. 4). In crt, D24Y and V141L were found in four and nine samples, respectively. The F145L polymorphism in crt was found in two samples. In mdr1, polymorphisms S232Y, N504K, F794C, and F938Y were identified. N504K and F938Y were found in 7 and 30 samples, respectively. In k13, seven polymorphisms were identified: F45Y, T149S, K189T, K189N, N217H, R255K, and L258M. The K189T polymorphism was found in 123, K189N in 6, N217H in 4, R255K in 14, and L258M in 5 samples. For dhfr, only one polymorphism, I164L, was found in three samples. In comparison, in dhps, five polymorphisms were found: E189Q, I431V, S436H, I484T, and K540N. The I431V polymorphism was found in 34 samples, while the remaining five were found in 2 or 3 samples (Fig. 4). In addition to the nonsynonymous polymorphisms, 10 synonymous polymorphisms were identified in the mitochondrial genome, 9 in mdr1, 3 in k13, and 2 in dhps (summarized in Fig. S1 in the supplemental material).

Validation of the pooling strategy using Sanger sequencing data.To ensure that pooling of samples, demultiplexing, and genotype calling were accurate at the individual patient level, Sanger sequencing and MaRS sequencing results were compared. Using the 245 samples, Sanger sequencing data were generated for the major drug resistance-associated alleles of crt, mdr1, k13, dhfr, dhps, and the cytochrome b gene (see Table S2 in the supplemental material). Sanger sequencing identified the A613T polymorphism in 2 samples not identified by MaRS (see Table S2 in the supplemental material). This was due to the fact that no coverage was obtained by MaRS at this locus for the two samples. The remaining polymorphisms across all the genes found by Sanger sequencing were in agreement with the MaRS results. In comparison, MaRS identified a total of 55 polymorphisms in 37 samples missed by Sanger sequencing. All of them were minor alleles (see Table S2 in the supplemental material).

MaRS run time on a standard laptop.The analysis of the 245 samples was completed in 51 min (20 s per sample for all the gene targets) using a laptop with 8 gigabytes of memory running a Linux operating system. At the completion of the run, the MaRS pipeline produces a number of standardized reports: separate Excel data frames and heat maps for associated drug resistance SNPs and all other SNPs (exonic and intronic), including coverage for each SNP locus. In addition, individual sample variant call format (VCF) files are generated.

DISCUSSION

Here, we describe an integrated end-to-end next-generation sequencing and bioinformatics protocol, called MaRS, for molecular surveillance of P. falciparum drug resistance markers. This approach is best suited to simultaneously studying allele frequencies of multiple full-length genes at the individual patient level by capturing both known and potential new single nucleotide polymorphisms associated with drug resistance. Generating the same quantity of data using conventional Sanger sequencing was more labor-intensive (Table 2).

We previously demonstrated the utility of using the described TADS method to detect low-frequency mutations in the context of identifying an early case of an atovaquone-proguanil treatment failure in a U.S. patient (3) and for molecular surveillance of k13 alleles in patients from Senegal (4). Similarly, with 28 clinical samples, Rao and colleagues used this strategy on the Ion Torrent sequencing platform (5). More recently, a homegrown TADS method using a multiplex PCR dual-indexing scheme has been described for tracking known P. falciparum SNPs associated with drug resistance (6). The method uses a number of custom primers to amplify and add sequencing adaptors directly to genetic fragments, allowing sequencing immediately post-PCR enrichment of selected parts of drug resistance genes (6). While the approach shows great promise for molecular surveillance of known drug resistance markers, especially at a cost of $10 per sample, it can be very laborious (requiring 50 primers and 14 PCR pools) and is error prone due to competitive priming, primer dimer generation, and the potential for cross-contamination between gene targets. In addition, the homegrown approach has an increased risk for index switching and sample bleeding of multiplexed samples, which can result in FASTQ files being misbinned during demultiplexing (8). In particular, since more than 50 primers and 14 PCR pools are used in the method, there is an increased chance of leftover adaptors and primers leading to an incorrect assignment of prepared libraries from an expected index to a different index pool. It is for these reasons that Illumina does not officially support homegrown index-adaptor combinations. Nonetheless, the method shows great promise, especially given the potential low cost per sample. Future studies are needed to validate the method and assess the demultiplexing and genotyping accuracy (6), especially using secondary confirmatory sequencing (such as capillary sequencing).

While cost is an important factor, the reproducibility, quality, and accuracy of these new NGS methods are even more critical. This is especially important when trying to link SNP-based genotyping data to phenotypic outcomes related to treatment of P. falciparum malaria. Cross-contamination is a serious problem with amplicon sequencing (i.e., pooled and/or at the individual patient level). Using large numbers of primers in a multiplex and PCR pools (6) can lead to over- and/or underamplification of particular gene targets during PCR (10). In addition, as discussed above, barcode hopping and misidentification of gene fragments are potential problems. Therefore, in addition to internal controls, both positive and negative (even if no DNA is detected by quantification) data must be analyzed using the same software algorithms and parameters. To detect contamination, negative controls need to be sequenced even if no visible band is present on a gel and analyzed using the same bioinformatics process. To check that desired results are obtained, multiple positive controls, with known reference polymorphism profiles and sequences, must be included in every sequencing run. The relative number of reads obtained by the negative controls compared with positive samples can serve as a guide to identify contamination, while coverage for each genomic locus can identify specific regions contributing to cross-contamination.

To further validate the MaRS laboratory and bioinformatics pipeline, Sanger sequencing was used to assess whether any barcode hopping and patient sample misidentification occurred; no evidence of this was observed. Commercial off-the-shelf analysis tools were used to carefully develop the open source MaRS bioinformatics analysis pipeline, and all results were confirmed independently by three bioinformaticians.

Nonetheless, every method and technology has its limitations, including the described MaRS protocol. First, the protocol was only tested using whole blood specimens and is not suitable for use in cases with very low parasitemia (2 to 5 parasites/μl). Second, nonuniform and biased coverage across the full-length genes and specific polymorphic loci (Fig. 5; see Fig. S2 in the supplemental material) can be an issue. Some loci, especially the crt R371S and N326S polymorphisms, are difficult to capture for all samples. Unfortunately, nonuniform coverage is a known limitation of Illumina sequencing, influenced by factors such as GC content, biased and preferential amplification of certain fragments, and read length. Last, while MaRS has been optimized and validated and all information regarding the detailed protocol (laboratory and analysis) is available, specialized equipment and training are required, especially for first-time adapters of the methods.

Additional exonic nonsynonymous SNPs in drug resistance genes. The graph depicts minor and major allele frequencies for non-drug resistance-associated exonic SNPs. Allele frequencies are represented as percentages on the x axis. SNPs are represented as gene name:wild-type amino acid–codon position–mutant amino acid (left) and total number of samples (right) on the y axes. Major mutant alleles were classified as having ≥50% allele frequency and minor alleles as having ≤49% allele frequency. SNPs were considered only if they were found in two or more isolates.

Sequence-specific polymorphisms in the P. falciparum chloroquine resistance transporter gene (crt) and the multidrug resistance protein 1 gene (mdr1) have been reported to play a role in chloroquine resistance. Specifically, amino acid substitutions at codon positions 72 to 76, 97, 220, 271, 326, 356, and 371 in P. falciparum crt (Pfcrt) are associated with in vivo and in vitro resistance to chloroquine (11–13). In this data set, we found about half of all samples (42%) carried the CVIETcrt haplotype, which is commonly found in Africa (14, 15). The K76T allele is strongly associated with chloroquine resistance and is usually found alongside polymorphisms in codon positions 72 to 75, 220, 271, 356 and 37 (11, 13, 16). This is also in agreement with our findings (see Table S2 in the supplemental material). The crt C350R allele has previously been found to be present alongside K76T in South America and to be associated with a partial reversal of chloroquine resistance (7). We did not encounter the C350R allele in this data set. However, South American samples were not likely to be well represented in the data set, as the majority of cases imported to the United States are from Africa (2).

The Pfmdr1 polymorphisms at codon positions 86, 184, 1034, 1042, and 1246 have been associated with increased resistance to chloroquine (17, 18). In addition to chloroquine resistance, these mdr1 polymorphisms can also modulate the malaria parasite's susceptibility to or tolerance for a number of other drugs, including quinine, mefloquine, amodiaquine, and halofantrine (19–22). In our data set, we identified polymorphisms in all five of these positions at various frequencies. The Y184F polymorphism, however, was the most prevalent substitution, found in 64% (157/245) of samples (see Table S1 in the supplemental material). In comparison, the N86Y polymorphism was found in only 17% (43/245) of samples. The N86Y allele has recently been shown to increase parasite susceptibility to lumefantrine, mefloquine, and dihydroartemisinin, whereas the Y184F allele had a lesser impact on parasite susceptibility to these drugs (23). The NFSND haplotype was the most prevalent (40%) haplotype identified in our data set and is commonly found across Africa and Southeast Asia (23). In comparison, the less commonly observed YFSND and YYSNY haplotypes reported in Africa and India were also found at low prevalences (10% and 3%, respectively) in our data set.

The P. falciparum mdr1 polymorphisms also play an important role in modulating resistance to partner drugs in artemisinin combination therapy (ACT). ACT uses a combination of the fast-acting artemisinin compounds with longer-acting antimalarial drugs, such as lumefantrine, mefloquine, and sulfadoxine-pyrimethamine. Confirmed resistance to ACT has now been reported with parasites carrying k13 polymorphisms Y493H, R539T, I543T, and C580Y (24). Recent work suggests that these k13 alleles are present at low frequencies in Africa (25–27). We found two samples carrying the C580Y polymorphism, and both were from malaria cases imported from Cambodia (see Tables S1 and S2 in the supplemental material).

The cytochrome b gene in the P. falciparum mitochondrial genome confers resistance to AP. Specifically, mutations I258M and Y268S/C in the cytochrome b gene are associated with AP resistance (28, 29). We found the Y268S substitution in a single sample from a malaria case originating in Liberia in 2017. The remaining 239/240 (99%) samples showed no evidence of either the I258M or Y268C polymorphism, suggesting these patients carried parasites susceptible to AP treatment. This is important, because AP is an antimalarial drug commonly prescribed in the United States for prevention of malaria in travelers. Currently, AP is also the most frequently used medicine for the treatment of uncomplicated malaria in the United States and Europe (30–33).

In addition to the AP resistance markers, several molecular markers have been reported to be associated with resistance to various other antimalarial drugs. Polymorphisms S108N, C59R, and N51I in dhfr are linked to pyrimethamine resistance (34–36). Specifically, the S108N polymorphism confers some pyrimethamine tolerance on the parasite, whereas the IRN triple mutant is highly pyrimethamine resistant. Of the total samples surveyed, 92% carried the highly pyrimethamine-resistant triple mutant. In comparison, we found varying frequencies of the S436A (43%), S436F (0.8%), A437G (90%), K540E (26%), A581G (13%), and A613S (20%) polymorphisms in dhps; all of these mutations have been reported to be associated with sulfadoxine resistance (37, 38). The eastern African SGEAA haplotype was found in twice as many samples as the western African AGKAA haplotype (26% versus 12%) (39). This is in agreement with the available travel histories in our data set (see Table S2 in the supplemental material).

One limitation of this analysis is that travel histories were available for only 105 of 245 (43%) patient samples. Therefore, the proportion of markers associated with resistance reported here may skew the potential conclusions discussed above. Nonetheless, combined, these results provide a detailed drug resistance profile for each individual patient sample (see Table S2 in the supplemental material). The majority of patient samples carried highly pyrimethamine-resistant parasites, and more than half (57%) of the patient samples had chloroquine-sensitive parasites. In comparison, the drug resistance profiles for all the patient samples indicate that, in the majority of cases, ACTs and AP remain efficacious for uncomplicated P. falciparum treatment.

As multiple genes can be associated with resistance to a single drug, especially with ACT, simultaneous analysis of these genes will be important in understanding the evolutionary dynamics of antimalarial drug resistance under different treatment scenarios. Monitoring of all known molecular markers of resistance from various geographical regions will help guide appropriate prophylaxis and treatment recommendations.

The targeted amplicon deep sequencing technology described here coupled with the standardized bioinformatics analysis tool MaRS provides an integrated system for characterization and tracking of antimalarial drug resistance. We are optimistic about the potential value of its application for the evaluation of drug effectiveness and to inform treatment and prophylaxis policies at both regional and global levels. This protocol has now been independently validated by the parasitology laboratory at the New York State Department of Health (NYSDOH) Wadsworth Center and is to be implemented for molecular surveillance of P. falciparum drug resistance upon assay approval by the Clinical Laboratory Evaluation Program. Our goal is to implement MaRS at the Centers for Disease Control and Prevention for routine surveillance of imported malaria cases in the United States and to continue to aid in the adoption of this system at participating state public health laboratories, as well as by global partners interested in utilizing the method for molecular surveillance of malaria drug resistance.

MATERIALS AND METHODS

Samples.Samples (n = 257) were obtained from the U.S. malaria cases reported to the CDC malaria surveillance network from 2014 to 2017. Only samples genotyped as P. falciparum were used in this study (n = 245). Travel histories, collected based on a questionnaire, were available for 105 of 245 (43%) samples. Genomic DNA isolated from P. falciparum strains 7G8 (crt SVMNT) and HB3 (crt CVMNK) was used as a positive control. DNA isolated from whole blood of individuals without malaria was used as negative a control.

DNA extraction.DNA was isolated using the whole-blood protocol for the QIAmp DNA blood kit (Qiagen, USA; catalog no. 51104) with some modifications to the elution steps. The elution AE buffer (10 mM Tris-Cl, 0.5 mM EDTA, pH 9.0) was equilibrated at room temperature (15 to 25°C) for 10 to 15 min prior to use. Once at room temperature, the pH was checked to ensure a pH range of 8.0 to 9.0. To achieve higher DNA concentrations, DNA was first eluted with 100 μl AE buffer, and then, the 100-μl eluate was used for a second elution. Each elution was added to the QIAamp Mini spin columns and incubated at room temperature for 5 min before centrifugation. The eluted DNA was used immediately or stored at −30 to −15°C for later use.

Sample quality check and species confirmation.A real-time PCR method with self-quenching fluorescent primers was used to identify P. falciparum and other species (40, 41). Samples with both Plasmodium species and P. falciparumCT values of <40 were used for downstream sample preparation steps. Samples with a CT value of >40 were considered malaria negative and were excluded. The detection limit of the real-time PCR assay is 3.2 parasites/μl (40).

PCR enrichment of drug resistance genes.PCRs were performed to amplify the full-length P. falciparumcrt, multidrug-resistant protein 1 (mdr1), kelch 13 (k13), bifunctional dihydrofolate reductase (dhfr), and dihydropteroate synthetase (dhps) genes and the mitochondrial genome. The New England BioLabs (NEB) High Fidelity PCR kit (New England BioLabs, USA; catalog no. 51104) was used to amplify all the genes according to the manufacturer's instructions with a 50.0-μl master mix preparation using the 5× GC buffer. Additional details, including primer sequences and thermocycling conditions, are provided in Text S1 in the supplemental material.

PCR amplicon purification, normalization, and pooling.A SequalPrep kit (ThermoFisher Scientific; catalog number A1051001) was used to normalize all the PCR products to 1.0 to 2.0 ng/μl and to remove leftover primers, salts, and enzymes. Following normalization and purification, 10.0 μl of each PCR product was pooled for each sample. For example, six purified target genes would result in a 60.0 μl pooled PCR product per sample.

Next-generation sequencing library generation, purification, and pooling.Illumina-supported sequencing adaptors and unique sequence indices were added to pooled PCR amplicons from the previous step, using the Illumina Nextera XT kit (Illumina; catalog numbers FC-131-1096 and FC-131-1002). This step generates a unique sequence barcode identifier (ID) for each sample and all pooled gene targets. Briefly, a transposome was used to “tagment” PCR amplicons, i.e., to fragment and then tag the DNA with adaptor sequences in a single step. Next, a limited-cycle PCR was performed to amplify the fragmented DNA and add unique (sequence) indices to each pooled sample. PCR products were then size selected (<300 bp), normalized, and purified using Agentcourt AMpure XP beads (Beckman Coulter Genomics; catalog number A63881). Following normalization, 5.0 μl of each library was pooled, allowing up to 384 libraries (or individual samples) to be pooled on a single Illumina MiSeq run.

Pooled library quality check, denaturation, and next-generation sequencing.Pooled library fragment sizes were checked using the Agilent D5000 ScreenTape station (Agilent Technologies; catalog numbers G2940C, 5067-4626, and G2940CA). The pooled library concentration was measured using a Qubit 3.0 fluorometer (Life Technologies Corp.; catalog numbers Q33216 and Q32853). The DNA concentration of the pooled library must be measured by a fluorometric quantification method that uses double-stranded DNA (dsDNA) binding dyes. Once the library fragment size and concentration were known, the concentrated pooled libraries were diluted to 4 nM using fresh RSB buffer (10 mM Tris, pH 8.5, Illumina MiSeq reagents v2 [catalog number MS-102-2003]). In preparation for cluster generation and sequencing, the 4 nM pooled libraries were denatured with fresh 0.2 N NaOH to produce a 20 pM library. The NaOH was deactivated using fresh Tris-HCl, pH 7.0, prior to hybridization and sequencing; this step is critical to ensure appropriate cluster generation when performing amplicon sequencing. The pHs of the RSB buffer, NaOH, and Tris-HCl were checked prior to every use, and they were prepared fresh. Any changes in pH would have resulted in failed and/or poor cluster generation. PhiX (Illumina) (5%) was added as an internal control for these low-diversity libraries. Specific details related to the custom-developed library denaturation and cluster generation steps are provided in Text S1 in the supplemental material.

MaRS protocol.The MaRS protocol proceeds in six stages: stage 1, sample quality control (QC); stage 2, target gene PCR enrichment; stage 3, amplicon purification and pooling; stage 4, amplicon indexing and library generation; stage 5, sequencing on the Illumina MiSeq; and stage 6, data analysis (Fig. 1). The detailed step-by-step protocol, including all materials, catalog numbers, equipment required, and how to run all analyses, is provided in Text S1 in the supplemental material. The electronic version of the full standard operating procedure (SOP) is also available online (https://github.com/CDCgov/MaRS/tree/master/lab_sop). Any future changes, additions, and/or modification to the SOP, such as including additional target genes, will be available at the same website.

MaRS bioinformatics pipeline development and analysis.Each step of the bioinformatics workflow, including optimization of various analysis tools and algorithm parameters, was performed using the Geneious R10 software package. Using these optimized and validated parameters, a standardized, self-contained open source analysis pipeline was created for data cleaning, reference mapping, and SNP identification across the full-length drug resistance genes k13, crt, mdr1, dhps, dhfr, and the cytochrome b gene and the mitochondrial genome. The analysis pipeline uses a list of user-defined confirmed and/or associated resistance alleles to detect known mutations for each of the drug resistance markers. This list (Reportable_SNPs.xlsx, found in the MaRS/ref directory) can be modified easily to include any additional resistance alleles as they become available.

The MaRS analysis pipeline contains all the necessary dependencies and can be run on a standard Unix operating system laptop. While the bioinformatics pipeline was developed and validated for P. falciparum drug resistance, it can also be adapted to other pathogens whose drug resistance loci are known.

The MaRS analysis pipeline and all dependencies can be installed locally using the following command: git clone https://github.com/CDCgov/MaRS. To run the analysis, the following command needs to be executed postinstallation: time python3.4 pyamd.py -i [input folder with.fastq files] -r ref/mdr.fa -a ref/adapters.fa -b ref/mdr.bed -o 170815 -m bwa –varofint ref/Reportable_SNPs_Report_v2.xlsx.

Using the standardized output reports from MaRS (Table S1), custom R scripts (also available on the MaRS github page) were used to generate the coverage and genotyping figures and tables in this article. Samples with genotype allele frequencies of <95% were considered mixed infections and were not taken into account during the construction of the crt, mdr1, dhfr, and dhps haplotypes. Additional details about the installation, running of the bioinformatics pipeline, and subsequent analysis and report generation can be found online (https://github.com/CDCgov/MaRS).

Ethics statement.Development of the MaRS methodology was approved by the Office of the Associate Director of Science, Center for Global Health, Centers for Disease Control and Prevention as research not involving human subjects (protocol 2017-192); application of the methodology to the U.S. surveillance samples was approved by the same office as a surveillance activity (protocol 2017-309).

Accession number(s).All data are made available through the NCBI under BioProject number PRJNA428490.

ACKNOWLEDGMENTS

This work was made possible through support from the Advanced Molecular Detection (AMD) initiative at the CDC. We also acknowledge partial support by the Atlanta Research and Education Foundation and the Association of Public Health Laboratories (APHL), which are funded by the CDC. J.K. and D.L. are currently supported in part by the Atlanta Research and Education Foundation. S.S. is currently supported in part by the Bioinformatics Fellowship Program administered by the APHL and funded by the CDC.

We thank Patrick Kachur and Barabara J. Marston for their constructive editing of and comments on the manuscript.

The findings and conclusions in this article are ours and do not necessarily represent the views of the Centers for Disease Control and Prevention.

. 2011. Increased prevalence of the Plasmodium falciparum Pfmdr1 86N genotype among field isolates from Franceville, Gabon after replacement of chloroquine by artemether-lumefantrine and artesunate-mefloquine. Infect Genet Evol11:512–517.doi:10.1016/j.meegid.2011.01.003.

. 2014. Novel mutation in cytochrome B of Plasmodium falciparum in one of two atovaquone-proguanil treatment failures in travelers returning from same site in Nigeria. Open Forum Infect Dis1:ofu059.doi:10.1093/ofid/ofu059.