An improved method for rapid identification of microorganisms is disclosed, along with sequences of PCR primers optimized for this purpose. The primers are designed based on information analysis of sequences from a large number of organisms to amplify certain segments of genomic DNA whose sequences are unique among different organisms. The PCR products are compared with a DNA sequence database to obtain the identity of the microorganisms. This approach provides accurate and fast identification and taxonomic assignment of microbial species.

Claim:

The invention claimed is:

1. A method for obtaining data for taxonomic assignment of unknown species comprising: (a) selecting from more than one known species a divergent segment of DNA with low average information content surrounded by two conserved segments of DNA wherein said conserved segments comprise DNA segments with high average information content and wherein information content is determined by average information in bits of a related set of sequences and represents the total sequence conservation calculated by: $R_{sequence} = 2 - [-\sum_{b} f(b,l) \log_2 f(b,l) + e(n(l))]$, b=[A,G,C,T], wherein f(b,l) is the frequency of each base b (A,G,C, and T) at position l, and e(n(l)) is a generalized correction term determined by Shannon's Uncertainty for a sample size n at position l, (b) selecting primers for PCR amplification of said divergent segment such that said primers anneal to said conserved segments in any of these one or more unknown species, wherein each primer contains a mixture of nucleotides in which each nucleotide is present at the same proportion as it is present in the set of sequences of the known species used to compute the information content; (c) amplifying said divergent segment of DNA by PCR technique using said primers to obtain PCR products; (d) separating said PCR products; (e) comparing said separated PCR products with a database consisting of properties that can be converted or derived from a subject sequence and from similar sequences distinguishable from the subject sequence, wherein said properties are selected from the group consisting of nucleotide composition, base composition, nucleotide sequence, DNA structure, mass ratio of the DNA molecules or fragments derived therefrom, chemical reactivity of the DNA molecules, binding properties to other DNA molecules or proteins, thermal stability, and combination thereof; (f) measuring a taxonomic distance of said properties between said separated PCR products and organisms in said database; and (g) assigning a taxonomic reference to said separated PCR products based upon said properties, wherein the PCR products are separated by means selected from the group consisting of magnetic separation, gel purification, mass spectrometry and by limiting dilution of mixtures of amplicons into multiple aliquots with each aliquot containing a distinct individual DNA molecule, and combination thereof.

2. A method for identifying an organism in a sample, comprising: (a) selecting from more than one known species a divergent segment of DNA with low average information content surrounded by two conserved segments of DNA wherein said conserved segments comprise DNA segments with high average information content and wherein information content is determined by average information in bits of a related set of sequences and represents the total sequence conservation calculated by: $R_{sequence} = 2 - [-\sum_{b} f(b,l) \log_2 f(b,l) + e(n(l))]$, b=[A,G,C,T], wherein f(b,l) is the frequency of each base b (A,G,C, and T) at position l, and e(n(l)) is a generalized correction term determined by Shannon's Uncertainty for a sample size n at position l, (b) selecting primers for PCR amplification of said divergent segment such that said primers anneal to said conserved segments in any of these one or more unknown species, wherein each primer contains a mixture of nucleotides in which each nucleotide is present at the same proportion as it is present in the set of sequences of the known species used to compute the information content; (c) amplifying said divergent segment of DNA by PCR technique using said primers to obtain PCR products; (d) separating said PCR products; (e) comparing said separated PCR products with a database consisting of properties that can be converted or derived from a subject sequence and from similar sequences distinguishable from the subject sequence, wherein said properties are selected from the group consisting of nucleotide composition, base composition, nucleotide sequence, DNA structure, mass ratio of the DNA molecules or fragments derived therefrom, chemical reactivity of the DNA molecules, binding properties to other DNA molecules or proteins, thermal stability, and combination thereof; and (f) identifying one or more organisms in said separated PCR products based upon said properties, wherein said separation step comprises limiting dilution of mixtures of amplicons into one or more aliquots, each aliquot containing either a distinct single DNA molecule or a homogenous group of DNA molecules having the same sequence.

3. The method of claim 2, wherein the PCR products are separated by means selected from the group consisting of magnetic separation, gel purification, mass spectrometry, and combination thereof.

4. A method for identifying an organism in a sample, comprising: (a) selecting from more than one known species a divergent segment of DNA with low average information content surrounded by two conserved segments of DNA wherein said conserved segments comprise DNA segments with high average information content and wherein information content is determined by average information in bits of a related set of sequences and represents the total sequence conservation calculated by: $R_{sequence} = 2 - [-\sum_{b} f(b,l) \log_2 f(b,l) + e(n(l))]$, b=[A,G,C,T], wherein f(b,l) is the frequency of each base b (A,G,C, and T) at position l, and e(n(l)) is a generalized correction term determined by Shannon's Uncertainty for a sample size n at position l, (b) selecting primers for PCR amplification of said divergent segment such that said primers anneal to said conserved segments in any of these one or more unknown species, wherein each primer contains a mixture of nucleotides in which each nucleotide is present at the same proportion as it is present in the set of sequences of the known species used to compute the information content; (c) amplifying said divergent segment of DNA by PCR technique using said primers to obtain PCR products; (d) separating said PCR products; (e) comparing said separated PCR products with a database consisting of properties that can be converted or derived from a subject sequence and from similar sequences distinguishable from the subject sequence, wherein said properties are selected from the group consisting of nucleotide composition, base composition, nucleotide sequence, DNA structure, mass ratio of the DNA molecules or fragments derived therefrom, chemical reactivity of the DNA molecules, binding properties to other DNA molecules or proteins, thermal stability, and combination thereof; and (f) identifying one or more organisms in said separated PCR products based upon properties, wherein said divergent segment of DNA surrounded by said conserved segments is performed in the region of the DNA coding for the ribosomal RNA of any organism, said ribosomal RNA is the 28S ribosomal RNA from eukaryotic organisms or the 16S ribosomal RNA from prokaryotic organisms, and the separated PCR products are characterized identified based upon nucleotide sequence.

wherein the subscripts represent the relative abundance of the corresponding nucleotide at that position in the sequence as determined by information analysis of sequences from prokaryotic species.

8. The method of claim 4, wherein the PCR products are separated by means selected from the group consisting of magnetic separation, gel purification, mass spectrometry, by limiting dilution of mixtures of amplicons, each aliquot containing a distinct individual DNA molecule.

9. The method of claim 4, wherein the PCR products are separated by means selected from the group consisting of magnetic separation, gel purification, mass spectrometry, by limiting dilution mixtures of amplicons into one or more aliquots, each aliquot containing a homogenous group of DNA molecules having the same sequence.

10. The method of claim 1, wherein the PCR products are separated by limiting dilution of mixtures of amplicons into one or more aliquots, each aliquot containing a distinct single DNA molecule and said aliquots are amplified by emulsion PCR.

11. The method of claim 1, wherein the PCR products are separated by limiting dilution of mixtures of amplicons into one or more aliquots, each aliquot containing a homogenous group of DNA molecules having the same sequence and said aliquots are amplified by emulsion PCR.

12. The method of claim 4, wherein the PCR products are separated by limiting dilution of mixtures of amplicons into one or more aliquots, each aliquot containing a homogenous group of DNA molecules having the same sequence.

13. The method of claim 4, wherein the PCR products are separated by limiting dilution of mixtures of amplicons into one or more aliquots, each aliquot containing a homogenous group of DNA molecules having the same sequence.

14. The method of claim 12, wherein said aliquots are amplified by emulsion PCR.

15. The method of claim 13, wherein said aliquots are amplified by emulsion PCR.

Description:

RELATED APPLICATIONS

This application claims priority of U.S. Provisional Application No. 60/886,595, filed Jan. 25, 2007, and U.S. patent application Ser. No. 12/011,425, the contents of which are hereby incorporated into this application by reference. This is a continuing patent application of U.S. patent application Ser. No. 12/011,425.

BACKGROUND

1. Field of the Invention

The present disclosure pertains to methods for rapid detection and identification of microorganisms. More particularly, the disclosure relates to identification of prokaryotic organisms through molecular characterization of their genetic material, such as DNA or RNA.

2. Description of Related Art

Microorganisms, such as bacteria, are a major cause of infections in higher mammals, including humans. Although some infections may be treated without knowing the identity of the infectious agent, it is sometimes important for clinicians to know the identity of the infectious agent in order to prescribe the most effective treatment for the infection. This is particularly true for bacterial infections because different species of bacteria may respond differently to the same antibiotics.

In addition to their role as pathogens, prokaryotic microorganisms also play an important role in many industrial areas. Beer spoilage caused by bacteria has been a chronic problem for the beer industry. Prokaryotes are often found in various environmental contamination sites, and knowledge of the identity of these organisms can be useful for remediation. Identification of these prokaryotes may be instrumental for solving these problems.

Conventional methods for classification and identification of a microorganism require culturing of the microorganism, and typically rely on morphological or biochemical characteristics of the organism. The culturing step may delay identification, which can reduce the effectiveness of appropriate treatment, and may also increase exposure of laboratory workers to pathogens. The delay in identification may in turn increase the chances that the pathogens are spread to others while lab test results are awaited. Moreover, the test results may be skewed when multiple microorganisms are present and the growth of one microorganism inhibits the growth of others in the laboratory environment. There is therefore a need for a method which may rapidly identify the organisms without requiring culturing of the microorganisms in a laboratory environment.

Nucleic acid sequences of homologous genes have been used to distinguish different species. The differences in nucleotide usage, frequency and arrangement may indicate the degree to which different organisms have diverged from a common ancestor. U.S. Pat. No. 5,849,492 discloses a method for rapid identification of species based on a taxonomically variable set of orthologous sequences, including ribosomal RNA genes. More specifically, the '492 patent teaches a process whereby the sequences of ribosomal RNA molecules may be used to identify genetic differences between species. An information theory-based sequence analysis is used to select sequences in the homologous 16S ribosomal RNA genes (16S rDNA) for DNA amplification. The '492 patent discloses a pair of primers amplifying orthologous ribosomal gene or RNA sequences that are selected using information theory-based methods that detect gene regions revealing sequences that are maximally divergent among multiple species, which make these primers and amplicons useful for identifying prokaryotes. However, there are limitations to the sensitivity and specificity of this method, because computational analysis in the '492 patent was based on a multiple alignment of 16S rDNA sequences from only 55 prokaryotic organisms.

SUMMARY

Disclosed herein is a methodology by which nucleic acid amplification is used to identify microorganisms without the need to culture the infectious agents. A single DNA amplification and sequencing assay (omnibus PCR) has been developed which may accurately identify a wide spectrum of infectious disease agents in vitro within a few hours after the specimen is collected.

U.S. Pat. No. 5,849,492 describes methods and primer sequences for 16S rDNA and 28S rDNA for identification of prokaryotic and eukaryotic organisms, respectively. The teachings of the '492 patent are hereby expressly incorporated into this disclosure by reference.

The methodology disclosed here is an improvement upon the technology described in the '492 patent. The present disclosure uses a more comprehensive set of orthologous gene sequences derived from a more diverse and larger set of taxa than those described in the '492 patent to design primers that are capable of amplifying the 16S rDNA from a broader spectrum of prokaryotic species. As a result, a wider spectrum of organisms may be identified with the presently disclosed primers and methodology.

Since almost all organisms employ ribosomes to synthesize proteins, ribosomal subunits have been structurally and functionally conserved throughout the eons. Thus, ribosomal RNAs from widely differing species may differ in a small number of nucleotides. These limited sequence variations may be used to characterize the evolutionary or phylogenetic relationships between the organisms and to identify a specific organism. Briefly, information (in bits) may be used to precisely quantify both the similarities and divergence among 16S gene sequences, because information measures the number of choices between two equally likely possibilities (Schneider et al., J. Mol. Biol. 188: 415-431, 1986). Variable positions in a multiply aligned set of 16S rDNA sequences approach zero bits and homologous or highly conserved sequences have nearly two bits in a sequence logo (Stephens & Schneider, Nucl. Acids Res. 18: 6097-6100, 1990), which displays the average information content (R_sequence) and frequencies of each nucleotide at each position.

The average information in bits of a related set of sequences, R_sequence, represents the total sequence conservation:

$$R_{sequence} = 2 - \left[ -\sum_{b \in \{A,G,C,T\}} f(b,l)\,\log_2 f(b,l) + e(n(l)) \right]$$

where f(b,l) is the frequency of each base b at position l, and e(n(l)) is a correction for the small sample size n at position l.

A sequence logo may then be constructed based on the R_sequence to locate segments consisting of sequences with low information content flanked on either side by sequences with high information content.
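By way of illustration only, the per-position information content described above may be computed roughly as follows. This is a minimal sketch and not the implementation used in this disclosure. The function names are hypothetical, gaps and ambiguity codes are simply ignored, and the closed form used for e(n), namely (s - 1) / (2 ln 2 · n) with s = 4 symbols, is a commonly used approximation that is assumed here rather than taken from the text.

```python
import math
from collections import Counter

BASES = "ACGT"


def small_sample_correction(n, s=4):
    """Approximate e(n) = (s - 1) / (2 * ln(2) * n); an assumed closed form."""
    return (s - 1) / (2.0 * math.log(2) * n)


def position_information(column):
    """Information in bits at one alignment column: 2 - [H + e(n)].

    Characters other than A, C, G and T (gaps, ambiguity codes) are ignored.
    """
    counts = Counter(b for b in column.upper() if b in BASES)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    # Shannon uncertainty H = -sum f(b,l) * log2 f(b,l) over the observed bases
    uncertainty = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - (uncertainty + small_sample_correction(n))


def information_profile(alignment):
    """Per-position information for a list of equal-length aligned sequences."""
    return [position_information("".join(col)) for col in zip(*alignment)]
```

A fully conserved column in a large alignment yields a value close to 2 bits, and a column in which the four bases are equally frequent yields a value close to 0 bits, which matches the behavior of the sequence logo described above.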

Three different sets of PCR primers based on the 16S rDNA sequences from more than 2000 species were developed using the instant method and tested using both purified DNA from 100 different bacterial pathogens that are commonly found in hospital laboratories and with 299 uncultured clinical specimens of various types from patients with suspected bacterial infections. Primer set A (coordinates 931-1462 of the 16S rDNA sequence logo) may be used to amplify segments of the 16S rDNA product from prokaryotes. The other two sets of primers, B and C, which amplify sequences corresponding to coordinates 1819-2370 and 1819-2599 of the sequence logo, respectively, may be employed to confirm or refine the amplification results obtained using primer set A. 90% of the prokaryotic organisms identified with primer set A can be confirmed and, in some instances, refined with primer sets B and C.

An improved method is also described for identifying more than one microorganism present at the same infection site. Using the PCR and sequencing methodology described above, there may be instances where the sequence is not readable because there are multiple peaks at several locations in the sequence. To eliminate this problem, a constant denaturing gel electrophoresis (CDGE) protocol has been developed, which allows DNA to be separated on the basis of sequence composition and duplex stability in a vertical polyacrylamide gel.

The separated PCR products may be characterized based upon properties selected from the group consisting of nucleotide composition, base composition, nucleotide sequence, DNA structure, mass ratio of the DNA molecules or fragments derived therefrom, chemical reactivity of the DNA molecules, binding properties to other DNA molecules or proteins, thermal stability, and combination thereof. In another aspect, multiple PCR products may be separated and sequenced simultaneously using mass spectrometry.

In summary, disclosed herein are a number of oligonucleotides useful for taxonomic assignment of unknown species as well as for identification of clinically important pathogens. The method may generally include the following steps: (a) searching for a divergent segment of DNA with low average information content determined quantitatively surrounded by two conserved segments of said DNA with high average information content determined quantitatively; (b) designing primers for PCR amplification of said divergent segment by constructing a sequence logo for said DNA such that said primers contain a set of sequences present in said sequence logo that encompass the nucleotide variability of said conserved segments, which primers can anneal to said conserved segments; (c) amplifying said divergent segment of DNA by PCR technique using said primers to obtain PCR products; (d) separating said PCR products based on the difference in sequences; and (e) characterizing the separated PCR products based upon properties selected from the group consisting of nucleotide composition, base composition, nucleotide sequence, DNA structure, mass ratio of the DNA molecules or fragments derived therefrom, chemical reactivity of the DNA molecules, binding properties to other DNA molecules or proteins, thermal stability, and combination thereof.

In the case of pathogen identification, clinical samples are preferably processed to obtain a solution or suspension containing the DNA or RNA from the pathogens. Clinical samples may include, for example, blood, bone marrow aspirate, synovial fluid, biopsied samples, mucus, stool, urine, etc. In another aspect, the samples may be processed to remove certain impurities that may impede the PCR reactions, such as red blood cells, salts, etc. At times, sample concentration or dilution may be needed to optimize the PCR condition. In yet another aspect, the method may further include a step wherein the existence of a pathological condition or a disease in an individual is determined based upon the identity of the organism obtained in steps (a)-(e) described above.

For purpose of this disclosure, pathogens may include bacteria, viruses, fungi and other clinically significant microorganisms generally known to the medical community. For certain microorganisms that have RNA exclusively as their genetic material, reverse transcription may be performed before subjecting the samples to PCR as described in Step (c).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart showing the steps of the OmniAmp procedure.

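FIG. 2 shows a sequence logo constructed based on the rDNA sequences from 2184 organisms obtained from Genbank v88; National Library of Medicine.

FIG. 3 shows a windowed information plot indicating the information content across the entire rDNA gene.

FIG. 4 shows Omnibus PCR analysis of diluted clinical samples containing infectious agents.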

FIG. 5 shows results of Omnibus amplification of a representative sample of 23 different organisms.

FIG. 6 shows results of separation of PCR amplification products derived from the mixtures of infectious agents by CDGE.

DETAILED DESCRIPTION

Both the similarities and differences between microorganisms may be used to obtain the identity of these organisms. Morphological and biochemical properties have been used to differentiate among organisms. However, these methods can be time consuming and inaccurate and, if culturing conditions are not correct, may fail to identify the organism that is present in a specimen. According to the present disclosure, nucleic acid sequences of homologous genes in different species may reveal the identity of infectious agents. The frequency and arrangement of nucleotide differences indicate the degree to which two organisms have diverged from a common ancestor. In a preferred embodiment, the sequences of ribosomal DNA may be used to identify genetic differences between bacterial species.

In order to ensure that the widest spectrum of organisms may be identified, it is desirable to apply the information theory-based sequence analysis to the greatest possible number of species to select sequences in the homologous 16S ribosomal RNA genes (16S rDNA) for DNA amplification. In one embodiment, full length 16S rDNA sequences from a set of bacterial species (2184 organisms obtained from Genbank v88; National Library of Medicine) having the broadest possible taxonomic distribution are used to design amplification experiments (Saiki et al., Science 230: 1350-1354, 1985).

The total information content at each position is the basis for selecting phylogenetically-informative regions flanked by >18 bp segments showing sufficient sequence conservation to be used as primers for the PCR amplification reaction. The ratio of the number of bits of each nucleotide at each position to the total number of bits at that site may determine the proportion of a particular nucleotide at degenerate sites in the oligonucleotide primer. A ratio of 0.001 may be taken as the minimum proportion required to include this nucleotide in a degenerate site (see below). Otherwise, the primer may be designed to be homogeneous at that position.
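The selection of nucleotide proportions at a degenerate site may be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure actually used to design the disclosed primers. It assumes that, as in a conventional sequence logo, the height in bits of each letter is its frequency multiplied by the height of the stack, so that the ratio described above reduces to the observed frequency of the nucleotide. The function names are hypothetical, and renormalizing the retained proportions so that they sum to one is a design choice made here.

```python
from collections import Counter


def site_mixture(column, min_proportion=0.001):
    """Return {nucleotide: proportion} for one primer position.

    Nucleotides whose observed proportion falls below `min_proportion`
    are dropped; the remaining proportions are renormalized to sum to 1.
    """
    counts = Counter(b for b in column.upper() if b in "ACGT")
    n = sum(counts.values())
    if n == 0:
        raise ValueError("column contains no A, C, G or T")
    kept = {b: c / n for b, c in counts.items() if c / n >= min_proportion}
    total = sum(kept.values())
    return {b: p / total for b, p in kept.items()}


def primer_mixture(alignment_window, min_proportion=0.001):
    """Per-position mixtures for an aligned window (list of equal-length strings)."""
    return [site_mixture("".join(col), min_proportion)
            for col in zip(*alignment_window)]
```

A position at which a single nucleotide survives the threshold comes out homogeneous, as the paragraph above describes; for example, a column with 1999 A and one G (proportion 0.0005) yields only A.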

FIG. 1 illustrates a typical process for identifying unknown species in a sample. The main (central column) and contingency (left and right columns) procedures are all shown in the figure. The contingency procedure is invoked only if the main procedure does not provide the desired results at any given step. The main process may be automated at several steps, as indicated by the boxes with bold type outlines. A laboratory robot and thermal cycler inside the biosafety cabinet may be used to carry out these steps. This should maximize safety and minimize errors in handling and tracking infectious specimens.

Briefly, a clinical specimen may be dispensed into a microtiter plate, and the organism(s) contained in the specimen may be killed by heat treatment. The sensitivity of detecting different organisms usually is not compromised by this method for releasing DNA. The bacterial 16S rDNA sequence may then be amplified, and the product may be purified, preferably by magnetic methods. PCR may be generally performed by following the methodology described by Mullis K B, Faloona F A, "Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction," Methods Enzymol. 1987; 155:335-50. The purified PCR product may then be cycle sequenced using the original amplification primers for sequencing and a set of complementary sequencing primers derived from a high-information content interval from within the amplification product. These sequencing primers are also designed to be degenerate so as to include the respective combinations of nucleotides present, in the same frequencies at which these nucleotides are observed in the multiple alignment (in this case, of 2184 sequences). A contig of redundant, overlapping sequences from the complementary sequences derived from both strands of the product may be constructed from the sequences produced using the internal sequencing primers and by using the amplification primers to initiate sequencing.

The resultant consensus sequence derived from the contig may then be compared with a database containing a large collection of ribosomal sequences from a large number of organisms. The database comparison may be carried out on a local computer or remotely through a web-based tool. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) may be used to carry out the sequence comparison to a database of ribosomal DNA gene sequences culled from GenBank or Ribosomal Gene Database entries.

The sequences showing the greatest similarity (and one distantly related sequence) may be multiply aligned with the test sequence. The Clustal program may be used for the alignment, and several different types of phylogenetic trees may be constructed demonstrating the taxonomic relationship between the test sequence and those derived from the other organisms.
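To make the idea of a sequence-based taxonomic distance concrete, the following sketch ranks already-aligned reference sequences by the proportion of differing positions (the uncorrected p-distance). It is offered only as an illustration and is an assumption of this example: it is neither BLAST nor Clustal, it does not build a tree, and the function names and the choice of p-distance as the measure are hypothetical.

```python
def p_distance(a, b):
    """Proportion of mismatched positions between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def rank_references(query, references):
    """Return [(distance, name), ...] sorted from nearest to farthest.

    `references` maps a reference name to its aligned sequence.
    """
    return sorted((p_distance(query, seq), name)
                  for name, seq in references.items())
```

The nearest entry returned by `rank_references` would be the candidate taxonomic assignment; in practice a corrected distance and a tree-building step, as described above, would give a more reliable picture.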

Oligonucleotide DNA primers may be designed such that both the conserved regions and the divergent DNA sequences between them may be amplified (example shown below). The 3' termini of these primers are preferably selected at positions in the sequence logo with maximal information content (as close to 2 bits as possible), so that the primer end is always complementary to any bacterial template that is found in the series of orthologous genes that is used to design them. The 5' termini and internal primer positions of each primer are more permissive for degeneracy, with frequencies of each nucleotide corresponding to those given by the information analysis. This design may increase the efficiency of amplification, allowing the orthologous 16S rDNA sequences to be obtained from a maximum number of organisms. Oligonucleotide mixtures are defined by the frequencies of each nucleotide in the sequence alignment. This feature may maximize the sensitivity without sacrificing specificity for the 16S rDNA genes because these oligonucleotides are complementary to most potential genomic targets. A limitation of this method, however, is that omnibus amplification of genes whose functions and sequences are not widely conserved throughout the prokaryotic kingdom cannot be used to identify those species in which they are not found. Therefore, preferred genes used in this invention include those which perform functions that are necessary to provide essential or fundamental cellular functions (e.g., ribosomal genes needed for protein translation, cytochrome P450-related genes, i.e., cytochrome b5, needed for oxidative respiration, other housekeeping genes).
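The preference for a highly conserved 3' terminus may be checked with a short routine such as the one below. This is a hedged sketch: the function names, the number of terminal positions examined (`tail`) and the 1.8-bit cutoff are assumptions chosen for illustration, and `info` is taken to be a list of per-position information values indexed along the sense strand. For an antisense primer, the 3' end corresponds to the lowest sense-strand coordinate of its window.

```python
def three_prime_positions(start, width, strand, tail=3):
    """Sense-strand indices covered by the last `tail` bases at the primer 3' end.

    strand '+' : sense primer, 3' end at the high-coordinate side of the window.
    strand '-' : antisense primer, 3' end at the low-coordinate side of the window.
    """
    if strand == "+":
        return list(range(start + width - tail, start + width))
    if strand == "-":
        return list(range(start, start + tail))
    raise ValueError("strand must be '+' or '-'")


def three_prime_conserved(info, start, width, strand, tail=3, min_bits=1.8):
    """True if every position at the primer 3' end has at least `min_bits` bits."""
    return all(info[i] >= min_bits
               for i in three_prime_positions(start, width, strand, tail))
```

A candidate window that fails this check could be shifted by a few positions until its 3' end falls on positions close to 2 bits, while degeneracy is tolerated toward the 5' end.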

FIG. 2 shows an example of a sequence logo constructed for the purpose of designing primers. The sequence logo indicates the sequence conservation among a multiple alignment of ribosomal small subunit 16S rRNA genes of 2251 prokaryotic species. The portion of the logo shown indicates the region used to select primer pair set A indicated in the specification. The height of each stack of letters depicts the overall conservation in bits of information at each position and the relative heights of each nucleotide correspond to the percentage of gene sequences at that position containing that nucleotide. The error bars of the computation of average information at each position are indicated at the top of each stack. The locations of the primers used for PCR amplification and DNA sequencing of the amplification products are presented using arrows (orientation corresponds to the strandedness of the sequence) below the logo at the corresponding numbered positions. The primers are named according to these coordinates. Note that the antisense primers indicated at positions 1268-1287 were used only for DNA sequencing of the product amplified by primer pair A (positions 931-949 and 1462-1439).

FIG. 3 shows a windowed information plot which indicates the information content across the entire rDNA gene. This figure is color coded by average information content, which permits selection of high information content primer length windows separated by a low information content sequence region. Additional sets of primers may be designed in a similar fashion based on the information provided in FIG. 3 as well as a sequence logo similar to the logo shown in FIG. 2. Table 1 lists three sets of primers designed based on the methodology described above.
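A windowed scan of this kind may be sketched as follows. The sketch is illustrative only: the function names are hypothetical, and the window width of 18, the cutoffs of 1.6 and 1.0 bits and the permitted spacing between windows are assumptions chosen for the example rather than values taken from this disclosure. `info` is a list of per-position information values in bits.

```python
def windowed_average(info, width):
    """Mean information for every window of `width` consecutive positions."""
    if width <= 0 or width > len(info):
        return []
    total = sum(info[:width])
    means = [total / width]
    for i in range(width, len(info)):
        # slide the window one position to the right
        total += info[i] - info[i - width]
        means.append(total / width)
    return means


def candidate_amplicons(info, width=18, high=1.6, low=1.0,
                        min_gap=50, max_gap=600):
    """Return (forward_start, reverse_start) pairs of conserved windows
    that flank a divergent region of acceptable length."""
    means = windowed_average(info, width)
    conserved = [i for i, m in enumerate(means) if m >= high]
    hits = []
    for fwd in conserved:
        for rev in conserved:
            gap_start, gap_end = fwd + width, rev
            gap = gap_end - gap_start
            if min_gap <= gap <= max_gap:
                inner = info[gap_start:gap_end]
                if sum(inner) / gap <= low:
                    hits.append((fwd, rev))
    return hits
```

Each returned pair identifies two primer-length windows of high average information separated by a stretch of low average information, which is the pattern the plot in FIG. 3 is used to locate by eye.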

PCR products generated from primer set A are approximately 410 nucleotides long, and their DNA sequences may be determined by automated dideoxy methods using internal sequencing primers in less than 1 hour. This procedure simultaneously satisfies the requirements for broad specificity and for high sensitivity and does so rapidly (within a few hours) by comparison with conventional microbiological approaches.

If no amplified bacterial DNA is obtained from a human clinical sample using any of the primers for amplifying bacterial DNA, the sample may be subjected to PCR amplification using a pair of test human primers. The test human primers may be a pair of primers that are capable of amplifying a segment of the human DNA. The test human primers preferably anneal to a segment of the human DNA that is highly conserved. If no PCR products are obtained using the test human primers, it is likely that PCR inhibitors are present in the PCR reaction, because human DNA is abundantly present in human clinical samples. Conversely, if the expected PCR products are obtained using the test human primers, the bacterial primers may not be suitable for amplifying the unknown species, or there may not be a sufficient amount of bacteria in the sample.

If it is determined that PCR inhibitors may be present in the PCR reaction and inhibit DNA amplification, the clinical samples may be diluted. For instance, the 50 µl and 500 µl centrifugation steps in the standard sample preparation described in Example 1 may be used. If these steps do not solve the problem, it may be necessary to dilute the sample to a higher degree than that used in the standard sample preparation in order to dilute out any possible inhibitors of PCR in the sample. PCR inhibition may be tested by using human ribosomal DNA primers in the PCR reaction as illustrated in Example 1.
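The troubleshooting logic of the two preceding paragraphs may be summarized as a small decision routine. This is merely a restatement of that logic as a sketch; the function name and the wording of the returned messages are hypothetical.

```python
def interpret_amplification(bacterial_product, human_control_product):
    """Suggest the next step from two yes/no PCR outcomes.

    bacterial_product     : True if the bacterial primers gave a product.
    human_control_product : True if the test human primers gave a product
                            (only meaningful when bacterial_product is False).
    """
    if bacterial_product:
        return "proceed to purification and sequencing"
    if human_control_product:
        # The PCR itself works, so inhibition is unlikely.
        return "primers unsuitable for the species or too few bacteria in the sample"
    # Abundant human DNA failed to amplify: inhibition is the likely cause.
    return "PCR inhibitors likely; dilute the sample and repeat"
```

For example, `interpret_amplification(False, False)` points to inhibition and therefore to further dilution of the sample before the PCR is repeated.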

FIG. 4 shows Omnibus PCR analysis of diluted clinical samples containing infectious agents. Samples 22, 27, 58 and 70 were diluted 1:5, 1:10, 1:20 and 1:40 and amplified with primer set A, which produces a single PCR product of 415 bp. The human genomic DNA control [H] did not yield a band that comigrated with either E. coli (E) or the other bacterial amplification products. Although in this instance all of the dilutions produced amplification products, frequently only a subset of dilutions will yield a result.

FIG. 5 shows results of Omnibus amplification of a representative sample of 23 different organisms. The first 4 organisms were amplified with a different set of omnibus primers from the remaining species (which were amplified using primer set A). Except for T. glabrata and C. albicans, which are fungal agents, most of the bacterial organisms produce the expected 415 bp product.

The PCR product may be purified using a number of established methods for DNA purification. Preferably, the PCR product is purified using magnetic separation or gel purification. Magnetic separation may have a higher yield of recovered PCR product than gel purification. At least one primer may be biotinylated if magnetic separation of PCR product is to be used. Although gel purification produces a relatively lower yield and requires a higher amount of the amplified DNA, gel purification has proven helpful in some instances in reducing the problem of concatemers of PCR products that may cause difficulty in obtaining clean sequence data. One potential drawback of gel purification is that it may not be as conducive to automation as magnetic separation. Magnetic separation is the preferred method for purifying the PCR products.

The purified PCR products may be characterized by sequencing or other molecular tools. Sequencing methods such as the dideoxy method or the chemical method may be used. See Sanger F, Nicklen S, Coulson A R, "DNA sequencing with chain-terminating inhibitors." Proc Natl Acad Sci USA. 1977 74(12):5463-7; and Maxam A M and Gilbert W, "Sequencing end-labeled DNA with base-specific chemical cleavages." Methods Enzymol. 1980; 65(1):499-560. The sequences may be read from a film exposed to the sequencing gel or may be obtained using an automated sequencing machine.

Other molecular tools capable of discerning the difference in DNA structures may also be used to characterize the PCR products. See, e.g., V. K. Khanna, "Existing and emerging detection technologies for DNA (Deoxyribonucleic Acid) fingerprinting, sequencing, bio- and analytical chips: A multidisciplinary development unifying molecular biology, chemical and electronics engineering" Biotechnology Advances, 2007, 25:85-98, which is hereby incorporated by reference.

If conventional DNA sequencing is used, when the sequencing run is finished, DNA sequence analysis software, such as Visible Genetics OpenGene, may be used to align and base call the electropherogram. Preferably, the sequence data are manually checked and edited to obtain as clean and accurate a sequence as possible prior to sequence analysis. In some situations, the electropherogram may need to be manually aligned and/or base called because of the limitations of the software.

There may be instances where the sequence is not readable because there are multiple peaks at several locations in the sequence. The multiple peaks resemble background noise but are usually higher than background noise levels. This phenomenon is called "multiple infection", which refers to the presence of more than one sequence in a sample. One way to determine if there is more than one sequence is to run the sequences on atblast, which compares the suspected bacterial sequence with the database of 11,400 prokaryotic 16S rDNA sequences compiled from GenBank using the NCBI Blast software. This method may not work if the sequences cannot be determined in the first place. The sequences derived from most multiple infections will only match approximately 10-30 nucleotides (which is not statistically significant or adequate to determine the identity of the species) due to the resulting limited homology to any single organism rDNA sequence. To solve this problem of deconvoluting the sequences of multiple organisms, a Constant Denaturing Gel Electrophoresis (CDGE) protocol was developed which separates the amplicons produced by each of the species, so that they can be sequenced independently of one another. CDGE may allow DNA to be separated on the basis of sequence in a vertical polyacrylamide gel. FIG. 6 shows a typical CDGE gel which separates the omnibus amplification products from multiple species.

Alternatively, mass spectrometric methods may be employed for simultaneous identification of multiple biomolecular targets. When two or more targets are of similar sequence composition or mass, they may be differentiated by using special mass modifying, molecular weight tags on different targets. These mass modifying tags are typically large molecular weight, non-ionic polymers including, but not limited to, polyethylene glycols, polyacrylamides and dextrans. These tags are available in many different sizes and weights, and may be attached at one or more different sites on different nucleic acid molecules. Thus similar nucleic acid targets may be differentially tagged and may then be readily differentiated, in the mass spectrum, from one another by their distinctly different mass to charge ratios. According to this disclosure, the identification process may be significantly accelerated because multiple species may be identified simultaneously without separating them first.

The following examples illustrate the present invention. These examples are provided for purposes of illustration only and are not intended to be limiting. The chemicals and other ingredients are presented as typical components or reactants, and various modifications may be derived in view of the foregoing disclosure within the scope of the invention.

Example 1

Clinical Sample Preparation

Materials

TE buffer

disinfectant and squirt bottle

empty container for disinfectant

0.2 ml strip tubes

0.2, 0.5, 1.5 ml tubes

pipettes and sterile pipette tips (1000 ul, 200 ul, 20 ul)

Red blood cell lysis buffer (if using samples containing blood)

General Strategy

1) Perform standard sample preparation procedure.

2) If this step doesn't result in an amplification product from the sample containing genomic sequences of the pathogen, it is necessary to concentrate the sample by centrifugation of 50 ul or, if available, 500 ul of the sample at high speed (>13000 g). This concentrates the sample containing the template and increases the chance of obtaining an amplification product.

3) If the sample contains red blood cells (whether or not it is a sample of blood), perform sample preparation procedure for blood (#3).

Detailed Protocol

A. Standard Sample Preparation

Dilution:

Put on a disposable lab coat, mask, two sets of gloves and safety glasses

Transfer three 250 ul aliquots of each sample to three labeled 0.5 ml microfuge tubes (to work with)

Flame the opening of the original clinical specimen tube after opening and before closing

Place 20 ul of sample into the 1:6 tube and mix thoroughly

Take 20 ul of 1:6 and place into the 1:20 tube and mix thoroughly

Take 20 ul of 1:20 and place into the 1:50 tube and mix thoroughly

Take 20 ul of 1:50 and place into the 1:100 tube and mix thoroughly

Recap tubes tightly and place used tip into disinfectant

Place tubes in thermal cycler and perform Hotstart to kill bacteria: 94°C for 15 minutes, then hold at 4°C indefinitely

Clean and sterilize hood, dispose of all contaminated materials in biological waste container
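The dilution series above transfers 20 ul between tubes but does not state how much diluent each tube holds. The following is a minimal sketch of that arithmetic, assuming that "1:N" means one part sample in N parts total and that each tube is pre-filled with diluent; the function name `diluent_volumes` is hypothetical and is not part of the disclosed protocol.

```python
def diluent_volumes(transfer_ul, dilutions):
    """Diluent (ul) per tube so that carrying transfer_ul forward
    from the previous tube reaches each cumulative 1:N dilution.

    Assumption: 1:N is read as 1 part sample in N parts total.
    """
    volumes = []
    previous = 1.0  # the undiluted sample
    for target in dilutions:
        if target <= previous:
            raise ValueError("dilutions must increase at every step")
        fold = target / previous  # dilution achieved by this single step
        volumes.append(transfer_ul * (fold - 1.0))
        previous = target
    return volumes


if __name__ == "__main__":
    for label, vol in zip(("1:6", "1:20", "1:50", "1:100"),
                          diluent_volumes(20, [6, 20, 50, 100])):
        print(f"{label} tube: {vol:.1f} ul diluent")
```

Under these assumptions the first tube would hold 100 ul of diluent; a laboratory using a different reading of "1:N" would obtain different volumes.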

A sequence logo was created from the aligned 16S rDNA sequences, and a representative region having two conserved regions surrounding a divergent region is shown in FIG. 2. The horizontal axis represents nucleotide positions along the DNA, whereas the vertical axis measures the degree of conservation at the same position in the various species. The vertical scale is given in bits of information (or R.sub.sequence), which measures the number of choices between two equally likely possibilities. R.sub.sequence may be calculated according to Equation I.

The choice of one base from the 4 possible bases requires two bits of information. The two bits correspond to two choices. For example, the first choice could determine whether the base is a purine or a pyrimidine and the second choice would specify which purine or pyrimidine is present. Thus, if at a certain position, all of the aligned 16S sequences have the same nucleotide, then that position has two bits of conservation. Thus, in the logo of FIG. 2, that nucleotide appears at that particular position with a height of (almost) 2 bits. A small sample correction prevents it from being exactly 2 bits high (Schneider T. D. et al., 1986, J. Mol. Biol., 188:415-431).

For those positions where two equally likely bases occur, there is only one bit of information. This is because a choice of 2 things from 4 is equivalent to a choice of 1 thing from 2. By way of example, if at a particular position in a nine-sequence alignment, 5 of the sequences contain A and 4 have T, this position is about 1 bit high in FIG. 2. The relative frequency of the bases determines the relative heights of the letters, and since A is more frequent, it is placed on top. A position in which all four bases are equally likely is not conserved and so has an R.sub.sequence of zero and its height on the logo is zero. When the frequencies of the bases are other than 0, 50 or 100 percent, the heights still measure the conservation at each position, and the calculation may be performed according to Equation I.
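The per-position calculation can be sketched in a few lines of Python. This is an illustrative sketch only: it follows the form R.sub.sequence = 2 - [H + e(n)] recited in claim 1, and it assumes the commonly used approximation e(n) = 3/(2 ln 2 n) for the small sample correction with four bases, which may differ from the exact correction used to draw FIG. 2. The function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def small_sample_correction(n):
    """Approximate e(n) for four symbols: (4 - 1) / (2 * ln 2 * n).

    Assumption: this is the large-sample approximation, not an exact value.
    """
    return 3.0 / (2.0 * math.log(2) * n)


def r_sequence(column, correct=True):
    """Information content in bits of one alignment column."""
    counts = Counter(b for b in column.upper() if b in BASES)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("column contains no A, C, G or T")
    # Shannon uncertainty H = -sum f(b,l) * log2 f(b,l)
    uncertainty = -sum((c / n) * math.log2(c / n) for c in counts.values())
    e_n = small_sample_correction(n) if correct else 0.0
    return 2.0 - (uncertainty + e_n)


def letter_heights(column, correct=True):
    """Logo letter heights: each base's frequency times the column height."""
    counts = Counter(b for b in column.upper() if b in BASES)
    n = sum(counts.values())
    height = max(r_sequence(column, correct), 0.0)
    return {b: (c / n) * height for b, c in counts.most_common()}


if __name__ == "__main__":
    for col in ("AAAAAAAAA", "AAAAATTTT", "ACGT"):
        print(col, round(r_sequence(col, correct=False), 3),
              round(r_sequence(col), 3))
```

Without the correction, a fully conserved column gives 2 bits, the 5 A / 4 T column gives about 1 bit, and a column with all four bases equally frequent gives 0 bits, matching the cases described above. With the correction the values are somewhat lower for small samples.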

Example 3

Design of Primers

In order to perform a PCR, two segments of DNA (which are referred to as primers) may be designed and prepared. The two PCR primers represent a set of oligomers in which the frequency of a nucleotide is proportional to its presence at this particular position in the sequence logo prepared based on a number of 16S rDNA sequences from different organisms. The primers were designed according to the following three criteria: (1) They are in regions of high conservation, and surround regions of low conservation. (2) The 3' termini cover regions that are invariant between species, so that the primer end which is extended by the DNA polymerase is always properly annealed to the DNA. (3) The oligonucleotide primers are not self complementary and do not base pair to each other. The primers may also contain restriction sites useful for subsequent cloning of the amplification product.

The following primers have been designed based on the Logo prepared in Example 2, and the relative positions of Primer Set A are shown in FIG. 2:

wherein the subscripts represent the relative abundance of the corresponding nucleotide at that position in the sequence as determined by information analysis of a large number of multiply-aligned sequences from a wide variety of prokaryotic species.
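Criterion (1) can be illustrated with a short search over per-position information values. The following is a sketch under stated assumptions, not the procedure actually used to derive the disclosed primers: the function names, the thresholds and the toy data are all hypothetical, and criteria (2) and (3) are not checked here.

```python
def window_means(bits, width):
    """Mean information (bits) of every window of the given width."""
    return [sum(bits[i:i + width]) / width
            for i in range(len(bits) - width + 1)]


def find_primer_sites(bits, primer_len, min_primer_bits,
                      max_target_bits, min_gap):
    """Return (forward_start, reverse_start) pairs of conserved windows
    that flank a divergent segment at least min_gap positions long."""
    means = window_means(bits, primer_len)
    conserved = [i for i, m in enumerate(means) if m >= min_primer_bits]
    pairs = []
    for fwd in conserved:
        gap_start = fwd + primer_len
        for rev in conserved:
            if rev - gap_start < min_gap:
                continue  # windows overlap or the target is too short
            gap = bits[gap_start:rev]
            if sum(gap) / len(gap) <= max_target_bits:
                pairs.append((fwd, rev))
    return pairs


if __name__ == "__main__":
    # Toy profile: conserved, divergent, conserved (hypothetical values).
    profile = [1.9] * 5 + [0.3] * 6 + [1.9] * 5
    print(find_primer_sites(profile, primer_len=5, min_primer_bits=1.8,
                            max_target_bits=0.5, min_gap=6))
```

In practice the thresholds would be chosen by inspecting the logo, and each candidate pair would still need to be screened against the remaining two criteria.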

Primer sets A, B or C may be used separately to amplify the 16S rDNA sequences of a prokaryotic organism.

The percent denaturant can be varied to maximize the separation of the DNA on the gel. Currently gels containing 30% denaturant are run. Using the temperature controlled Owl vertical gel electrophoresis unit (Owl Scientific), 75-80 ml of the appropriate percentage gel is needed to cast one CDGE gel. For this example 30% denaturant will be used. Mix 52.5 ml of 0% denaturant, 22.5 ml of 100% denaturant and 323 µl of 20% APS in a vacuum flask. Remove the air by placing under vacuum for several minutes. After the air is removed add 138 µl of TEMED. Swirl gently and add the gel mixture to the glass plates.
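The stock volumes for the 30% example follow from simple proportion, and the same arithmetic applies when the percent denaturant is varied. A minimal sketch (the function name `denaturant_mix` is hypothetical; APS and TEMED volumes are not scaled here because the text gives them only for the 75 ml case):

```python
def denaturant_mix(total_ml, percent):
    """Return (ml of 0% stock, ml of 100% stock) for a gel solution
    of total_ml at the requested denaturant percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    high = total_ml * percent / 100.0
    return total_ml - high, high


if __name__ == "__main__":
    low, high = denaturant_mix(75, 30)
    print(f"0% stock: {low} ml, 100% stock: {high} ml")
```

For 75 ml at 30% this reproduces the 52.5 ml and 22.5 ml given in the protocol.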

Preparation of the Gel Apparatus and Peristaltic Heating Unit:

The circulating heater bath that controls the gel box temperature must be turned on and circulating through the gel box at 60°C. The plates used for CDGE need to be cleaned with soap and water and also with 95-100% ethanol. Plastic spacers are placed between the plates to separate multiple gels. The plates are then placed inside a plastic freezer bag. Cut the top of the freezer bag off so it is approximately the same height as the glass plates; this allows for easier pouring of the gel. Once this is done, place the bag/plate sandwich in the Joey gel casting system (Owl Scientific) and tighten the screws down. Place the comb between the plates, making sure that the wells are only about 5 mm below the top of the glass plates. If the wells are too deep, the samples will disperse during loading, resulting in poor resolution. This setup should be performed before the acrylamide is degassed prior to pouring the gels.

Casting the Gel:

Once the gel casting system is assembled and the gel is ready to pour, you will need to get a 1000 µl pipet tip and the 60 cc syringe. Cut about 3 mm off the end of the pipet tip with a scalpel and attach it to the syringe. Pour the gel from the vacuum flask into an appropriately sized beaker (so you can fit the syringe into the gel). Draw up the gel with the syringe and add it between the glass plates. The space between the plates filling with the gel should be visible. Fill the plates to the top with the gel. Once the gel is poured, wait 15-30 min. until it has polymerized.

Final Gel Preparation:

After the gel has polymerized, remove it from the gel casting system and the plastic bag. Scrape any excess polyacrylamide from the outside of the plates. Wiping the plates with a damp paper towel after scraping the excess off helps remove any residual polyacrylamide. Place the plate/gel sandwich in the vertical gel box that has the 60°C water running through it (you should have turned the heater for the water on earlier; make sure to check the volume of water in the heater). Place 1× TBE buffer in the bottom of the gel box and between the gel and the vertical part of the box. Unless two gels are being run, the other side of the gel box should have a plastic plate in it so the buffer can fill the vertical part of the chamber and cover the platinum wire that generates the current. Remove the comb and straighten the polyacrylamide between the wells for easier loading. The gel can now be loaded.

Sample Preparation, Loading and Electrophoresis:

The samples used for CDGE should come from 100 ul PCR reactions so there is enough DNA to detect by CDGE. Make sure to do an agarose gel check to verify that sufficient amplification product is available before running the sample on a CDGE. If there is ample DNA, add 20-25 µl of CDGE loading dye (contains 50% formamide) to the sample, place in the thermocycler and run the CDGE3 program. Once the program is finished, add about 20 µl of sample to the wells and run the gel. The gel is run at 60 mA for 4-4.5 hours. Once the gel is done running, remove it from the gel box and remove one of the glass plates. Make sure you mark the gel so you know the orientation. Remove the plastic spacers and place the plate with the gel inside a plastic bag and in one of the gel staining trays.

Staining the Gel and Photographic Exposure:

Once the gel is in the bag in the staining tray, add 300 ml of dye (Gelstar SYBR Green, 2×). Placing the gel in the bag allows more of the dye to get on the gel (you can fold the bag up to move more dye onto the gel). Cover the gel and gel tray with aluminum foil and stain the gel for 30 minutes. After the gel is stained, remove it from the bag and place it on Saran wrap (the stain can be poured from the bag and saved; the dye is good for several days if kept in the fridge and covered). Use the Saran wrap to cover the gel and flip it so the gel can be removed from the glass plate. Once the plate is removed, place the gel on the UV-light box and examine it. Photograph the gel with the electronic digital camera, then cut out the bands and elute the DNA for DNA sequencing.

Example 8

Obtaining and Editing Sequencing Results

Editing an Automatically Aligned and Base Called Electropherogram:

Click on the sample to be edited to bring up the curve viewer of sequence data for that sample. Smaller regions of the electropherogram can be viewed and edited by shortening and moving the zoombox above the sequence data.

First view the signal strength of the region between the primer peak and the end of sequence peak (this is the region of our target DNA sequence). The peak heights should be at least 1,000 as indicated on the left of the sequence under the viewing options. If signal strength is too weak, clean sequence data cannot be achieved. Also, if the peak heights of the sequence data are too high (above the threshold of the software), the peaks will flatten on the top and clean sequence data cannot be achieved. Optimal peak height is approx. 1,000 to 4,000; however, it may be possible to achieve clean sequence outside this range.

If the electropherogram has been automatically aligned and base called, check the run overview. The fewer black and grey sections on the bottom portion of the run overview, the better the sequence quality.

Scroll through the sequence to visually check the accuracy of the automated alignment and base calling. If the electropherogram appears to be aligned and base called accurately, continue with the procedure below. If not, the electropherogram may need to be manually aligned and/or base called (see section below "Manually Aligning and Base Calling an Electropherogram").

OPTION: Under TOOLS, BASE CALLING, ATTRIBUTES, the heterozygote stringency can be changed to achieve more accurate sequence results. PURE (about 50%) has proven to work well in the past. If you change the heterozygotes function, re-base call the sequence by selecting BASECALL under MANUAL and click on GO.

Start from the primer peak region and scroll through slowly. Make any adjustments to the electropherogram that are necessary (the software will occasionally miscall bases). Add a base by clicking on the location where the base belongs and typing the appropriate letter. Delete a base or bases by highlighting the region containing the base or bases to be deleted and pressing the backspace button. If it is unclear what the sequence should be in a region of the electropherogram, insert an appropriate number of n's in this region.

Make sure to delete the bases called, if any, in the primer peak region, end of sequence peak region, and any other region of the electropherogram not containing the target DNA sequence. It may also be necessary to delete up to the first 20 bases past the primer peak and before the end of sequence peak if these regions of the electropherogram are poorly aligned and/or base called. Try to keep as much sequence data as possible while maintaining a high degree of accuracy.

Save the electropherogram. This file is now ready for sequence analysis.

Manually Aligning and Base Calling an Electropherogram:

If the electropherogram is aligned well but not base called, manually base call the electropherogram by clicking on each peak and typing the appropriate base. Insert Ns where the sequence is unclear.

If the electropherogram is poorly aligned:

1) Check the quality of sequence data by scrolling through the electropherogram. The tiled function under the viewing options in the curve viewer can be selected to view each lane separately. If the background noise is high or if the peaks on the electropherogram are not clear and distinct from each other, then clean sequence data cannot be achieved from this electropherogram. If the peaks for each lane appear clean and distinct from each other with low background noise, then continue to step 2 below.

2) You can try adjusting the peak distance (under TOOLS, BASE CALLING, ATTRIBUTES). The software attempts to automatically align the electropherogram at a default setting of 8.00 for peak distance. The actual peak distance will most likely be somewhere between 5 and 8. You can attempt to adjust the peak distance and select ALIGN under MANUAL and click on GO to re-align the electropherogram. By trial and error, this may produce an accurate alignment. If not, continue to step 3 below.

3) Manually align the sequence: Under MANUAL, click on RESET TO RAW. Start at the end of sequence peak. Under ALIGN POINTS, ADD an align point to the end of sequence peak. Click on SHOW under ALIGN POINTS. Click on the align point you just added. Adjust the four lanes using the arrows under manual alignment until the end of sequence peak is aligned in all four lanes. Add another align point about 20 nucleotides before the end of sequence peak and click on it. Adjust the four lanes (in the same manner as before) until they are aligned between the two align points. Add another align point about 20 nucleotides before the last one and align the lanes as above. Continue this until the entire sequence is aligned.

Manual alignment takes practice and requires trial and error. A user may need to start at a different place in the electropherogram or use different methods from those listed above. Even if peaks for each lane on the electropherogram appear clean and distinct with low background noise, alignment may be impossible due to a multiple infection creating peaks in more than one lane in the same location.

If alignment of the electropherogram is achieved, the electropherogram may or may not be able to be automatically base called. To attempt automatic base calling, select BASECALL under MANUAL and click on GO.

If the sequence doesn't base call, manually base call by clicking on each peak and typing the appropriate letter.

If a sequence is achieved, save the electropherogram. This file is now ready for sequence analysis.

Example 9

Sequence Analysis

After a "clean" assay sequence is achieved, it is assembled into a contig by comparison with other "clean" overlapping and complementary sequences from the same specimen, and the consensus sequence is derived from the contig sequence. Sequence analysis is then performed on this sequence to identify the organism based on the rDNA sequence. The consensus sequence (or the assay sequence) is compared to a database of approx. 11,400 prokaryotic 16S rDNA sequences. A quick sequence analysis can be performed by running atblast (a software script which compares the test sequence with this database using the NCBI Blast engine) on the Sun 5 scientific workstation. Atblast will display the best 50 matches and pairwise alignments from a blast search comparing our sequence with the sequence database. An in-depth, comprehensive analysis can be performed by running atblasttest on the Sun workstation. This program will perform a Blast search as in atblast, followed by a multiple sequence analysis that relates the consensus or assay sequences to the most closely related organisms demonstrated by the Blast search, and then computes and displays two different types of phylogenetic trees based on the relationships between these closely related, multiply-aligned sequences (Parsimony and Neighbor-joining trees). Atblasttest will save all of the relevant files for each analysis under a time-date stamped folder for each assay or consensus sequence file entered.
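The idea of relating an assay sequence to its closest references can be illustrated with a simple pairwise distance on sequences that have already been aligned. This sketch is not the atblast or atblasttest script, whose internals are not given here; it uses the uncorrected proportion of differing sites (p-distance) as a stand-in measure, and the function names and reference labels are hypothetical.

```python
def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Sites where either sequence has a gap, N or other ambiguity
    symbol are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def rank_references(query, references):
    """Sort (name, distance) pairs from closest to most distant."""
    scored = [(name, p_distance(query, seq))
              for name, seq in references.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    refs = {"ref_a": "ACGTACGA", "ref_b": "ACGAAGGA"}  # made-up sequences
    for name, dist in rank_references("ACGTNCGT", refs):
        print(name, round(dist, 3))
```

A tree-building step such as Neighbor-joining would start from a matrix of distances of this kind, usually after applying a substitution model correction.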

Running the Atblast Script

On the Sun workstation, open a terminal window and enter atblast at the prompt.

On the Visible Genetics computer, make sure the ShiptoSun folder (Users > Lab > Data > LAB > ShiptoSun) contains no assay files. If it does, move these files back to the appropriate folders. The ShiptoSun script looks at this folder and exports the results of the sequence analysis to the Sun workstation, where the atblast analysis commences.

Place the assay file containing the sequence to be analyzed in the ShiptoSun folder.

Open the terminal shell on the Visible Genetics computer.

At the visgen1> prompt, type ftpSun

Within 20 seconds, the output from the atblast file will be displayed on the Sun workstation.

Compare the atblast results with the electropherogram for the assay file you just sent to the Sun workstation. Look for differences between the reference sequence and your sequence in the atblast output. Check the locations of these differences on the electropherogram on the Visible Genetics computer and make any changes that are necessary. NOTE: Only make changes on the electropherogram in locations where the sequence is clearly incorrect.

If changes are made to the assay file based on the atblast results, save the assay file. NOTE: The assay file will be saved in its original location and not in the ShiptoSun folder, so if the updated assay file is to be sent back to the Sun workstation for further analysis, it must be put into the ShiptoSun folder from its original location.

Compare the multiple sequence alignment to your original electropherogram. Look for positions where differences occur in the sequences shown in the multiple sequence alignment. Compare these positions to your sample's sequence and electropherogram. Make any changes to your sample's electropherogram that seem apparent. If any changes are made, perform atblasttest again.

Observe the phylogenetic tree files to see the phylogenetic relationship between your sample sequence and closely related sample sequences. The two tree files created may show a different phylogenetic relationship because of the different tree forming algorithms used in each.

To close and save the windows generated by atblasttest (must be done before another atblasttest analysis is performed):

1) Under the ClustalX window, write the alignment as postscript (under FILE), then quit (under FILE). A postscript multiple sequence analysis will then be shown; this can be printed or closed.

2) Under the Treetool: newseqtree.ph window, under FILE with the right mouse button, select quit.

3) Under the newseqtree.ph window, click on SAVE TREE, then close.

4) Under the Preliminary Sequence Analysis Report, close window.

The terminal window will indicate the filename under which the atblasttest files for that sample sequence are located.

Example 10

Feasibility Study

In order to determine whether the primers and the methods of the present disclosure work across a broad spectrum of species, purified organisms were obtained and cultured in a laboratory setting. Genomic DNAs were extracted from these cultured organisms and used as template for PCR using Primer sets A, B and C. The results of the PCR reactions are summarized in Tables 3 and 4. The PCR products shown in Table 4 were also subjected to sequencing using primer A, the results of which are shown in the same Table.

Clinical samples containing microorganisms that have been identified were processed and analyzed according to the procedure disclosed in Examples 1-9. The main results obtained using the presently disclosed methods and those obtained using conventional microbiological methods are compared as shown in Table 5.

The results in Table 5 show that in the vast majority of the cases, identification using the methods and the primer sets disclosed here produces the same results as those identified using conventional culturing methods. Of the 299 total clinical samples tested, 195 samples produced either positive or negative amplification results that are in accord with the results obtained by the culturing method. Of the 145 positive amplification products sequenced, 131 produced acceptable sequence results. 114 of the 131 sequence results identified the same organism(s) as the culturing method, with one result generating more specific identification of the organism than the traditional culturing method. 4 out of 131 produced identification results that are in discordance with the results produced by culturing. An additional 12 sequencing results appeared to contain multiple sequences, likely as a result of multiple organisms in the clinical sample.
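The counts quoted above can be turned into rates with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the labels are descriptive names chosen here and do not come from Table 5.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Counts as stated in the text.
TOTAL_SAMPLES = 299
CONCORDANT_AMPLIFICATION = 195
SEQUENCED = 145
ACCEPTABLE_SEQUENCE = 131
SAME_AS_CULTURE = 114
DISCORDANT = 4
MULTIPLE_SEQUENCES = 12

summary = {
    "amplification concordant with culture": percent(CONCORDANT_AMPLIFICATION, TOTAL_SAMPLES),
    "sequenced products giving acceptable sequence": percent(ACCEPTABLE_SEQUENCE, SEQUENCED),
    "acceptable sequences matching culture": percent(SAME_AS_CULTURE, ACCEPTABLE_SEQUENCE),
    "acceptable sequences discordant with culture": percent(DISCORDANT, ACCEPTABLE_SEQUENCE),
    "multiple-sequence results relative to acceptable sequences": percent(MULTIPLE_SEQUENCES, ACCEPTABLE_SEQUENCE),
}

if __name__ == "__main__":
    for label, value in summary.items():
        print(f"{label}: {value:.1f}%")
```

The denominators are a choice made for this sketch; the text does not state which denominator the 12 multiple-sequence results should be measured against.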

SEQUENCE LISTING

Each sequence below is an artificial, synthetic degenerate oligonucleotide derived from analysis of information from a multi-sequence alignment of ribosomal RNA sequences. Degenerate positions are represented by symbols other than a, t, g and c.

SEQ ID NO: 1 gcng ynbyggt

SEQ ID NO: 2 ccmdhcwath hnytkvrstw k

SEQ ID NO: 3 nbkhhwrbbn nnnaacga

SEQ ID NO: 4 cahdgyadbn yghktshncc c

SEQ ID NO: 5 nbkhhwrbbn nnnaacga

SEQ ID NO: 6 dbbvhvgrss vgtkwgtrca
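The symbols other than a, t, g and c in these oligonucleotides can be read with the standard IUPAC nucleotide codes. The sketch below, assuming that convention, counts how many distinct sequences a degenerate oligonucleotide can represent and checks whether it is compatible with a given site. It treats every allowed base at a position as equally acceptable and so ignores the relative abundances carried by the subscripts in Example 3; the function names are hypothetical.

```python
# Standard IUPAC nucleotide codes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligo.

    Whitespace (as used to group bases in a listing) is ignored.
    """
    total = 1
    for symbol in oligo.lower():
        if symbol.isspace():
            continue
        total *= len(IUPAC[symbol])
    return total


def is_compatible(oligo, site):
    """True if every base of site is allowed by the oligo at that position."""
    symbols = [s for s in oligo.lower() if not s.isspace()]
    bases = [b for b in site.lower() if not b.isspace()]
    if len(symbols) != len(bases):
        return False
    return all(b in IUPAC[s] for s, b in zip(symbols, bases))


if __name__ == "__main__":
    print(degeneracy("dbbvhvgrss vgtkwgtrca"))
    print(is_compatible("gcngy", "GCAGT"))
```

A high degeneracy count means that any single sequence in the mixture is present at a low concentration, which is one reason a frequency-weighted mixture may be preferred over an equimolar one.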