HIV Genotyping

1. Summary

Untreated infection with the Human Immunodeficiency Virus (HIV) leads to AIDS, a deadly and widespread disease. Though many effective drugs are available, the virus mutates rapidly, and drug resistance evolves quickly. Building on the earlier shotgun sequencing exercise, we perform 'resequencing' by aligning reads from clinical samples against a standard sequence. In this exercise, you will use simulated DNA sequencing and the trace-subtraction function of the Staden sequence data management software to determine how a viral strain differs from the standard. This list of mutations will let you query the online Stanford HIV Database to predict the drug resistance and susceptibility of the strain.

2. Learning Objectives

After preparing for and completing this exercise, you should be able to:

List the experimental parameters which affect the quality of a sequencing reaction.

Describe the causes of failure of sequencing reactions by relating reaction conditions, primer design and template quality to the trace output from the virtual sequencer.

Describe the primary functions of pregap4 and gap4 in the analysis of sequence data.

Explain the concept of trace subtraction and its application to HIV genotyping.

State the differences between mutations that cause drug resistance, those that cause codon changes in the protein sequence and silent mutations which do not cause changes in the protein sequences of HIV protease and RT.

Explain how the Stanford HIV drug resistance database is used.

3. Background

Drug therapies have been developed for many infectious diseases. Antibiotics, drugs that treat bacterial infections, historically have been discovered by testing substances for their ability to kill or inhibit growth of bacteria in culture. Substances active in culture are further tested for efficacy and safety in animals, and eventually humans.

Developing drugs to treat viral infections is more complicated, in part because of the difficulty in culturing many viral species. A targeted approach based on understanding the biochemical events and pathways of the virus life cycle has proved fruitful in many cases. Microbial enzymes required by the infectious agent, but not related to enzymes of the host, are typically chosen as targets, in the hope that substances can be found that inhibit the microbial enzyme without affecting normal functioning of the infected cells.

HIV is the causal agent for Acquired Immunodeficiency Syndrome (AIDS). HIV is a retrovirus; its genome is made of RNA, which is copied into DNA for replication and integration into the host genome. The enzyme that catalyzes the copying of RNA sequence into DNA, reverse transcriptase, has no analogue in eukaryotic cells, and was one of the first viral enzymes to be targeted in fighting this disease.

Two main classes of reverse transcriptase inhibitors have been developed. Some target the enzyme's active site, where new nucleotides are added to the growing DNA strand; these are mostly chemical analogs of nucleosides, such as Zidovudine. Other drugs target a hydrophobic pocket on the reverse transcriptase; these include most of the non-nucleoside drugs.

Another unique HIV enzyme that has proved to be a useful drug target is the HIV protease, which is involved in activating several viral proteins by cleaving them from larger precursor proteins. Other viral enzymes have been targeted more recently.

HIV therapy is complicated by the fact that the virus evolves very quickly, and resistance to new drugs develops with alarming rapidity. Because the drug targets are known, it is often possible to determine the exact nucleotide substitutions in the resistant viruses that give them the ability to withstand the drug. In fact, so much information has been gathered over the years about drug resistance mutations that it is now generally possible to examine the sequence of a particular virus and determine the particular antiretroviral agents that would be most effective in treating it.
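
The idea of reading a resistance profile from a genotype can be sketched as a simple lookup. The table below is only a tiny illustrative subset of well-known reverse transcriptase mutations, not the scoring system used by the Stanford database; the names `KNOWN_RESISTANCE` and `flag_resistance` are hypothetical.

```python
# Illustrative subset only: a few well-characterized RT mutations and the
# drugs they are best known to affect. The real database uses weighted scores.
KNOWN_RESISTANCE = {
    "M184V": ["3TC"],
    "K103N": ["EFV", "NVP"],
    "T215Y": ["AZT"],
}


def flag_resistance(mutations):
    """Return {drug: [mutations]} for mutations found in the lookup table."""
    flagged = {}
    for mutation in mutations:
        for drug in KNOWN_RESISTANCE.get(mutation, []):
            flagged.setdefault(drug, []).append(mutation)
    return flagged


if __name__ == "__main__":
    observed = ["M184V", "K103N", "V35I"]  # V35I is absent from the toy table
    for drug, causes in sorted(flag_resistance(observed).items()):
        print(f"{drug}: possible resistance ({', '.join(causes)})")
```

Mutations that are missing from the table are silently ignored here, which is one reason a curated database is needed for real interpretation.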

One of the challenges of nucleotide sequencing is assembling the relatively short and somewhat ambiguous sequence segments produced by the experimental procedure into longer segments of statistically reliable sequence data. This process is greatly facilitated if we know the general structure of the target. We will take advantage of having a known, standard HIV genome to compare our mutants against as we collect and organize our data.
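
Placing a read against a known reference can be sketched in a few lines. This is a deliberately naive, ungapped comparison, not the algorithm Staden uses; the function names and the toy sequences are hypothetical.

```python
def best_placement(reference, read):
    """Slide the read along the reference; return (offset, mismatches)
    for the placement with the fewest mismatches, or None if the read
    is longer than the reference."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def differences(reference, read, offset):
    """List (reference_position, ref_base, read_base) per mismatch (1-based)."""
    return [
        (offset + i + 1, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(reference[offset:], read))
        if ref_base != read_base
    ]


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACGATC"  # made-up sequence, not HXB2
    read = "GCAAGCCTTA"                 # covers positions 6-15 with one change
    offset, mismatches = best_placement(reference, read)
    print(offset, mismatches, differences(reference, read, offset))
```

Because the reference fixes where each read belongs, the differences that remain after placement are the candidate mutations.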

In this exercise, we will use simulated DNA resequencing to determine the genotype of a given virus, and use that genotype information to determine the drug resistance profile of the virus.

4. Clinical Scenario

Mr. B is a 42-year-old man who has been undergoing treatment for HIV infection for the past two years. He has had difficulty adhering to a multidrug cocktail of HIV protease and reverse transcriptase inhibitors for treatment of his infection. At a recent doctor's visit, he had his blood tested and the doctor reported that his T4 cell count had dropped significantly and his viral load had become elevated since his last visit.

Mr. B then began to get more serious about taking his medicine. Unfortunately, at his next doctor's visit, his T4 count was still low and his viral load continued to be high. The doctor then recommended an HIV genotyping test to see if he had developed drug resistance.

The goal of the laboratory is to answer the following questions:

Based on the results of the genotyping test, which mutations are present in the virus infecting Mr. B?

Based on the Stanford HIV database report, to which drugs has the virus infecting Mr. B become resistant?

5. Resources

This page contains the experimental protocol, as well as several resource tables. If your computer monitor is small, you may want to print this page to avoid having to flip back and forth between browser pages and programs.

This PowerPoint presentation reviews HIV biology and issues related to sequencing. It contains additional relevant scientific background for performing the exercise, and is suitable for a class lecture of approximately 30 minutes.

The trace subtraction mutation detection technique is described on the Staden website.

The pregap4 configuration file and HXB2 reference sequence can be downloaded in this zip archive.

6. Experimental Procedure

1. Install the Staden Package

The Staden package is sequence analysis software that allows the user to view, process and assemble the sequence chromatograms generated by the Cybertory trace generator. It must be downloaded and installed before analyzing sequence data from the trace generator. To install the Staden package, follow these instructions:

Under the Staden 1.5.3 section, choose the platform (e.g. staden-macosx-1-5-3.tar.gz for an Apple computer or staden-windows-1.5.3.msi for a PC) and select it with the mouse. A page will appear with locations to download the software.

Choose one of the download locations and double click to download the Staden software installer.

Once the file has been downloaded, simply double click on the installer and follow the installation instructions.

The Staden Package will install as a folder of applications and a tutorial in the folder where the Staden installer has been saved.

2. Download the HIV_genotyping Folder

The HIV genotyping folder contains an HXB2 genomic reference sequence and a Staden configuration file which is needed to perform the exercise.

Download the zip archive, and save it to your desktop as "HIV_genotyping.zip".

Unzip the archive to create the HIV_genotyping folder on your desktop.

3. Assign Viral "Unknowns"

The instructor should assign each student or pair of students working at a computer an HIV clone from the list of clones in the Treatment Histories table.

Each HIV clone has a drug treatment history and associated drug resistance mutations. The clones have been constructed from the complete HXB2 genome (accession #NC_001802) by adding resistance mutations into the HIV protease and RT coding regions based on actual sequence data files from the Stanford HIV database. The antiretroviral treatment history for each clone is listed in the table.

4. Sequence HIV Protease

Each student or student pair should have a viral "unknown" assigned to them. It is recommended that the instructor walk the students through the sequencing of HIV protease step-by-step using a computer and overhead projection for the first part of the exercise, if possible.

The primers proF and proR sequence the entire HIV protease coding region of 99 amino acids, on both strands, as shown in Figure 1. proF is the forward sequencing primer and proR is the reverse sequencing primer. Follow the instructions to sequence HIV protease for the HXB2 reference sequence using the Cybertory trace generator.

Type in a name in the "User Name" box. For example, use your first or last name or email address (without the @ symbol).

From the "Template" pulldown menu, select "HXB2", which is at the bottom of the pulldown list. The list also contains all of the unknowns, which will be sequenced later.

The primer sequences are available at the bottom of the trace generator webpage and are summarized in Table 1. Copy and paste the proF sequence "AGCCAACAGCCCCACCAG" into the "Primer Sequence" box of the trace generator.

Set the "Primer Strand" pulldown menu to "forward", as proF is a forward sequencing primer.

Leave all of the other fields at their defaults and double-click on the "Run Sequencing Reaction" button.

A window will pop up and ask you to save the file. Select "Save".

A second window will pop up. Go to the "file name" box, second to the bottom of the window. Change the name of the file from "HXB2-p1t.scf" to "HXB2-proF.scf".

Save the file in the "HIV_genotyping" folder on your desktop.

Go back to the trace generator webpage to sequence HXB2 with the reverse sequencing primer proR. This will give you sequence coverage on both strands.

Copy and paste the proR sequence "GGGCCATCCATTCCT" into the "Primer Sequence" box of the trace generator.

Set the "Primer Strand" pulldown menu to "reverse", as proR is a reverse sequencing primer.

Set the "Annealing Temperature" pulldown menu for this primer to 65 instead of 55 for best results. If 55 degrees is chosen, the sequence will be of poor quality.

Leave all of the other fields at their defaults and double-click on the "Run Sequencing Reaction" button.

A window will pop up and ask you to save the file. Select "Save".

A second window will pop up. Go to the "file name" box, second to the bottom of the window. Change the name of the file from "HXB2-q1t.scf" to "HXB2-proR.scf".

Save the file in the "HIV_genotyping" folder on your desktop.

Repeat the entire process for your assigned "unknown" for both proF and proR sequencing primers. The unknown is selected in the "Template" pulldown menu instead of HXB2.
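
The relationship between the forward and reverse primers can be illustrated with a short sketch: a reverse primer anneals to the top strand as its reverse complement. The primer sequences are those given above; the template is made up for the example (it is not HXB2), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (ACGT only)."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def primer_site(template, primer, strand):
    """Return the 0-based start of the primer's site on the top-strand
    template, or -1 if there is no exact match. A reverse primer matches
    the top strand as its reverse complement."""
    query = primer if strand == "forward" else reverse_complement(primer)
    return template.upper().find(query.upper())


PRO_F = "AGCCAACAGCCCCACCAG"  # proF, from the exercise
PRO_R = "GGGCCATCCATTCCT"     # proR, from the exercise

if __name__ == "__main__":
    # Made-up template: proF site, filler, then the site proR anneals to.
    template = PRO_F + "ACGTACGTAC" + reverse_complement(PRO_R)
    print(primer_site(template, PRO_F, "forward"))
    print(primer_site(template, PRO_R, "reverse"))
```

Reads primed from the two sites run toward each other, which is why the pair gives coverage of the same region on both strands.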

5. Assemble Sequence and Detect Mutations

We summarize the function and use of each Staden module used in the exercise, including step-by-step instructions on how to use each module.

6. Trev

The trev (trace viewer) application is used for visual inspection of the trace files generated by the Cybertory trace generator. Examine the sequence traces before proceeding to confirm that they are of good quality: there should be a single major peak at each position in the trace.

Launch the Trev application by going to the "Start" Menu, select "Programs", "Staden Package" and finally "Trev". This will launch the Trev application.

From the "File" pulldown menu, select "Open" to load the trace file from the "HIV_genotyping" folder on the desktop. Please note that the "files of type" box at the bottom of the dialog must be set to ".scf" to view the traces produced by the trace generator. It is set to the ".abi" file type by default, and the ".scf" files will not be listed unless the file type is changed.

On the top of the sequence traces is a set of arrows and a scroll box that can be used to view the entire trace file. The student is then able to decide if the sequence trace is of sufficient quality; if not they can resequence the template after altering the parameters in the trace generator.
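
The "single major peak" criterion can be made concrete with a small calculation. The sketch below uses hypothetical peak heights typed in by hand (it does not read .scf files), and the 0.8 cutoff is an arbitrary assumption for illustration.

```python
def peak_purity(intensities):
    """Fraction of total signal carried by the strongest channel at one peak.
    `intensities` maps each base (A, C, G, T) to its signal height."""
    total = sum(intensities.values())
    if total == 0:
        return 0.0
    return max(intensities.values()) / total


def ambiguous_positions(peaks, min_purity=0.8):
    """Return 1-based positions whose strongest channel is below min_purity."""
    return [
        position
        for position, intensities in enumerate(peaks, start=1)
        if peak_purity(intensities) < min_purity
    ]


if __name__ == "__main__":
    # Hypothetical peak heights; real values would come from a trace file.
    peaks = [
        {"A": 950, "C": 20, "G": 10, "T": 20},   # clean A
        {"A": 30, "C": 900, "G": 40, "T": 30},   # clean C
        {"A": 400, "C": 30, "G": 500, "T": 70},  # mixed A/G signal
    ]
    print(ambiguous_positions(peaks))
```

A trace with many flagged positions is a candidate for resequencing with different reaction parameters.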

7. Pregap4

The pregap4 module processes the .scf files created by the trace generator according to parameters defined in a configuration file for the HIV genotyping exercise. Among other functions, pregap4 marks poor-quality sequence and vector sequence in the trace file so that they are excluded from the assembly in gap4.

Go to the "HIV_genotyping" folder on the desktop and double-click on the configuration file pg4.config. This launches the pregap4 utility with the proper settings for the exercise.

Select the "Files to Process" tab at the top of the screen and then select the "Add files" tab. The bottom "Files of type" pulldown menu needs to be .scf instead of .abi.

From the HIV_genotyping folder on the desktop, select the four HIV protease trace files that were generated with the trace generator: HXB2-proF, HXB2-proR, clone#-proF and clone#-proR.

Click on the "Open" button to load the files into pregap4. The files should appear in the main window.

Select the "Add files" tab again. The bottom "Files of type" pulldown menu needs to be the last option "any". Select the HXB2.embl reference sequence.

Click on the "Open" button to load the file into pregap4.

At the top of the pregap4 screen, select the "Configure Modules" tab. Select the "reference traces and sequences" box.

To the right of "Reference Trace{+ve strand}" click on the Browse button. Go to the HIV_genotyping folder on the desktop and select "HXB2-proF" as the forward reference trace.

To the right of "Reference Trace{-ve strand}" click on the Browse button. Go to the HIV_genotyping folder on the desktop and select "HXB2-proR" as the reverse reference trace.

To the right of "Reference Sequence" click on the Browse button and select "HXB2.embl" as the reference sequence.

Process the traces by clicking the "run" button on the lower left hand side of the screen.
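
The idea of excluding poor-quality sequence at the ends of a read can be sketched with a sliding window over per-base quality scores. This is a simplified illustration under assumed window and threshold values, not the clipping method that pregap4 itself applies; `clip_by_quality` is a hypothetical name.

```python
def clip_by_quality(qualities, window=5, min_mean=20):
    """Return (start, end) of the region to keep, as a half-open interval.
    The left clip is the start of the first window whose mean quality
    reaches min_mean; the right clip is the end of the last such window."""
    good_starts = [
        i
        for i in range(len(qualities) - window + 1)
        if sum(qualities[i:i + window]) / window >= min_mean
    ]
    if not good_starts:
        return (0, 0)  # nothing usable
    return (good_starts[0], good_starts[-1] + window)


if __name__ == "__main__":
    # Hypothetical quality scores: poor at both ends, good in the middle.
    scores = [5, 8, 10, 30, 35, 40, 40, 38, 36, 12, 9, 6]
    start, end = clip_by_quality(scores, window=3, min_mean=20)
    print(start, end, scores[start:end])
```

Bases outside the returned interval would be hidden from assembly rather than deleted, so the full trace remains available for inspection.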

8. Gap4

This software allows the student to view the sequence trace files in an assembly format. Processing the trace files in pregap4 creates a gap4 database, stored as a ".aux" file.

In the "HIV_genotyping" folder on the desktop, locate the "HIV.0.aux" file. Double-click on the "HIV.0.aux" file to launch gap4.

To view the assembly in the main gap4 window, open the "View" pulldown menu and select the "Templates Display" menu option.

In the "Show Templates" dialog box select the "all contigs" button and check the "Templates and Readings" boxes. Four sequencing reads (two forward and two reverse) corresponding to the HXB2 control and clone primer sequences are visible in the display window of gap4. The sequence reads should automatically be correctly oriented to the HXB2 reference sequence (HXB2.embl). Orange dots denote differences in sequence between the HXB2 control traces and the clone traces.

Double-click on the arrows in the template display to bring up the "Contig Editor" window.

In the "Contig Editor" window go to the "Settings" menu and select "trace display" and "autodiff traces".

On the left margin of the contig editor window, select the name of the HXB2 reference sequence (HXB2.embl) by double-clicking on it. Right-click to bring up a context menu and select "set as reference sequence." Leave the default options of "first base number" of 1 and "No" for circular sequence and click "ok".

An "S" should appear at the far left end of the HXB2 line, denoting HXB2.embl as the reference sequence.

To visualize the result of the trace subtraction algorithm, double-click on one of the clone sequence reads in the contig editor (either proF or proR). Three sets of traces should appear: the HXB2 control trace, the clone trace and a trace subtraction.

The trace subtraction will show all of the differences between the clone trace and the HXB2 control trace. These differences correspond to the drug resistance mutations in the clone sequence.
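
The principle behind trace subtraction can be illustrated with toy data. The sketch below assumes the two traces are already aligned and scaled, which the real software has to work out for itself; the signal values, the threshold and the function names are all hypothetical.

```python
BASES = "ACGT"


def subtract_traces(reference, sample):
    """Subtract reference from sample, channel by channel.
    Each trace maps a base to a list of signal heights at aligned positions."""
    return {
        base: [s - r for s, r in zip(sample[base], reference[base])]
        for base in BASES
    }


def candidate_mutations(difference, threshold=300):
    """Report (position, lost_base, gained_base) where one channel drops and
    another rises by at least the threshold: the mark of a substitution."""
    hits = []
    for i in range(len(difference["A"])):
        lost = [b for b in BASES if difference[b][i] <= -threshold]
        gained = [b for b in BASES if difference[b][i] >= threshold]
        if lost and gained:
            hits.append((i + 1, lost[0], gained[0]))
    return hits


if __name__ == "__main__":
    # Toy peak heights for the sequence ACGT (control) and ACAT (clone).
    control = {"A": [900, 10, 20, 10], "C": [10, 880, 15, 20],
               "G": [20, 15, 910, 10], "T": [10, 20, 10, 890]}
    clone = {"A": [880, 15, 870, 20], "C": [15, 900, 10, 10],
             "G": [10, 10, 30, 15], "T": [20, 10, 15, 900]}
    print(candidate_mutations(subtract_traces(control, clone)))
```

Where the two traces agree, the difference stays near zero; a substitution shows up as a paired dip and rise at the same position.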

To export a list of the mutations as a report, open the "Commands" menu in the contig editor and select "Report mutations". A plain text version of the mutation report will appear in the gap4 main window. Copy and paste the mutation report from the gap4 main window into a text file.

Save the text file as a Word document named "pro_mutations" in the "HIV_genotyping" folder on your desktop.

9. Submit Mutations to the Stanford HIV Database

We have written a utility that converts the mutation list from gap4 to a format which can be entered into the Stanford HIV database; it is available under the HIV Genotyping section at http://www.cybertory.org/exercises/.

Copy and paste the mutation report from gap4 into the window of the utility and click on the "Reformat for HIVdb" button. A list of amino acid changes will be produced, which can be copied and pasted into the Stanford HIV database to determine the drug resistance profile of the clone.

The amino acid changes are entered into the sequence analysis section of the Stanford HIV Database (HIVdb).

Copy and paste the list of amino acid changes into the menu for HIV protease.

Click the "analyze" button and the database will generate a mutation report describing sensitivity or resistance to various HIV drugs.
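
The conversion from nucleotide differences to amino acid changes can be sketched as a codon-by-codon comparison using the standard genetic code. This is an illustration of the idea only, not the code of the Cybertory utility; the function name and the toy sequences are hypothetical, and both sequences are assumed to be in frame and of equal length.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_changes(reference_cds, sample_cds):
    """Compare two in-frame coding sequences of equal length, codon by codon.
    Returns (changes, silent): changes are strings such as 'M2V'; silent is
    a list of 1-based codon numbers that differ without changing the protein."""
    changes, silent = [], []
    for start in range(0, len(reference_cds) - 2, 3):
        ref_codon = reference_cds[start:start + 3]
        new_codon = sample_cds[start:start + 3]
        if ref_codon == new_codon:
            continue
        number = start // 3 + 1
        ref_aa, new_aa = CODON_TABLE[ref_codon], CODON_TABLE[new_codon]
        if ref_aa == new_aa:
            silent.append(number)
        else:
            changes.append(f"{ref_aa}{number}{new_aa}")
    return changes, silent


if __name__ == "__main__":
    reference = "CCTATGCTG"  # toy sequence encoding P, M, L
    sample = "CCTGTGCTA"     # ATG -> GTG changes M to V; CTG -> CTA is silent
    print(codon_changes(reference, sample))
```

Only the changes in the first list alter the protein; whether any of them causes drug resistance is a separate question that the database answers.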

10. Sequence Reverse Transcriptase

The HIV reverse transcriptase (RT) enzyme is larger than HIV protease. RT must be sequenced with three pairs of primers (rt1F and R, rt2F and R, and rt3F and R), which cover the RT coding region up to codon 333 (Fig. 1). Each primer pair for rt1, 2 and 3 is analyzed separately, using the method described above for sequencing HIV protease. Sequence the clone using the first primer pair, rt1F and R instead of proF and proR, following the instructions with the following exceptions:

All RT primers are sequenced at 55 degrees and follow the naming conventions described for sequencing HIV protease, for example, HXB2-rt1F, HXB2-rt1R, clone#-rt1F, clone#-rt1R, etc.
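
With four primer pairs, two templates and two directions, there are sixteen trace files to keep track of. The sketch below lists the expected names under the naming convention above and reports any that are missing; the desktop folder location is an assumption, and the function names are hypothetical.

```python
from pathlib import Path

PRIMER_PAIRS = ["pro", "rt1", "rt2", "rt3"]


def expected_trace_names(clone):
    """Trace file names for the HXB2 control and one clone, all primer pairs."""
    return [
        f"{template}-{pair}{direction}.scf"
        for pair in PRIMER_PAIRS
        for template in ("HXB2", clone)
        for direction in ("F", "R")
    ]


def missing_traces(clone, present):
    """Names from expected_trace_names(clone) that are absent from `present`."""
    present = set(present)
    return [name for name in expected_trace_names(clone) if name not in present]


if __name__ == "__main__":
    folder = Path.home() / "Desktop" / "HIV_genotyping"  # assumed location
    present = [p.name for p in folder.glob("*.scf")] if folder.is_dir() else []
    for name in missing_traces("clone01", present):
        print("missing:", name)
```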

11. Pregap4

Follow steps 1-11 of the pregap4 instructions, substituting HXB2-rt1F, HXB2-rt1R, clone#-rt1F and clone#-rt1R for the HIV protease files. To analyze sequence data from each RT primer pair, create a new gap4 database for each pair.

In the "Configure Modules" section after completing pregap4 setup, select the "Gap4 shotgun assembly" option with the mouse. On the right hand side of the window, type rt1 in the Gap4 database name box. Select the "create new database" option directly below. Leave the Gap4 database version set to 0. For the rt2 and rt3 primer pairs, do a separate pregap4 analysis for each primer pair and name the gap4 databases rt2 and rt3, respectively.

12. Gap4

Follow the instructions for gap4 analysis. In step 11, for each RT primer pair, save the mutation list as "rt1_mutation" for the rt1F and R gap4 analysis; for rt2F and rt2R, "rt2_mutation" and for rt3F and rt3R, "rt3_mutation". Remember to do a separate pregap4 and gap4 analysis for each primer pair, creating separate gap4 databases rt2 and rt3 for each respective primer pair.

13. Submit Reverse Transcriptase Mutations to the HIV Database

The mutation lists from all the primer pairs are copied and pasted together into a single text file to be submitted to the Stanford HIV database to obtain a resistance report.

Follow the instructions for submitting mutations to the Stanford HIV database. Be sure to copy and paste the HIV reverse transcriptase mutations in the HIV RT section of the webpage.
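
Combining the lists from the three RT primer pairs can be sketched as follows. The example assumes the mutations have already been converted to the amino acid form (such as M184V) and that a mutation may appear in more than one list if the sequenced regions overlap; the function name and the example lists are hypothetical.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def merge_mutation_lists(*lists):
    """Combine mutation lists (e.g. from rt1, rt2 and rt3), drop duplicates,
    and sort by codon number."""
    seen = set()
    for mutations in lists:
        for mutation in mutations:
            if not MUTATION.match(mutation):
                raise ValueError(f"unrecognized mutation: {mutation!r}")
            seen.add(mutation)
    return sorted(seen, key=lambda m: (int(MUTATION.match(m).group(2)), m))


if __name__ == "__main__":
    rt1 = ["M41L", "K103N"]   # made-up example lists
    rt2 = ["K103N", "M184V"]
    rt3 = ["T215Y"]
    print(", ".join(merge_mutation_lists(rt1, rt2, rt3)))
```

Sorting by codon number makes the merged list easy to check against the resistance report.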

Figure 1. Sequence reads from forward and reverse primers for each segment (pro, rt1, rt2, and rt3) are shown in this diagram taken from a Gap4 assembly of the reads against the HXB2 reference sequence. The open reading frame for the pol gene, which is processed to produce the protease and polymerase proteins, is shown by the green box.

8. Treatment Histories

Clone      Treatment history

HXB2       NONE
clone01    IDV
clone02    IDV/AZT
clone03    IDV
clone04    IDV
clone05    NFV
clone06    NFV/AZT + 3TC
clone07    NFV
clone08    NFV
clone09    RTV
clone10    RTV
clone11    RTV
clone12    RTV/AZT + 3TC + DDC
clone13    ATV
clone14    ATV
clone15    ATV/AZT + 3TC + DDC + DDI + D4T
clone16    ATV
clone17    APV
clone18    APV
clone19    APV
clone20    APV/AZT + 3TC + DDC + DDI + D4T
clone21    SQV
clone22    SQV
clone23    SQV
clone24    SQV/AZT + 3TC + DDC + DDI + D4T + ABC + EFV + NVP
clone25    IDV, NFV
clone26    IDV, NFV
clone27    IDV, NFV/AZT + 3TC + DDC + DDI + D4T + ABC + DLV + NVP
clone28    IDV, RTV
clone29    IDV, RTV
clone30    IDV, RTV/AZT + 3TC + DDC + DDI + D4T + ABC + EFV
clone31    IDV, NFV, RTV, SQV
clone32    IDV, NFV, RTV, SQV/AZT + 3TC + DDC + DDI + D4T + ABC + EFV + NVP
clone33    IDV, NFV, RTV, SQV, APV
clone34    IDV, NFV, RTV, SQV, APV/AZT + 3TC + DDC + DDI + D4T + ABC + DLV + NVP
clone35    IDV, NFV, RTV, SQV, APV
clone36    AZT
clone37    AZT
clone38    AZT
clone39    AZT + 3TC
clone40    AZT + 3TC
clone41    AZT + 3TC
clone42    AZT + 3TC + DDC
clone43    AZT + 3TC + DDC
clone44    AZT + 3TC + DDC
clone45    AZT + 3TC + DDC + DDI + D4T
clone46    AZT + 3TC + DDC + DDI + D4T
clone47    AZT + 3TC + DDC + DDI + D4T
clone48    AZT + 3TC + DDC + DDI + D4T + ABC + EFV + NVP
clone49    AZT + 3TC + DDC + DDI + D4T + ABC + DLV + NVP
clone50    AZT + 3TC + DDC + DDI + D4T + ABC + EFV
clone51    AZT + 3TC + DDC + DDI + D4T + ABC + NVP
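
The drug abbreviations in the treatment histories fall into the classes described in the Background: protease inhibitors (PI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). The sketch below groups a treatment-history entry by class; the parsing rules and the function name are assumptions made for this example.

```python
import re

DRUG_CLASS = {
    # Protease inhibitors
    "IDV": "PI", "NFV": "PI", "RTV": "PI", "SQV": "PI", "APV": "PI", "ATV": "PI",
    # Nucleoside reverse transcriptase inhibitors
    "AZT": "NRTI", "3TC": "NRTI", "DDC": "NRTI", "DDI": "NRTI",
    "D4T": "NRTI", "ABC": "NRTI",
    # Non-nucleoside reverse transcriptase inhibitors
    "EFV": "NNRTI", "NVP": "NNRTI", "DLV": "NNRTI",
}


def drugs_by_class(history):
    """Split a treatment-history string into {class: [drugs]}.
    Raises KeyError for an abbreviation that is not in DRUG_CLASS."""
    grouped = {}
    for drug in re.split(r"[\s,/+]+", history.strip()):
        if not drug or drug.upper() == "NONE":
            continue
        grouped.setdefault(DRUG_CLASS[drug.upper()], []).append(drug.upper())
    return grouped


if __name__ == "__main__":
    print(drugs_by_class("IDV, NFV/AZT + 3TC +DDC +DDI + D4T +ABC + DLV + NVP"))
    print(drugs_by_class("NONE"))
```

Knowing the class of each drug indicates which coding region, protease or RT, is likely to carry the corresponding resistance mutations.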

9. Review Questions

List the experimental parameters that affect the quality of a sequencing reaction.

Describe the causes of failure of sequencing reactions by relating reaction conditions, primer design and template quality to the trace output from the virtual sequencer.

Describe the primary functions of pregap4 and gap4 in the analysis of sequence data.

Explain the concept of trace subtraction and its application to HIV genotyping.

State the difference between mutations that cause drug resistance, those that cause codon changes in the protein sequence and silent mutations which do not cause changes in the protein sequences of HIV protease or RT.