
Sandra Porter, Austin Community College and Digital World Biology LLC, Seattle, WA

Abstract:

The most engaging student research projects combine relevant topics with the opportunity to develop and practice skills that can be applied outside of class. Cases of emerging viruses and outbreaks of infectious agents capture student attention. The Zika virus outbreak has been devastating to young families in Latin America and the Caribbean. Pictures of affected babies with unusually small heads and severe birth defects have alarmed people throughout the western hemisphere. Although few cases of Zika virus infection have been reported in the United States, the mosquito that carries Zika virus, Aedes aegypti, is endemic in the Southern U.S. and could transmit Zika virus in states such as Florida and Texas. In this project, students use the bioinformatics and molecular modeling skills they're learning in class to identify drugs that might be able to treat people infected with Zika virus.

Introduction

I teach an on-line bioinformatics course through Austin Community College that focuses on common applications for bioinformatics tools in biotechnology. Students learn how to search databases, use molecular models, analyze molecular sequences, and characterize genetic variation. Although bioinformatics provides us with interesting real-life examples to study, our abilities are limited when it comes to finding student projects that include opportunities to make new discoveries.

Last spring, I developed a research project designed to engage students by having them apply their skills and knowledge to the problem of finding drugs to treat Zika virus. It seemed that the devastating effects of Zika infection, combined with the possibility (albeit low) that students taking our class might be affected, would have a positive effect on student engagement.

Students begin the project by looking at Health Map to see where Zika virus has been reported and by reading about the outcomes of Zika virus infections. Through their reading, they learn that the NCBI structure database contains structures of proteins bound to inhibitors and drugs, and that it might be possible to repurpose an existing drug to treat Zika virus. Students read about the Zika virus genome and life cycle and decide whether to look for a drug that inhibits the RNA polymerase or the protease. They find the protein sequence for one of these proteins and use it to search the structure database for similar proteins that are bound to drugs or inhibitors. After that, they use the Molecule World™ iPad app (1) to identify amino acids that contact the drug and determine whether those same residues are present in the Zika virus protein.

In developing an example to show students how this process would work, I determined that the Zika virus polymerase would be likely to bind to Gilead’s anti-Hepatitis C drug Sovaldi® (Sofosbuvir). This prediction was borne out by two recently published studies showing that Sofosbuvir does inhibit Zika virus replication (2, 3).

This general process for repurposing existing drugs is summarized below. Using it in a class-based research project gives students a new avenue for applying bioinformatics tools to real-life problems and, potentially, for making discoveries.

The method for identifying potential drugs

1. Decide on a drug target.

2. Find the amino acid sequence for the protein you wish to target.

3. Use protein BLAST to query the Protein Data Bank proteins (pdb) database at the NCBI.

4. Scan the results to find proteins bound to drugs (or inhibitors).

5. Use Molecule World™ to identify amino acids in the drug binding site.

6. Determine if amino acids in the drug binding site are present in both proteins.

How do we apply this method?

Students in our bioinformatics course have used this process for the past three semesters. In this section, I describe the process in more detail and present an example that I developed with sofosbuvir (Sovaldi®).

1. Decide on a drug target.

The first step is to decide which protein to target. Although many different proteins can be used as drug targets, our students choose between two kinds of proteins: the viral RNA polymerase and viral protease. Polymerases play an important role in the life cycle of a virus by making copies of the viral genome. These copies get packaged into new viral particles that can go on and infect new cells. Proteases play an important role in viruses that make polyproteins. These polyproteins must be cut into smaller parts by a protease in order to function. Both polymerase and protease inhibitors are used to treat infections with HIV. Sofosbuvir, from Gilead, is an example of an FDA-approved drug that inhibits the RNA polymerase from Hepatitis C virus.

2. Find the amino acid sequence for the protein you wish to target.

The NCBI (National Center for Biotechnology Information) was established in 1988 as a division of the National Library of Medicine at the National Institutes of Health (NIH). Its overall mission is to create systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics, and to facilitate the use of such databases and software by the research and medical community (http://ncbi.nlm.nih.gov). As part of this mission, the NCBI has set up resource pages for specific viral genomes, in particular pathogenic viruses such as Zika virus, MERS, and Ebola (Fig. 1, https://www.ncbi.nlm.nih.gov/genome/viruses/).

The Zika Virus Resource page has links to Health Map, the CDC (Centers for Disease Control), WHO (the World Health Organization), and publications. From here, we use the “NCBI Zika virus reference genome” link followed by the GenBank record link to locate the accession number for protein sequences from either the protease or the Zika virus RNA polymerase (Fig 3).

3. Use protein BLAST to query the Protein Data Bank proteins (pdb) database at the NCBI.

The accession number for the RNA polymerase is YP_009227205.1. Accession numbers act like catalog numbers. They allow us to find specific sequences in a database. Clicking the linked accession number from the GenBank record takes us to the protein record (Fig 4).

Figure 4. The protein record for the Zika virus RNA polymerase with a link to run BLAST on the right.
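Students normally click through the web pages, but the same record can also be retrieved by accession number through NCBI's E-utilities efetch service. The minimal sketch below only builds the request URL (it does not download anything). The helper name `efetch_fasta_url` is hypothetical, and the parameter choices (`db=protein`, `rettype=fasta`, `retmode=text`) are our assumptions about what a class would want.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build a URL that should return one protein record as FASTA text.

    efetch_fasta_url is a hypothetical helper name used for illustration.
    """
    params = {
        "db": "protein",     # the NCBI Protein database
        "id": accession,     # accession.version, e.g. YP_009227205.1
        "rettype": "fasta",  # sequence only, not the full GenBank record
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


print(efetch_fasta_url("YP_009227205.1"))
```

Pasting the printed URL into a browser should return the polymerase sequence in FASTA format, which can then be pasted into a BLAST form.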

On the right-hand side of the protein record (Fig. 4) is a link to Run BLAST. Students select this link to open the protein BLAST (blastp) search form. Protein BLAST searches databases of protein sequences to find proteins with similar sequences.

From the BLAST search form, we choose the Protein Data Bank database (Fig. 5). Every protein sequence in this database comes from a molecular structure file, so each hit has a structure we can examine.

Figure 5. The BLAST form.

Next, we enter Zika in the organism field and click Exclude so that we can avoid getting results from Zika virus proteins. It’s not necessary to know the taxid for this organism. This information is entered by autocomplete after we type “Zika” in the organism field (Fig. 6).

Figure 6. Excluding Zika virus from the BLAST search.
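For record keeping, the same search settings can be written down explicitly. This sketch assumes NCBI's BLAST Common URL API and its `CMD`, `PROGRAM`, `DATABASE`, `QUERY`, and `ENTREZ_QUERY` parameters. The exclusion string and the Zika virus taxid (64320) are assumptions that should be checked against the BLAST documentation, and the code only encodes the request body; it does not submit anything.

```python
from urllib.parse import parse_qs, urlencode

# NCBI BLAST URL API endpoint.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

# Assumption: Zika virus is NCBI taxonomy ID 64320. The web form fills this
# in automatically when "Zika" is typed into the organism box.
ZIKA_TAXID = 64320


def blast_put_body(query: str, exclude_taxid: int) -> str:
    """Encode a blastp search of the pdb database that excludes one taxon."""
    params = {
        "CMD": "Put",         # submit a new search
        "PROGRAM": "blastp",  # protein query against a protein database
        "DATABASE": "pdb",    # Protein Data Bank proteins
        "QUERY": query,       # accession number or raw sequence
        # Assumed Entrez syntax for "exclude this organism"; verify first.
        "ENTREZ_QUERY": f"all[filter] NOT txid{exclude_taxid}[Organism:exp]",
    }
    return urlencode(params)


body = blast_put_body("YP_009227205.1", ZIKA_TAXID)
print("POST", BLAST_URL)
for key, values in parse_qs(body).items():
    print(f"  {key} = {values[0]}")
```

A real submission would also have to poll for results using the request ID (RID) that BLAST returns; that step is omitted here.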

Now, we click BLAST to start the search.

Our BLAST results appear shortly. The top graph shows the polymerase domain in tan on the right side (Fig. 7). The graph below shows where related proteins align to the Zika protein sequence that we used as a query.

4. Scan the results to find proteins bound to drugs (or inhibitors).

Now that we have a set of similar proteins, the next step is to find proteins that are bound to potential drugs. We scroll down the page and read the titles to identify structures where a protein is bound to some kind of inhibitor or potential drug.

In Figure 8, we can see one structure where the RNA polymerase from Dengue virus is bound to compound 29 and another where it’s bound to compound 27. Both compounds 27 and 29 might be drug candidates. These are shown in the same alignment because the protein sequences are identical.

Figure 8. Protein sequences from molecular structure files showing that the protein may be bound to an inhibitor.
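When a BLAST result lists dozens of structures, the title scan can be partly automated. This minimal sketch flags titles containing words that often signal a bound ligand; the keyword list is our own assumption (not an NCBI feature), and the example titles are hypothetical, not real BLAST output. Anything it flags still needs to be checked by reading the structure record.

```python
import re

# Words that often appear in titles of structures solved with a bound ligand.
# This keyword list is an assumption, chosen for illustration.
LIGAND_WORDS = re.compile(
    r"\b(inhibitor|compound|complex|bound|drug)\w*", re.IGNORECASE
)


def flag_ligand_hits(titles):
    """Return the titles that mention a possible inhibitor or drug."""
    return [title for title in titles if LIGAND_WORDS.search(title)]


# Hypothetical hit titles, for illustration only.
hits = [
    "Chain A, RNA-dependent RNA polymerase NS5 (apo form)",
    "Chain A, NS5 polymerase in complex with compound 29",
    "Chain B, NS5 polymerase bound to an allosteric inhibitor",
]
for title in flag_ligand_hits(hits):
    print(title)
```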

We can confirm that these compounds might be drug candidates or inhibitors by following the links to the structure record and skimming the title and abstract of the accompanying paper, if there is one.

Both of these compounds are described in a paper discussing potent inhibitors of Dengue virus. The title and the abstract tell us the authors were looking for potential drugs.

As mentioned earlier, I wanted to provide students with an example to show how this process might work. Since Sofosbuvir has been approved by the FDA to treat Hepatitis C infections, and Hepatitis C virus belongs to the same family of viruses as Zika virus (Flaviviridae), I decided to compare the Zika virus RNA polymerase to the Hepatitis C RNA polymerase to see if the Zika virus protein might contain a likely binding site for Sofosbuvir.

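I used blastp to compare these two proteins and found a region with a significant E value (Fig. 9). The E value is the number of matches this good that we would expect to find by chance alone, so an E value of 0.001 means there is roughly a 1 in 1,000 chance that two unrelated proteins would match to this extent. In the aligned region, 25% of the amino acids are identical and 43% are either identical or conservative substitutions (positives).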

Figure 9. A statistically significant alignment between a region of the Zika virus RNA polymerase query sequence and the RNA polymerase from Hepatitis C virus.
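The identity and positive percentages, and the link between an E value and a probability, can be checked by hand. This minimal sketch assumes the usual BLAST midline convention (a letter for identical residues, + for similar ones, a space for neither) and converts an E value to a probability with P = 1 - exp(-E), which is close to E itself when E is small. The function names and the midline are hypothetical.

```python
import math


def alignment_stats(midline: str) -> tuple[float, float]:
    """Percent identities and positives from a BLAST-style midline.

    A letter marks identical residues, '+' marks a conservative
    substitution, and a space marks neither. Gap columns count toward
    the alignment length, as in BLAST's own percentages.
    """
    length = len(midline)
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return 100 * identities / length, 100 * positives / length


def evalue_to_pvalue(evalue: float) -> float:
    """Probability of at least one chance hit this good (Poisson model)."""
    return 1 - math.exp(-evalue)


# A made-up 8-column midline, for illustration only.
identity, positive = alignment_stats("R+ D  G+")
print(f"{identity:.1f}% identical, {positive:.1f}% positives")  # 37.5% identical, 62.5% positives
print(f"E = 0.001 -> P = {evalue_to_pvalue(0.001):.4g}")
```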

5. Use Molecule World™ to identify amino acids in the drug-binding site.

To look at the binding site, I downloaded structure 4WTG from the MMDB (Molecular Modeling Database) into Molecule World. This structure contains the Hepatitis C virus RNA polymerase, RNA in the process of being copied, and Sofosbuvir. Sofosbuvir is a nucleotide analogue that blocks RNA synthesis by preventing RNA polymerase from adding new nucleotides to the chain after the drug has been incorporated. The structure shows that Sofosbuvir has been incorporated into the RNA chain (Fig. 10).

To identify the amino acids in the binding site, I touch the Show sequence button to open the sequence viewer and touch the name of the drug to select it. Two manganese atoms appear to be bound to the drug, so I select those as well.

The next step is to touch the Selection button and choose Select nearby. This highlights everything within 5-6 angstroms of the drug. In general, anything within 5 angstroms of a substrate would be considered to be located in the binding site.
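The Select nearby step is essentially a distance search. This minimal sketch shows the idea: a residue counts as part of the binding site if any of its atoms lies within a cutoff (5 angstroms by default) of any drug atom. The coordinates are invented for illustration; real ones would come from the atom records of a structure file such as 4WTG. This is not how Molecule World is implemented, only the same geometric rule.

```python
import math

# Hypothetical coordinates in angstroms, invented for illustration.
drug_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein_atoms = [
    # (residue name, residue number, atom coordinates)
    ("ARG", 158, (4.0, 1.0, 0.0)),
    ("ASP", 220, (1.5, 4.5, 0.0)),
    ("LEU", 300, (12.0, 0.0, 0.0)),
]


def residues_near(drug, protein, cutoff=5.0):
    """Residues with at least one atom within `cutoff` angstroms of any drug atom."""
    near = set()
    for name, number, xyz in protein:
        if any(math.dist(xyz, atom) <= cutoff for atom in drug):
            near.add((name, number))
    return sorted(near, key=lambda residue: residue[1])


print(residues_near(drug_atoms, protein_atoms))  # [('ARG', 158), ('ASP', 220)]
```

Rerunning with `cutoff=3.0` keeps only ARG 158, which shows how the choice of cutoff changes what counts as the binding site.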

Next, I open the Show/Hide menu and choose “All atoms in residue” to see the amino acid side chains. Viewing the side chains shows me if any amino acid residues form bonds to the drug. I can also hide everything that’s not selected to make the interaction easier to see. Since some of the highlighted objects in my structure are RNA, I deselect those and hide them as well. Figure 11 shows some of the amino acids in the binding site for sofosbuvir with the amino acid sequence below.

Figure 11. Amino acids in the Sofosbuvir binding site are colored by residue in both the structure and the sequence.

6. Determine if amino acids in the drug-binding site are present in both proteins.

Once we know which amino acids bind to the drug and which are located in the binding site, we can add these data to the sequence alignment to see if these same amino acids are present in both proteins.

A common way to show the similarity between two sequences is to align them with one sequence on top and the related sequence below. Figure 12 shows this alignment, with the RNA polymerase sequence from Zika virus on top (the Query) and the Hepatitis C virus RNA polymerase on the bottom (the Sbjct, or subject). When the two sequences match, the amino acid is shown in the middle line. If the amino acids are similar but not identical, a + sign is shown. This helps us visualize the similarity between two protein sequences.

We copy the alignment from the BLAST results and paste it into a Word file to make it easy to annotate. This often requires a bit of formatting work as well.

I also find it helpful to touch the Molecule icon and select the option to color the amino acids by residue. Then, I scroll through the sequence to find highlighted amino acids. When I find them, I highlight the corresponding amino acids in the aligned sequences.
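Matching residue numbers between the structure and the alignment by eye is error-prone when the alignment contains gaps. This minimal sketch (hypothetical function names and data) walks a BLAST alignment column by column, mapping Sbjct residue numbers, such as binding-site residues found in the Hepatitis C polymerase structure, onto Query (Zika virus) numbers. It assumes the structure's residue numbering matches the numbering of the Sbjct sequence in the BLAST output, which is not always true for structure files, and that the start positions are read from the alignment.

```python
def map_positions(query_aln, subject_aln, query_start, subject_start):
    """Map each subject residue number to (query number, query residue, subject residue).

    Columns where either sequence has a gap ('-') are skipped, but each
    residue counter still advances whenever its sequence has a residue.
    """
    mapping = {}
    q_num, s_num = query_start, subject_start
    for q_res, s_res in zip(query_aln, subject_aln):
        if q_res != "-" and s_res != "-":
            mapping[s_num] = (q_num, q_res, s_res)
        if q_res != "-":
            q_num += 1
        if s_res != "-":
            s_num += 1
    return mapping


def compare_site(mapping, site):
    """Report whether each subject binding-site residue is shared with the query."""
    report = []
    for s_num in site:
        if s_num not in mapping:
            report.append(f"Sbjct {s_num}: no aligned Query residue")
            continue
        q_num, q_res, s_res = mapping[s_num]
        status = "shared" if q_res == s_res else "different"
        report.append(f"Sbjct {s_res}{s_num} -> Query {q_res}{q_num}: {status}")
    return report


# Hypothetical aligned fragment and binding-site numbers, for illustration only.
mapping = map_positions("KDR-GE", "KERAGD", query_start=100, subject_start=50)
for line in compare_site(mapping, [52, 53, 55]):
    print(line)
```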

In Figure 11, I can see that arginine 158 may form a salt bridge with oxygens in the nearby phosphate group. This amino acid is present in both proteins (Fig. 12). Aspartic acid 225 is likely to form a hydrogen bond with the drug, and aspartic acid 220 forms a metal (coordinate) bond with one of the manganese atoms.

Figure 12. Aligned sequences from the Zika virus RNA polymerase (Query) and the Hepatitis C virus RNA polymerase. Amino acids in the Sofosbuvir binding site that are found in both proteins are highlighted.
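The three kinds of interactions described above can be roughly told apart by which atoms are involved and how far apart they are. This sketch is a heuristic under stated assumptions: the atom-role labels are hypothetical, the distance cutoffs are common rules of thumb rather than values from Molecule World or the 4WTG paper, and real hydrogen bonds also depend on geometry (angles), which the code ignores.

```python
# Assumed distance cutoffs in angstroms (rules of thumb, not tool settings).
METAL_MAX = 2.8
SALT_BRIDGE_MAX = 4.0
H_BOND_MAX = 3.5


def classify_contact(protein_atom: str, partner_atom: str, distance: float) -> str:
    """Guess an interaction type from hypothetical atom labels and a distance.

    protein_atom: "cationic_N" (e.g. Arg), "anionic_O" (e.g. Asp), or "polar"
    partner_atom: "phosphate_O", "metal", or "polar"
    """
    if partner_atom == "metal":
        if protein_atom == "anionic_O" and distance <= METAL_MAX:
            return "metal coordination"
        return "no specific interaction"
    if (protein_atom == "cationic_N" and partner_atom == "phosphate_O"
            and distance <= SALT_BRIDGE_MAX):
        return "salt bridge"
    if distance <= H_BOND_MAX:
        return "possible hydrogen bond"
    return "no specific interaction"


# Hypothetical distances, loosely modeled on the three residues discussed above.
print(classify_contact("cationic_N", "phosphate_O", 3.0))  # salt bridge
print(classify_contact("anionic_O", "polar", 2.9))          # possible hydrogen bond
print(classify_contact("anionic_O", "metal", 2.2))          # metal coordination
```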

Based on the number of shared amino acids in the binding site and the types of interactions between those shared amino acids and the drug, I predicted that sofosbuvir would inhibit the Zika virus RNA polymerase. This prediction was recently confirmed by experiments in cell culture and in mice (2, 3).

Conclusion

The method described in this article has been used successfully with students for three semesters. Students have enjoyed it and commented that they like doing things that relate to real life. This technique suits student projects for several reasons. Students can apply it to many kinds of drug targets and carry out every step using publicly available data and low-cost, user-friendly software. It also gives students an opportunity to learn and practice skills that are used in the biotech industry. Finally, the same method was used in the two papers discussed earlier, and predictions made with it were shown to be correct when experiments were performed in cell culture and in mice.

Comments

Thanks for describing your course activities to us. The class sounds like something my students would love to take. Is the course offered through the chemistry department or biology? What prerequisites does the course have? Since the course is offered online, I'm wondering whether you have participants who aren't normally students at Austin Community College.

Students outside of Austin Community College are welcome to take the course. They either take it for credit or take it as a continuing education course where they don't get credit but it costs less.

Ideally, a student should have had at least one chemistry and one biology course, but I've learned not to make too many assumptions about background knowledge. As a consequence, we start out slow and build up to the more challenging concepts that you see in this assignment. We begin with exploring and understanding databases, then move on to molecular structures (chemical bonds in biological systems, DNA structure, protein structure, etc.), then molecular sequence analysis (blastn, blastp, blastx, statistics, sequence viewers), and genetic variation (translation, ORF finder, mutations, effects on structures).

This semester I'm also teaching a course for people who are interested in teaching bioinformatics. We will be offering it next Spring semester as well. I'll add a URL later today with a page where you can sign up for more information.

I'm a little biased (since I teach biochemistry), but I find this sort of thing interesting. Have you thought about running this method with other diseases? There are classic drug design 'stories' for molecules like Imatinib (Gleevec) for cancer and dorzolamide (Trusopt) for glaucoma, which I think was the first marketed drug resulting from rational design methods, so there is a nice tie-in.

I haven't tried this method yet with other drugs or possible drug targets. However, when I saw the publications that came out with Sofosbuvir and realized that the authors used pretty much the same method I did, it made me wonder if this is a more standard method than I realized.

I see this class activity requires a broad range of background knowledge, including basics of BLAST, sequence analysis, 3D structure visualization, databases, and so on. So, I don't think you introduced students to this activity early in the semester. Rather, I guess you did it near the end of the semester. So, I would like to ask you about these two:

(1) When did students do this activity? (e.g., before/after the mid-term or x-th week of a 14-week semester, ...).
(2) How many lectures (or how many hours) were spent teaching background knowledge necessary for this activity, before doing this activity?

Students do this activity in the 7th week of class. Before this, they spend 2 weeks on databases, and 4 weeks on working with molecular structures to learn about the atoms in biochemicals, properties of amino acids and nucleotides, and chemical bonds. They also compare structures of drug resistant and drug sensitive viruses to see how bonds between key amino acid residues and an anti-viral drug change in the drug resistant form.

It's hard to say how many hours were involved. This is a 3-credit on-line course, so I estimate the workload at 10 hours a week.

Your course content is quite remarkable. I used to teach organic chemistry and occasionally biochemistry. (As for prerequisites, I can remember when almost none of the biology majors in the class could recognize pyruvic acid.) I retired from the classroom about 8 years ago and have been teaching online only. I am not far from you, living in Georgetown (keeping Austin weird). I can also appreciate the difficulties of dealing with student technical issues. There is a perception that students have great technology facility because of their use of iPhones and social media. Practical applications are another matter. I remember learning to use the protein database search back in 1991, and it was fairly simple although a bit slow, but students always had fits with it. How do you deal with those issues?

Promoting the idea of drug design from informatics is very timely. A friend who worked at Merck told me that the existing methods were a major cost driver for the company. Animal testing was the worst part because less (generally much less) than 5% of the drugs which were developed through animal testing had any effectiveness in humans and some were very harmful, even deadly. I would have serious ethical concerns with that but anybody could understand the economic consequences of animal testing as a drug development paradigm. Do your students have any appreciation of this aspect of the drug design process?

Most of my students are enrolled in a biotechnology education program. I'm not sure how much they learn about animal testing, but they do know that it's a significant part of the drug approval process.

As far as the technology goes, the molecular modeling program we use (Molecule World) is pretty user-friendly. I am somewhat biased, since I helped develop it with an SBIR grant from the NSF to make a user-friendly modeling tool for teaching. So, I think there's an easier learning curve for students than with programs like PyMOL or some others. As for other tech issues, we're using iBooks, which are very helpful since we can include videos and games. We make sure that students learn basic computer techniques like capturing screen images, cropping and annotating images, and using key commands to find text on a web page and to select, copy, and paste text. The students get lots of practice, since I introduce a technique and then we use it over and over again. I also hold my office hours via Google Hangouts or Skype.

That said, there are students who have a hard time figuring out how to navigate an on-line course. To compensate for the challenges of being in an on-line course and not seeing students in person, except via Google Hangouts, I have made quite a few videos. You can see them here: https://www.youtube.com/channel/UClJETZGbTQ6eUMcXxKXMBDQ

As far as databases go, I do have students explore the Nucleic Acids Research database issue, but mostly we try to do as much as we can with the NCBI. I think it's a little more user-friendly than the PDB, and since the NCBI has 38 databases plus many varieties of BLAST, it works pretty well.