Talk:CH391L/S13/DirectedProteinEvolution

From OpenWetWare

Techniques

Alvaro E. Rodriguez M. 21:45, 21 February 2013 (EST): It would be nice if you added literature examples of directed evolution of proteins for each approach, as was done on the Ancestral Sequence Reconstruction page, or took a similar approach.

Max E. Rubinson 14:42, 16 April 2013 (EDT): Alvaro, I linked to either a Wikipedia article or a research paper describing each technique.

Jeffrey E. Barrick 15:23, 2 March 2013 (EST): Having that many citations is beyond the scope of what is expected in one of these topic pages. Read the review papers if you want to go that deep.

Critical parameters for success

Gabriel Wu 16:55, 25 February 2013 (EST): For those of you who do this sort of thing, do you have any thoughts on the contexts in which directed evolution works well and, more importantly, where it doesn't? Since most papers are about successes, do you know of any examples (personal communication/experience) where it didn't work, and can you speculate on why?

Evan Weaver 17:00, 26 February 2013 (EST): I remember you explained on Monday what the genetic method for high-throughput screening was, but what was it again? The list is very vague.

Max E. Rubinson 14:53, 16 April 2013 (EDT): I think you are referring to genetic complementation methods of selection. These methods involve deleting the wild-type (WT) copy of a gene, transforming the cells with a mutant library, and looking for growth.

Jeffrey E. Barrick 15:33, 2 March 2013 (EST): It would be nice to put some numbers on the typical experimental procedure for plasmid-based screening in E. coli. How many variants can be constructed (limited by transformation efficiency)? How many can be tested by screening in microplates or as colonies on agar plates? How many could be tested with a selection? Also, in vitro vs. in vivo conditions can be very important depending on what activity you are after. Often things selected to work in vitro don't work as well in vivo.
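One way to start putting numbers on library size versus screening capacity is the standard sampling estimate below. It is only a sketch under a strong assumption: every variant in the library is taken to be equally likely, which real mutagenesis libraries rarely are. The function names are made up for illustration, and the throughput you plug in (transformants, wells, colonies) has to come from your own experiment.

```python
import math

def library_size(positions, alphabet=20):
    """Number of possible variants when `positions` sites are fully randomized.

    alphabet=20 counts protein variants; alphabet=32 counts NNK codon variants.
    """
    return alphabet ** positions

def expected_coverage(clones, variants):
    """Expected fraction of the library seen after sampling `clones` clones.

    Assumes all variants are equally likely (Poisson approximation).
    """
    return 1.0 - math.exp(-clones / variants)

def clones_needed(variants, coverage):
    """Clones to sample so that the expected coverage reaches `coverage` (0-1)."""
    return math.ceil(-variants * math.log(1.0 - coverage))

if __name__ == "__main__":
    variants = library_size(3)  # three fully randomized amino acid positions
    print(variants, clones_needed(variants, 0.95))
```

Under this assumption, covering 95% of a library takes roughly three times as many clones as there are variants, which is why the number of sites that can be randomized exhaustively is set by transformation and screening throughput.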

Codon usage

Evan Weaver 17:03, 26 February 2013 (EST): What does improved codon usage mean in the context of synthetic GFP?

Neil R Gottel 16:45, 28 February 2013 (EST): Organisms differ in the abundance of the tRNA that corresponds to each codon. Certain codons are rare in some species and common in others. So, if you're putting a jellyfish gene into E. coli, the codon usage is likely not optimized, and production of that gene's product will be slower/lower (because it takes longer to produce a peptide if the ribosome is waiting around for a rare tRNA to come by). However, according to the OWW page on Codon usage optimization, and specifically this paper (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007002), the most important factor to consider is which tRNAs are charged (that is, get amino acids attached to them) when the cell is starving, and to favor the corresponding codons when optimizing your gene. I haven't actually done this sort of optimization though, so hopefully someone else more experienced can chime in.

Neil R Gottel 17:23, 28 February 2013 (EST): So, when we make our awesome reconstructed ancestral sequence of [iGEM project], we may want to optimize it a bit before unleashing it on the world.

Alvaro E. Rodriguez M. 19:58, 28 February 2013 (EST): I would also like to complement Neil's comment with the following paper, which describes the limits of the genetic code: http://genesdev.cshlp.org/content/18/7/731.full

Benjamin Gilman 17:55, 28 February 2013 (EST): Most people choose codons corresponding to the most abundant tRNAs when optimizing a gene for E. coli because they want maximum yield, but there are circumstances where you might want to tune down the expression of a protein to more closely match its level in the native organism. The 2011 University of Dundee iGEM team actually wrote a software tool, The Gene Synthesiser (http://2011.igem.org/Team:Dundee/Software), which I couldn't seem to download, that takes sequences from one organism and converts the codons to ones with the most similar frequency in a new organism.

Alvaro E. Rodriguez M. 19:58, 28 February 2013 (EST): I checked their web page, and most of the tools that have been developed aren't innovative. Many companies, such as IDT, say they'll codon optimize specific genes, and databases of codon usage exist for several organisms. For example, this webpage lists several: http://www.geneinfinity.org/sp/sp_codonusage.html#databases

Gabriel Wu 21:34, 28 February 2013 (EST): For overexpression in E. coli, there are special strains that contain a plasmid (commonly the RIL plasmid) encoding extra copies of tRNA genes that are frequently limiting. Read about it here: http://www.genomics.agilent.com/CollectionSubpage.aspx?PageType=Product&SubPageType=ProductData&PageID=484

Jeffrey E. Barrick 15:21, 2 March 2013 (EST): My impression is that, if you want maximal expression, you need not only to choose from among the best codons for each amino acid but also to avoid the stable mRNA secondary structures that certain combinations of codons could cause. This is one case where solely optimizing one sequence parameter may actually cause problems.

Gabriel Wu 14:03, 4 March 2013 (EST): This is an interesting point. Secondary structure, especially at the beginning of the mRNA molecule, is known to be particularly important for protein expression levels (at least in E. coli). Howard Salis at Penn State University created an RBS calculator (http://www.salis.psu.edu/RBS_Calculator.shtml) that tries to model and predict the effects of this secondary structure. It gives a numerical output (on a relative scale) to guide the design of synthetic biology constructs and control the relative expression of proteins in E. coli.

Gabriel Wu 14:06, 4 March 2013 (EST): The topic of codon optimization is also related to the idea of ribosomal pausing. Jonathan Weissman has been using a technique known as ribosome profiling to investigate ribosome occupancy along an mRNA molecule. It's pretty interesting stuff and might be worth reading up on: http://www.nature.com/nature/journal/v484/n7395/full/nature10965.html
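To make the two codon-choice strategies from this thread concrete (pick the most frequent codon for maximum yield, or match each codon's frequency in the source organism), here is a minimal sketch. The usage tables are made-up placeholders covering only three amino acids, not measured values for any organism, and the function names are hypothetical. This is not the Dundee tool, just an illustration of the idea. It also ignores mRNA secondary structure, which the comments above flag as a real concern.

```python
# Toy codon-usage tables: amino acid -> {codon: relative frequency}.
# ASSUMPTION: these numbers are invented for illustration only.
TOY_SOURCE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.2, "AAG": 0.8},
    "G": {"GGA": 0.4, "GGC": 0.2, "GGT": 0.2, "GGG": 0.2},
}
TOY_TARGET = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "G": {"GGC": 0.4, "GGT": 0.35, "GGG": 0.15, "GGA": 0.1},
}

def most_frequent_codons(protein, usage):
    """Back-translate a protein using the most frequent codon for each residue."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)

def match_frequency(dna, source_usage, target_usage):
    """Recode a gene so each codon's target frequency is closest to its source frequency."""
    codon_to_aa = {c: aa for aa, codons in source_usage.items() for c in codons}
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        aa = codon_to_aa[codon]
        freq = source_usage[aa][codon]
        choices = target_usage[aa]
        # Pick the synonymous target codon whose frequency is nearest to `freq`.
        recoded.append(min(choices, key=lambda c: abs(choices[c] - freq)))
    return "".join(recoded)

if __name__ == "__main__":
    print(most_frequent_codons("MKG", TOY_TARGET))
    print(match_frequency("ATGAAAGGA", TOY_SOURCE, TOY_TARGET))
```

With real usage tables the two functions can give quite different sequences for the same protein, which is the trade-off between maximum yield and native-like expression described above.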

Comparison to ASR

Siddharth Das 00:37, 1 March 2013 (EST): What is the difference between ancestral reconstruction and directed evolution? Both seem to rely on a substantial degree of randomness. Is one method more advantageous than the other? If so, can either method bring about a desired trait, such as binding to a specific substrate?

Jeffrey E. Barrick 15:21, 2 March 2013 (EST): I'd say that they both "use" evolution: one is prospective and one is retrospective. Generally we can't get as deep an evolutionary history on a laboratory timescale, so directed evolution alters only a handful of amino acids (<10 total). As we saw, reconstructing putative ancient proteins can result in >100 changes from any of the contemporary proteins.

Fitness Landscapes

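Andre C Maranhao 04:13, 27 February 2013 (EST): I thought it'd be nice to have a section explaining fitness landscapes and moving through sequence space. Here are some good papers: http://www.nature.com/nrm/journal/v10/n12/abs/nrm2805.html and http://www.sciencedirect.com/science/article/pii/S1367593109000076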

Max E. Rubinson 15:01, 16 April 2013 (EDT): While it is a very important topic, I think including a section explaining fitness landscapes is beyond the scope of this topic/assignment. It likely deserves its own page.

Marco D Howard 01 March 2013 (CST): I took a look at the paper from Nature Reviews. I found the discussion very interesting, but I have one worry about their search space methodology. It seems that the search strategy is to always follow the least steep pathway through the search space. I would argue that if you always do this, you may confine yourself to certain areas of the space and be unable to escape. Perhaps a good strategy would be to define some criterion under which you accept unfavorable moves a certain percentage of the time. That way you move towards your desired goal most of the time, but you don't eliminate the chance of finding a much better solution.
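The idea of accepting unfavorable moves a certain percentage of the time can be sketched as a simple in silico adaptive walk. Everything here is an assumption made for illustration: the landscape is a textbook "trap" function over bit strings (all zeros is a local peak, all ones is the global peak), not a real protein landscape, and `accept_worse` is a hypothetical parameter giving the probability of keeping a mutation that lowers fitness.

```python
import random

def trap_fitness(bits):
    """Toy deceptive landscape: the global peak is all ones, but the slope leads to all zeros."""
    ones = sum(bits)
    n = len(bits)
    return n if ones == n else n - 1 - ones

def adaptive_walk(start, fitness, steps, accept_worse, rng):
    """Single-mutation walk that keeps a worse variant with probability `accept_worse`."""
    current = list(start)
    best = list(current)
    trajectory = [fitness(current)]
    for _ in range(steps):
        candidate = list(current)
        pos = rng.randrange(len(candidate))
        candidate[pos] ^= 1  # flip one position (a point mutation)
        delta = fitness(candidate) - fitness(current)
        if delta >= 0 or rng.random() < accept_worse:
            current = candidate
            if fitness(current) > fitness(best):
                best = list(current)
        trajectory.append(fitness(current))
    return best, trajectory

if __name__ == "__main__":
    start = [0, 1, 0, 1, 0, 1]
    for p in (0.0, 0.2):
        best, _ = adaptive_walk(start, trap_fitness, 500, p, random.Random(1))
        print(p, best, trap_fitness(best))
```

With `accept_worse=0.0` the walk is strictly greedy and can only climb toward the nearest peak. A nonzero value lets it cross valleys, at the cost of spending some steps on worse variants.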

Jeffrey E. Barrick 15:13, 2 March 2013 (EST): To some extent, generating multiple mutations at the same time by mutagenic PCR or other mutagenesis methods can let you get out of local optima. Perhaps more importantly, DNA shuffling (sometimes called sexual PCR) can mix together two very different solutions that differ in amino acids at many positions, and thus "jump" through sequence space.
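As a rough illustration of why recombination "jumps" further than point mutation, the sketch below builds a chimera from two parents at given crossover points and counts how many positions differ from each parent. It is a toy model only: real DNA shuffling relies on fragmentation and homology-dependent reassembly, which this does not simulate. The function names and example sequences are made up.

```python
def recombine(parent_a, parent_b, crossovers):
    """Build a chimera that switches parent at each crossover position."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be the same length")
    parents = (parent_a, parent_b)
    child = []
    source = 0      # index of the parent currently being copied
    previous = 0    # start of the current segment
    for point in sorted(crossovers) + [len(parent_a)]:
        child.append(parents[source][previous:point])
        source = 1 - source
        previous = point
    return "".join(child)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    a, b = "AAAAAA", "TTTTTT"
    chimera = recombine(a, b, [2, 4])
    print(chimera, hamming(chimera, a), hamming(chimera, b))
```

A single crossover event can change as many positions as the parents differ within the swapped segment, whereas a point mutation changes exactly one.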

Gabriel Wu 13:52, 4 March 2013 (EST): I think it's important to realize that fitness landscapes are mostly abstractions meant to help us visualize what's going on evolutionarily. In reality, we don't actually know where in the landscape we start, but we generate libraries with enough diversity (related to an earlier point about quantifying library diversity) to try to sample as much of the fitness landscape as possible. In these experiments, we are actually giving ourselves multiple opportunities to find optimal paths. It's true that the selection will often end up trapped at a local optimum, but other times it will land elsewhere on the fitness landscape. On the point of limiting our search space and being trapped in local optima, the authors of the review argue that "jumping" across the fitness landscape the way you suggest typically has deleterious effects. But if you really want to sample other regions of the space, they suggest unnatural amino acid incorporation, computational design with structure information, and recombination methods.