Category Archives: Theological implications of simple chemistry

With apologies to Numbers 24:5, “How goodly are thy tents, Oh Jacob” — a recent paper shows how shockingly error-ridden our genomes actually are [ Science vol. 337 pp. 64 – 69 ’12 ]. The authors sequenced roughly three quarters of the genes coding for proteins in some 2,439 people — i.e. 15,585 protein coding genes. This left 98% of the genome untouched, primarily because we really don’t know what it does or how it does it, despite the fact that it controls when, where and how much of each protein is made. So they basically looked at the bricks from which we are built (the proteins) and not the plans (the 98%).

The subjects came from two groups: 1,351 Europeans and 1,088 Africans (the latter, because genetic diversity is far higher among Africans as that’s where humanity arose, and where mutations have had the longest time to accumulate).

The news is not very good. First, some background.

Recall that each nucleotide is one of four possibilities (A, T, G, C), and that each 3 nucleotides therefore has 4^3 = 64 possibilities. 61/64 combinations code for amino acids which, since we have only 20, gives a certain redundancy to the famed genetic code. The other 3 combinations code for no amino acid (usually) and tell the machinery making proteins to stop. Although crucial to our existence, these are called nonsense codons.

The genetic code is therefore 3fold degenerate (on average). However, some amino acids are coded for by just 1 combination of 3 nucleotides while others are coded by as many as 6. So some single nucleotide variants (SNVs) leave the amino acid coded for the same (these are the synonymous SNVs), while others change the amino acid (nonSynonymous SNVs), and possibly protein function.
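For anyone who wants to check the counting, here is a minimal Python sketch. It assumes the three stop codons of the standard genetic code written as DNA (TAA, TAG, TGA); the variable names are mine, not the paper's.

```python
from itertools import product

# Every ordered triplet over the four nucleotides.
codons = ["".join(triplet) for triplet in product("ATGC", repeat=3)]

# Assumption: the three stop ("nonsense") codons of the standard code, as DNA.
stop_codons = {"TAA", "TAG", "TGA"}

# Whatever is not a stop codon codes for an amino acid.
coding = [c for c in codons if c not in stop_codons]

AMINO_ACIDS = 20
average_degeneracy = len(coding) / AMINO_ACIDS

print(len(codons), len(coding), round(average_degeneracy, 2))
```

The average works out to just over 3 codons per amino acid, which is all "3fold degenerate (on average)" is claiming.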

Ask someone with sickle cell anemia how much trouble just one nonSynonymous SNV can cause — it’s only 1 amino acid out of 147. Even worse, ask someone with cystic fibrosis where just one of 1,480 amino acids is missing.

Here’s the bad news. In the population as a whole, they found 500,000 single nucleotide variants (SNVs). If you’re still not sure what is meant by this, the 5 articles in https://luysii.wordpress.com/category/molecular-biology-survival-guide/ should be all the background you need.

More than 400,000 of the variants were previously unknown. Also more than 400,000 of them were found either in Africans or Europeans but not both. If you divide 500,000 by 2,439 you get 205 variants per person. However, SNVs are far more common than that, and each individual contains an average of 14,000.

Well, how many of the 500,000 or so SNVs they found are nonSynonymous? One would think about 1/3 statistically. However, they found that more than half (292,125/500,000, nearly 60%) were nonSynonymous.

It gets worse: 6,165 of the nonSynonymous variants are nonsense codons. This means that the protein coded for by such a gene terminates prematurely, and it can terminate anywhere. On average one would expect that half of these nonsense codons result in a protein of less than half the normal length. This would very likely obliterate whatever function the protein had.

Obviously, they couldn’t test all 500,000 SNVs to see how they affected protein function (and we really only have a decent idea of what half our 20,000 or so proteins are doing). They had to guess. They came up with a figure of 2 – 4% of the 14,000 SNVs being functionally significant — That’s 280 – 560 significant mutations per individual.
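The arithmetic in the last few paragraphs is simple enough to check in a few lines. This is just a sketch using the figures quoted above; the variable names are mine.

```python
# Figures quoted in the post.
total_snvs = 500_000
people = 2_439
nonsynonymous = 292_125
nonsense = 6_165
snvs_per_person = 14_000

# Naive "variants per person" if every variant were private to one person.
naive_per_person = total_snvs / people

# Fraction of all variants that change an amino acid.
nonsyn_fraction = nonsynonymous / total_snvs

# Fraction of the nonSynonymous variants that are premature stops.
nonsense_fraction = nonsense / nonsynonymous

# 2 to 4% of each person's SNVs guessed to be functionally significant.
significant_low = 0.02 * snvs_per_person
significant_high = 0.04 * snvs_per_person

print(round(naive_per_person), round(nonsyn_fraction, 3),
      round(nonsense_fraction, 3),
      round(significant_low), round(significant_high))
```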

Clearly, despite the horrible examples of cystic fibrosis and sickle cell anemia above, most of these can’t be doing very much, because these were normal people being studied.

There are all sorts of implications of this work. One is the subject of a future post — how hard this diversity makes drug discovery. Another reiterates the Tolstoy theme mentioned earlier about the genetic defects causing schizophrenia and autism — “Happy families are all alike; every unhappy family is unhappy in its own way”. Thus beginneth Anna Karenina.

A third is that this shows that the 1000 fold expansion of the human population has pretty much obviated much natural selection eliminating these variants. I’ll leave it to the geneticists to figure out what this means for the eventual survival of the species, as these mutants continue to accumulate.

The paper is fascinating, and sure to change our conception of what a ‘normal’ genome actually is. Nonetheless, all they did was follow Yogi Berra’s dictum — “You can observe a lot by watching.” It certainly wasn’t creative or ingenious in any sense. Sometimes grunt work like this wins the day. I’ll leave this to Ashutosh to write about its philosophical implications for research.

Statements like this are all over the Old and New Testament (try Psalms 37, or Matthew 5:5). Mice are a meek lot, but some are meeker than others. How do you tell? You perform the tube test — put two mice at opposite ends of a tube and see which one backs out. If you test groups of 4 mice this way, you find a linear, transitive hierarchy of social dominance 95% of the time. For details see [ Science vol. 334 pp. 608 – 609, 693 and 697 ]. The tube test correlated well with several other tests of social dominance. Transitivity just means that mouse A dominant over mouse B and mouse B dominant over mouse C implies that mouse A is dominant over mouse C. Social dominance is independent of weight, motor skill etc. etc.
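To make "linear, transitive hierarchy" concrete, here is a small Python sketch that takes pairwise tube-test results and checks whether they fit a single ranking. The mouse labels, the outcomes and the function name `linear_hierarchy` are all hypothetical, made up for illustration, and not taken from the paper.

```python
from itertools import permutations

def linear_hierarchy(wins, mice):
    """Return the ranking (most dominant first) if the pairwise results
    form a strict linear, transitive order; otherwise return None."""
    for order in permutations(mice):
        # In a linear hierarchy every mouse beats all mice ranked below it.
        if all((order[i], order[j]) in wins
               for i in range(len(order))
               for j in range(i + 1, len(order))):
            return list(order)
    return None

mice = ["A", "B", "C", "D"]

# Hypothetical outcomes: (winner, loser) for each of the 6 pairings.
transitive = {("A", "B"), ("A", "C"), ("A", "D"),
              ("B", "C"), ("B", "D"), ("C", "D")}

# Same results except D beats A, which creates the cycle A > B > D > A.
cyclic = (transitive - {("A", "D")}) | {("D", "A")}

print(linear_hierarchy(transitive, mice))
print(linear_hierarchy(cyclic, mice))
```

With 4 mice there are only 24 possible rankings, so brute force over permutations is plenty; the second set of results has no consistent ranking and the function returns None.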

So they looked at the way nerve cells in the cerebral cortex of the mice communicate with each other. They studied a type of nerve cell (pyramidal neuron) in the fifth layer of the medial prefrontal cerebral cortex (which has 6 layers). The pyramidal neurons found in layer V communicate with other parts of the brain. The dominant mice had more effective communication between these cells than the nondominant ones.

Communication between neurons is accomplished by a specialized structure called a synapse where the processes of two nerve cells come into extremely close contact (200 Angstroms) with each other. Communication between neuron1 (presynaptic) and neuron2 (postsynaptic) is largely one way. Different small molecules (neurotransmitters) are released by presynaptic neurons and bind to proteins in the postsynaptic neuron, which alter their conformation on binding and cause the neuron to fire. If this sounds like a drug receptor interaction, well it is. The drug is glutamic acid, a neurotransmitter, and the receptor protein is something called AMPAR (you don’t want to know what the acronym stands for).

So how did the scientists cause the meek to become strong, and the strong to become meek? They injected viruses into the cortex of the animals. The viruses either contained more AMPARs, or a defective AMPAR which bound glutamic acid but didn’t do anything. The viruses were able to get their protein load into the layer five pyramidal neurons. The meek mice getting more AMPARs (so there was more response in the postsynaptic neuron to glutamic acid) became dominant. The dominant mice receiving the defective AMPARs became meek.

Now you know how one biblical prophecy will come to pass. Ain’t science grand?

Every budding chemist sits through a statistical mechanics course, in which the insanity and inutility of knowing the position and velocity of each and every one of the 10^23 molecules of a mole or so of gas in a container is brought home. Instead we need to know the average energy of the molecules and the volume they are confined in, to get the pressure and the temperature.

However, people are taking the first approach in an attempt to understand the brain. They want a ‘wiring diagram’ of the brain, i.e. a list of every neuron and for each neuron a list of the other neurons connected to it, and a third list for each neuron of the neurons it is connected to. For the non-neuroscientist — the connections are called synapses, and they essentially communicate in one direction only (true to a first approximation but no further as there is strong evidence that communication goes both ways, with one of the ‘other way’ transmitters being endogenous marihuana). This is why you need the second and third lists.

Clearly a monumental undertaking and one which grows more monumental with the passage of time. Starting out in the 60s, it was estimated that we had about a billion neurons (no one could possibly count each of them). This is where the neurological urban myth of the loss of 10,000 neurons each day came from. For details see https://luysii.wordpress.com/2011/03/13/neurological-urban-legends/.

The latest estimate [ Science vol. 331 p. 708 ’11 ] is that we have 80 billion neurons connected to each other by 150 trillion synapses. Well, that’s not a mole of synapses but it is about a quarter of a nanoMole of them. People are nonetheless trying to see which areas of the brain are connected to each other to at least get a schematic diagram.
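For the chemists who want to see the mole comparison worked out, a quick sketch (the only number not from the post is Avogadro's constant; variable names are mine):

```python
AVOGADRO = 6.02214076e23   # entities per mole

neurons = 80e9             # 80 billion, as quoted
synapses = 150e12          # 150 trillion, as quoted

moles_of_synapses = synapses / AVOGADRO
nanomoles_of_synapses = moles_of_synapses * 1e9

# Average number of synapses per neuron implied by the two estimates.
synapses_per_neuron = synapses / neurons

print(round(nanomoles_of_synapses, 2), round(synapses_per_neuron))
```

So the synapse count comes out nearer a quarter of a nanomole, with something under 2,000 synapses per neuron on average.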

Even if you had the complete wiring diagram, nobody’s brain is strong enough to comprehend it. I strongly recommend looking at the pictures found in Nature vol. 471 pp. 177 – 182 ’11 to get a sense of the complexity of the interconnection between neurons and just how many there are. Figure 2 (p. 179) is particularly revealing showing a 3 dimensional reconstruction using the high resolutions obtainable by the electron microscope. Stare at figure 2.f. a while and try to figure out what’s going on. It’s both amazing and humbling.

But even assuming that someone or something could, you still wouldn’t have enough information to figure out how the brain is doing what it clearly is doing. There are at least 3 reasons.

1. Synapses, to a first approximation, are excitatory (turn on the neuron to which they are attached, making it fire an impulse) or inhibitory (preventing the neuron to which they are attached from firing in response to impulses from other synapses). A wiring diagram alone won’t tell you this.

2. When I was starting out, the following statement would have seemed impossible. It is now possible to watch synapses in the living brain of an awake animal for extended periods of time. But we now know that synapses come and go in the brain. The various papers don’t all agree on just what fraction of synapses last more than a few months, but it’s early times. Here are a few references [ Neuron vol. 69 pp. 1039 – 1041 ’11, ibid vol. 49 pp. 780 – 783, 877 – 887 ’06 ]. So the wiring diagram would have to be updated constantly.

3. Not all communication between neurons occurs at synapses. Certain neurotransmitters are generally released into the higher brain elements (cerebral cortex) where they bathe neurons and affect their activity without any synapses for them (it’s called volume neurotransmission). Their importance in psychiatry and drug addiction is unparalleled. Examples of such volume transmitters include serotonin, dopamine and norepinephrine. Drugs of abuse affecting their action include cocaine and amphetamine. Drugs treating psychiatric disease affecting them include the antipsychotics, the antidepressants and probably the antimanics.

Statistical mechanics works because one molecule is pretty much like another. This certainly isn’t true for neurons. Have a look at http://faculties.sbu.ac.ir/~rajabi/Histo-labo-photos_files/kora-b-p-03-l.jpg. This is of the cerebral cortex — neurons are fairly creepy looking things, and no two shown are carbon copies.

The mere existence of 80 billion neurons and their 150 trillion connections (if the numbers are in fact correct) poses a series of puzzles. There is simply no way that the 3.2 billion nucleotides of our genome can code for each and every neuron, each and every synapse. The construction of the brain from the fertilized egg must be in some sense statistical. Remarkable that it happens at all. Embryologists are intensively working on how this happens — thousands of papers on the subject appear each year.

As my brain slowly recovers (or at least gets used to) from the chemical assault on it by inhaled corticosteroids and muscarinic anticholinergic drugs, I’m having a lot of fun reading a book by Melanie Mitchell “Complexity: A Guided Tour” — but her view of neurons is simplistic in the extreme — hopefully that will improve in the last 100 pages. A book review will follow. The whole book is quite relevant to the question — just what would you accept as an explanation of how the brain does what it does? The question leads into some deep philosophic minefields, but they can’t be avoided. That’s for another time.

Hopefully, I’ll be able to get back to Anslyn and Dougherty in the coming week (after taxes).

Anslyn & Dougherty is even more fun than Clayden et al. It’s far more advanced, and I’m certainly glad I read Clayden first. On p. 24 they talk about the polarizability of molecules, something distinct from the dipole moment of the molecule. Polarizability is the ability of the molecule’s electron distribution to distort in the presence of an electric field. I was surprised to find that the usual suspects (e.g. water) aren’t that polarizable and that the champs are hydrocarbons. They don’t say how polarizability is measured, but I’ll take them at face value.

We wouldn’t exist without the membranes enclosing our cells which are largely hydrocarbon. Chemists know that fatty acids have one end (the carboxyl group) which dissolves in water while the rest is pure hydrocarbon. The classic is stearic acid — 18 carbons in a straight chain with a carboxyl group at one end. 3 molecules of stearic acid are esterified to glycerol in beef tallow (forming a triglyceride). The pioneers hydrolyzed it to make soap. Saturated fatty acids of 18 carbons or more are solid at body temperature (soap certainly is), but cellular membranes are fairly fluid, and proteins embedded in them move around pretty quickly. Why? Because most fatty acids found in biologic membranes over 16 carbons have double bonds in them. Guess whether they are cis or trans. Hint: the isomer used packs less well into crystals. You’ve got it: all the double bonds found in oleic (18 carbons, 1 double bond) and arachidonic (20 carbons, 4 double bonds) acids are cis, which keeps membranes fluid. (I originally wrote trans here; thanks to PostDoc for pointing this out.) The cis double bond essentially puts a 60 degree kink in the hydrocarbon chain, making it much more difficult to pack in a liquid crystal type structure with all the hydrocarbon chains stretched out. Then there’s cholesterol which makes up 1/5 or so of membranes by weight — it also breaks up the tendency of fatty acid hydrocarbon chains to align with each other because it doesn’t pack with them very well. So cholesterol is another fluidizer of membranes.

How thick is the cellular membrane? If you figure the hydrocarbon chains of a saturated fatty acid stretched out as far as they can go, you get 1.54 Angstroms * cosine (30 degrees) = 1.33 Angstroms/carbon — times 16 = 21 Angstroms. Now double that because cellular membranes are lipid bilayers meaning that they are made of two layers of hydrocarbons facing each other, with the hydrophilic ends (carboxyls, phosphate groups) pointing outward. So we’re up to 42 Angstroms of thickness for the hydrocarbon part of the membrane. Add another 10 Angstroms or so for the hydrophilic ends (which include things like serine, choline etc. etc.) and you’re up to about 60 Angstroms thickness for the membrane (which is usually cited as 70 Angstroms — I don’t know why).
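Here is the thickness estimate as a short Python sketch. The bond length, angle and carbon count are the ones used above. Whether the 10 Angstroms for the hydrophilic ends is meant once or once per face is my guess, so the sketch prints both; the "about 60" figure fits the per-face reading.

```python
import math

CC_BOND_ANGSTROM = 1.54        # carbon-carbon single bond length
ZIGZAG_ANGLE_DEG = 30          # angle used above to project onto the chain axis
CARBONS = 16
HEADGROUP_ANGSTROM = 10        # "another 10 Angstroms or so"

# Advance along the chain axis per carbon in a fully extended chain.
per_carbon = CC_BOND_ANGSTROM * math.cos(math.radians(ZIGZAG_ANGLE_DEG))

one_leaflet = per_carbon * CARBONS
hydrocarbon_core = 2 * one_leaflet   # bilayer: two leaflets tail to tail

# ASSUMPTION: two readings of how the head groups are added.
total_heads_once = hydrocarbon_core + HEADGROUP_ANGSTROM
total_heads_both_faces = hydrocarbon_core + 2 * HEADGROUP_ANGSTROM

print(round(per_carbon, 2), round(one_leaflet), round(hydrocarbon_core),
      round(total_heads_once), round(total_heads_both_faces))
```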

Neurologists and neurophysiologists spend a lot of time thinking about membranes, particularly those of neurons. In all these years, I’ve never heard anyone talk about hydrocarbon polarizability. It ought to be a huge factor in membrane function. Why? Because of the enormous electric field across the membranes enclosing all our cells (not just our neurons). The potential across the membranes is usually given as 70 milliVolts (inside negatively charged, outside positively charged). Why is this a big deal?

Because the electric field across our membranes is huge. 70 milliVolts is 70 x 10^-3 volts. 70 Angstroms is 7 nanoMeters (7 x 10^-9 meters). Divide 70 x 10^-3 volts by 7 x 10^-9 meters and you get a field of 10,000,000 Volts/meter. If hydrocarbons are ever going to polarize they should in this environment. The college physics book I bought for the Quantum Mechanics course a while ago — “Physics for Scientists and Engineers” 4th edition p. 662 talks about lightning. The potential difference leading to the discharge is numerically the same: 10,000,000 Volts. This results in a much smaller electric field (probably by a factor of 1,000) because clouds aren’t 1 meter off the ground.
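The field calculation, spelled out as a sketch. It treats the membrane as a uniform slab, so field = voltage / thickness. The 1,000 meter cloud height is my stand-in for the "factor of 1,000" guess above, not a measured value.

```python
MEMBRANE_POTENTIAL_V = 70e-3      # 70 milliVolts
MEMBRANE_THICKNESS_M = 70e-10     # 70 Angstroms = 7 nanoMeters

# Uniform-field approximation across the membrane.
membrane_field = MEMBRANE_POTENTIAL_V / MEMBRANE_THICKNESS_M

LIGHTNING_POTENTIAL_V = 10_000_000
CLOUD_HEIGHT_M = 1_000            # ASSUMPTION: illustrative height only

lightning_field = LIGHTNING_POTENTIAL_V / CLOUD_HEIGHT_M

print(f"{membrane_field:.1e} V/m across the membrane")
print(f"{lightning_field:.1e} V/m under the cloud")
```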

So why don’t our cells collapse and we die? I don’t know.

Here are a few Physics 102 questions for the cognoscenti out there.

1. Potential difference is due to charge separation. Assume a flat membrane 1 micron square and 70 Angstroms thick. How much charge must be separated to account for a potential of 70 milliVolts? Answer in number of charges rather than Coulombs.

2. Now let’s get real. We’re talking about neuronal processes here. So let’s talk about a cylindrical membrane 1 micron long (remember that some neuronal processes — such as those going from your spinal cord to your big toe are a million times longer than this). Diameters of our nerve fibers range from 1 micron to 25 microns. Ignoring the complication of the myelin sheath, how much charge must be separated to produce a potential across the membrane of the neuronal process of 70 milliVolts?
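One way to attack both questions is to treat the membrane as a parallel-plate capacitor, Q = CV with C = ε0 εr A / d. This is a sketch, not the answer key. The relative permittivity of 2 is my assumption for a hydrocarbon layer, and the cylinder is handled by unrolling its side wall into a flat sheet, which is reasonable only because 70 Angstroms is tiny compared with a micron.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19   # elementary charge, C
REL_PERMITTIVITY = 2.0       # ASSUMPTION: rough value for a hydrocarbon layer
POTENTIAL_V = 70e-3          # 70 milliVolts
THICKNESS_M = 70e-10         # 70 Angstroms
MICRON = 1e-6

def charges_separated(area_m2):
    """Elementary charges on each face of a parallel-plate capacitor of
    this area holding POTENTIAL_V across THICKNESS_M."""
    capacitance = EPS0 * REL_PERMITTIVITY * area_m2 / THICKNESS_M
    return capacitance * POTENTIAL_V / E_CHARGE

def cylinder_charges(diameter_m, length_m=MICRON):
    # Thin-wall approximation: side-wall area is circumference times length.
    return charges_separated(math.pi * diameter_m * length_m)

flat_patch = charges_separated(MICRON * MICRON)
thin_fiber = cylinder_charges(1 * MICRON)
thick_fiber = cylinder_charges(25 * MICRON)

print(round(flat_patch), round(thin_fiber), round(thick_fiber))
```

Under these assumptions the flat square micron needs on the order of a thousand separated charges, and the answer scales linearly with εr, so a different permittivity just rescales everything.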

Fourth: The gals in the steno pool. Over and over they type out parts of the business plan when needed. The plan itself is immense. War and Peace (English translation) has over half a million words and 3,100,000 characters. The business plan comes in a weird language with only 4 characters. But group two of them together and you have 16 possibilities, group 3 and you have 64. The plan itself contains 3.2 billion of these weird characters, or just over 1000 copies of Tolstoy’s epic.

Strangely, for a long time soi-disant experts thought most of the plan was junk. It was strangely repetitive and because it didn’t code for the division’s buildings it was dismissed. Now we know that some parts of the plan not coding for buildings tell us where to put them, when to make them and how many to make. We now know that the girls are transcribing at least half of the plan, and perhaps most of it. The experts for a time thought that this was like the turnings from a lathe, intellectual chaff if you will, but now they’re not quite so smug.

Fifth: Manufacturing — row on row of factories turning out (prefab) buildings. So much so that from the air (which, 100 years ago was the only way to see our division) it was thought to be unique to our division class (it was called Nissl substance). There are a few factories in the far reaches of the division, but most factories are right here in the center with me. One of shipping’s big jobs is to get the buildings where they’re supposed to go. All this manufacturing and shipping consumes a lot of energy, so much so that even though our set of divisions constitutes just 3% or less of the organization we consume 20% of the energy of the entire enterprise.

Sixth: Communications — this is both a curse and a blessing. Our division receives about 1,000 incoming lines from other divisions and they never shut up. Not only that but they call as often as once every thousandth of a second. Some of this is handled right at the incoming line, but guess who has to absorb all this information and decide whether or not to send it on. The decision has been described by some as a computation, but it is far from straightforward.

The outgoing communications don’t use shipping (far too slow). Special buildings all over the periphery of the organization exist to send things out so that information can go down the 1 meter or so in around 1/100th of a second. If we were using the trucker analogy of going 90,000 miles instead of a meter, this would be 50 times the speed of light.
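A quick check of the speed comparison, as a sketch. The speed of light in miles per second is the one number not taken from the post.

```python
DISTANCE_M = 1.0                  # about 1 meter of nerve
TRAVEL_TIME_S = 1 / 100           # around 1/100th of a second

# Real conduction speed of the message.
conduction_speed_m_per_s = DISTANCE_M / TRAVEL_TIME_S

SCALED_DISTANCE_MILES = 90_000    # the trucker-analogy distance
LIGHT_MILES_PER_S = 186_282       # speed of light, miles per second

scaled_speed_miles_per_s = SCALED_DISTANCE_MILES / TRAVEL_TIME_S
times_light_speed = scaled_speed_miles_per_s / LIGHT_MILES_PER_S

print(round(conduction_speed_m_per_s), round(times_light_speed, 1))
```

The ratio lands a little under 50, so "50 times the speed of light" is a fair round number for the analogy.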

Sometimes we have to really step up the pace of our messages. Pity the poor divisions connected to the cervical spinal cord where commands to move the fingers are received. The boss has been practicing his piano like a banshee, and is now able to play 10 notes of a C scale in a second. That’s one message every 100 milliSeconds. He complains if they arrive unevenly.

Even as busy as the division is, I occasionally wonder about the organization as a whole (the job of the CEO is to think about the larger picture). I wonder how many divisions there are, and what or who organized us. Amazingly, no one knows just how many divisions of our type there actually are. Estimates years ago were in the millions. Now they’re in the billions. No one has ever actually counted us; estimates are all we have. Hell of a way to run an organization. Who decides which incoming lines hit our division? I’m not sure how the division figures out who to send our messages to. It doesn’t seem conscious.

I’m pretty sure that the business plan can’t specify this sort of stuff. With only 3.2 billion characters each of which is one of 4 possibilities, this isn’t enough to individually address each of the billions of divisions of the organization. How did our division ever find the division which controls the gastrocnemius and soleus anyway? Rumor has it that the entire organization with its billions of divisions arose from just one division like me. Sort of the big bang of business. Apparently this happens again and again. Very hard for this CEO to believe that it all arises by chance. I’ve been told that I lack sufficient faith that this is so.

Well anyway, our division has done a great job in the past year and we look forward to the next. I did hear that the boss is thinking of learning to play the organ. Heaven help us.

Even though I’m the CEO of a tiny department of a very large organization, it’s time to thank those unsung divisions that make it all possible. It’s been a very good year. Thanks in part to our work, the boss is a lot more adept at using the pedal when he plays the piano.

First: thanks to the guys in shipping and receiving. Kinesin moves the stuff out and Dynein brings it back home. Think of how far they have to go. The head office sits in area 4 of the cerebral cortex and K & D have to travel about 3 feet down to the motorneurons in the first sacral segment of the spinal cord controlling the gastrocnemius and soleus, so the boss can press the pedal on his piano when he wants. Like all good truckers, they travel on the highway. But instead of rolling they jump. The highway is pretty lumpy being made of 13 rows of tubulin dimers.

Now chemists are very detail oriented and think in terms of Angstroms (10^-10 meters) about the size of a hydrogen atom. As CEO and typical of cell biologists, I have to think in terms of the big picture, so I think in terms of nanoMeters (10^-9 meters). Each tubulin dimer is 80 nanoMeters long, and K & D essentially jump from one to the other in 80 nanoMeter steps. Now the boss is shrinking as he gets older, but my brothers working for players in the NBA have to go more than a meter to contract the gastrocnemius and soleus (among other muscles) to help their bosses jump. So split the distance and call the distance they have to go one Meter. How many jumps do Kinesin and Dynein have to make to get there? Just 10^9/80 — call it 10,000,000. The boys also have to jump from one microtubule to another, as the longest microtubule in our division is at most 100 microns (.1 milliMeter). So even in the best of cases they have to make at least 10,000 transfers between microtubules. It’s a miracle they get the job done at all.

To put this in perspective, consider a tractor trailer (not a truck — the part with the motor is the tractor, and the part pulled is the trailer — the distinction can be important, just like the difference between rifle and gun as anyone who’s been through basic training knows quite well). Say the trailer is 48 feet long, and let that be comparable to the 80 nanoMeters K and D have to jump. That’s 10,000,000 jumps of 48 feet or 90,909 miles. It’s amazing they get the job done.
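The trucking numbers can be checked in a few lines. This sketch takes the 80 nanoMeter step from above as given; with the roughly 8 nanoMeter step usually quoted for kinesin, the jump count would be ten times larger. Variable names are mine.

```python
STEP_NM = 80                     # step length as stated in the post
DISTANCE_M = 1.0                 # cortex to spinal cord, rounded to a meter
MAX_MICROTUBULE_M = 100e-6       # longest microtubule: 100 microns

jumps_exact = DISTANCE_M * 1e9 / STEP_NM           # 10^9 / 80
jumps_rounded = 10_000_000                         # "call it 10,000,000"
transfers = DISTANCE_M / MAX_MICROTUBULE_M         # fewest possible handoffs

TRAILER_FT = 48
FEET_PER_MILE = 5_280

# Scale each jump up to one trailer length.
scaled_miles = jumps_rounded * TRAILER_FT / FEET_PER_MILE

print(round(jumps_exact), round(transfers), round(scaled_miles))
```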

Second: Thanks to probably the smallest member of the team. The electron. Its brain has to be tiny, yet it has mastered quantum mechanics because it knows how to tunnel through a potential barrier. In order to produce the fuel for K and D it has to tunnel some 20 Angstroms from the di-copper center (CuA) to heme a in cytochrome C oxidase (COX). Is the electron conscious? Who knows? I don’t tell it what to do. Now COX is just a part of one of our larger divisions, the power plant (the mitochondrion).

Third: The power plant. Amazing to think that it was once (a billion years or more ago) a free living bacterium. Somehow back in the mists of time one of our predecessors captured it. The power plant produces gas (ATP) for the motors to work. It’s really rather remarkable when you think of it. Instead of carrying a tank of ATP, kinesin and dynein literally swim in the stuff, picking it up from the surroundings as they move down the microtubule. Amazingly the entire division doesn’t burn up, but just uses the ATP when and where needed. No spontaneous combustion.

There are some other unsung divisions to talk about (I haven’t forgotten you ladies in the steno pool, and your incredible accuracy — 1 mistake per 100,000,000 letters [ Science vol. 328 pp. 636 – 639 ’10 ]). But that’s for next time.

To think that our organization arose by chance, working by finding a slightly better solution to problems it faces, boggles this CEO’s mind (but that’s the current faith — so good to see such faith in an increasingly secular world).

It doesn’t take much energy to denature a protein. About .4 kiloJoules/amino acid, so that a protein of 100 amino acids loses its function (denatures) with an energy input of 40 kiloJoules/Mole or about the energy required to break two measly hydrogen bonds [ Voet and Voet Biochemistry Ed. 3 p. 258 ]. Covalent bonds are a lot stronger, with carbon carbon single bonds and C – H bonds ten times stronger. All you have to do to denature chymotrypsin is pull apart its catalytic triad of histidine at position #57, aspartic acid at #102 and serine at #195. Clearly to get these 3 amino acids together the protein backbone has to turn and twist in space. Separating them doesn’t take much energy.

Amazingly, denature many of them and they spontaneously reform the active structure. Certainly the first such protein studied this way (ribonuclease by Anfinsen) did just that, leading to the idea that the 3 dimensional structure of a protein was determined by the linear sequence of its amino acids along the backbone.

Over the decades crystal structure of protein after protein was solved by Xray crystallography, and everyone came to think of proteins as having ‘a’ structure. It was quickly found that there are parts in many proteins that won’t sit still even for crystallography, and it is now estimated [ Proc. Natl. Acad. Sci. vol. 103 pp. 12353 – 12358 ’06 ] that 30% of all proteins have stretches of over 30 amino acids that are intrinsically disordered.

Now sight your eye at the alpha carbon of one of the amino acids of a protein, looking toward the carbonyl carbon. There are 3 conformational energy minima the carbonyl can adopt. For a protein of 100 amino acids that’s potentially 3^99, roughly 10^47 to 10^48, conformations. This is clearly an overestimate because of self intersection, but still quite large. Yet to be crystallizable the protein must choose just one of them and it must be lower in energy by 2 hydrogen bonds than all the rest.
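Python handles integers of any size, so the conformation count can be computed exactly. A sketch, assuming (as above) 3 minima per residue-to-residue link and 99 links in a 100 amino acid chain:

```python
import math

MINIMA_PER_LINK = 3
RESIDUES = 100
LINKS = RESIDUES - 1             # 99 links between 100 amino acids

conformations = MINIMA_PER_LINK ** LINKS      # exact integer
order_of_magnitude = LINKS * math.log10(MINIMA_PER_LINK)

print(conformations)
print(len(str(conformations)), "digits; log10 =", round(order_of_magnitude, 2))
```

The exact value is a 48 digit number, a bit under 2 x 10^47, so quoting it as 10^48 rounds generously upward. The argument doesn’t hinge on that last factor.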

Now think like a chemist and think about the side chains of the amino acids. The hydrocarbon types (alanine, glycine, valine, leucine, isoleucine and perhaps methionine) can dissolve in each other. Hydrogen bonding is possible between the serine and threonine and any carbonyl on the side chain or any of the amines. Salt bridges are possible between the two acids and 3 of the bases. The list goes on and on. Yet somehow the 195+ amino acids of chymotrypsin spontaneously form this one shape. As a chemist I find this incredibly strange and unlikely. Among the 10^48 conformations of a 100 amino acid protein are there none within 40 kiloJoules of ‘the’ structure? If there are, are the energy barriers so high that it is never found?

We’ve seen this happen so often we’ve gotten used to it, but speaking as a former chemist, I find this behavior incredibly strange. I probably know enough math now to really delve into the physical chemistry of protein folding, but haven’t gotten around to it yet. But saying that proteins fall down a potential energy funnel seems (to me) like just a fancy way of saying they fold into one shape.

I mean you don’t even have to be a chemist to see what I’m talking about. Back in the day, girls used to wear charm bracelets, with little charms hanging off the chain. Some of the charms attract each other, others have the opposite effect. Make one with 100 charms of 20 different types, throw it into a pail of oil and agitate the pail so it doesn’t sink. Do you think just one shape would result?

I think our biochemical sense of wonder has been dulled by what we’ve found so far. For some thoughts on this see https://luysii.wordpress.com/2009/09/25/are-biochemists-looking-under-the-lampost/. Just this month [ Proc. Natl. Acad. Sci. vol. 107 pp. 17710 – 17715 ’10 ] a new player in bone formation was found. It’s oleic acid esterified to the hydroxyl group of serine. How many more things are there like this out there?

This has nothing to do with mutation, or the evolution of protein structure by natural selection. That’s for next time. But if proteins with one or a few structures are as rare as I think them to be, it’s going to be tough to get new proteins with this property from old ones by mutation. Once obtained, natural selection can go to work on them. The problem is getting to them in the first place.

There have been some great critical responses to some of the posts, which deserved a reply long ago. All the posts criticized involve either a chemical, molecular biological or numeric argument about the macromolecules making us up. Here they are in a semi-logical sequence. I’ll deal with the actual criticisms in the next post(s).

Two posts involve simple calculations about how many distinct proteins or polynucleotides life could have made given the mass of the earth to do so and 14 billion years. Here are the links. (1) https://luysii.wordpress.com/2009/12/20/how-many-proteins-can-be-made-using-the-entire-earth-mass-to-do-so/ (2) https://luysii.wordpress.com/2009/12/28/how-many-distinct-rna-polymers-can-be-made-using-the-mass-of-the-earth-to-do-so/.

No one has criticized the correctness of the calculations, which show that life on earth could have made only an infinitesimal fraction of the possible proteins of only 100 amino acids, or polynucleotides of 100 bases. If you disagree say so now. There has been severe criticism of the implication that evolution works by randomly trying out all such possibilities. I didn’t really say that, and will deal with this in the next post. I do think that all of us agree that mutations occur randomly (recombination hotspots excepted) so that the grist for the evolutionary mill is formed essentially willy nilly. If you disagree say so now.

What I was really getting at is that I find the proteins which make us up rather miraculous in that (1) they have one or just a few conformations which give them a fairly stable shape — they certainly do or we wouldn’t be here. For details see https://luysii.wordpress.com/2010/08/04/why-should-a-protein-have-just-one-shape-or-any-shape-for-that-matter/. (2) their side chains don’t react with each other. For details see https://luysii.wordpress.com/2010/05/13/protecting-groups/.

I think proteins with such magical properties are exceedingly uncommon. So how would you know how common such proteins actually are? While possible in theory, the experiment to investigate the structures of a random sequence of amino acids is impossible to carry out fully. For details see https://luysii.wordpress.com/2010/08/08/a-chemical-gedanken-experiment/. It still might give an answer if nearly every random sequence of say 60 – 100 amino acids had just one or a few structures.

I’m unimpressed with the argument that there are only 1000 or so protein folds, which significantly narrows the search space. There are huge numbers of proteins in the microorganisms living in the sea which far outnumber what we’ve already studied. Even if correct, how would random mutation find them? I’d love to see the results of the ‘glass eye’ experiment — for details see https://luysii.wordpress.com/2009/11/29/time-for-the-glass-eye-test-to-be-inserted-into-casp/

Finally, I must admit that these speculations provided a certain degree of comfort as I watched patients I was unable to help get worse and worse and finally die. For details see https://luysii.wordpress.com/2009/09/17/the-solace-of-molecular-biology/. If our existence is as miraculous as I think it to be, then what really needs to be explained is not suffering and disease, but health and the gift of life. At long last, a semi-answer to Camus’s “The Plague”, which affected me profoundly as an undergraduate years ago.

On 29 July, Derek Lowe had a short post about Craig Venter (http://pipeline.corante.com/archives/2010/07/29/craig_venter_venting.php), along with a short quote by Venter describing Francis Collins as a government administrator rather than a scientist, presumably because of Collins’ religious beliefs. It drew some 76 comments as of today. Most of the comments concerned whether religion and science were compatible or not.

@retread: Interesting how some people are so happy to trot out combinatorial complexity arguments to dismiss the possibility of proteins arising through evolution (especially naive, error-filled ones that ignore the fact that it is not random but directed, that many functional proteins consist of repeated sub-groups, that many proteins share functional domains, and so on, all assumptions which prune the combinatorial tree by dozens of orders of magnitude), and yet do not blink at invoking the existence of an omniscient, omnipotent being of infinitely greater complexity to create these complex proteins …

Something about swallowing camels while straining at gnats springs to mind.

#42 Daen: I am far from happy to trot out combinatorial arguments to dismiss the possibility of the present degree of protein complexity and structure arising by chance. I find many of the uses religion has been and is being put to absolutely horrible. I do not like where my arguments seem to lead. They need to be refuted (but I don’t see how).

You need far more than ‘dozens of orders of magnitude’ to trim down protein space so all aspects of it can be explored. The current champ is titin with 30,000 amino acids, 300 modules of three types (1) immunoglobulinlike, (2) type III fibronectin, and (3) unique PEVK insertions. Any particular order of linking them together is just one of 3^300 possibilities, a number larger than the number of baryons in the universe.
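The 3^300 comparison is easy to check. The sketch below assumes the commonly quoted order-of-magnitude estimate of 10^80 baryons in the observable universe; that figure is an assumption brought in for the comparison, not something from the comment thread.

```python
from math import log10

MODULE_TYPES = 3      # immunoglobulin-like, fibronectin type III, PEVK
MODULE_COUNT = 300    # modules in titin, per the text

# Number of ways to pick one of three module types at each of 300 positions.
orderings = MODULE_TYPES ** MODULE_COUNT

# Assumed order-of-magnitude estimate for baryons in the observable universe.
BARYONS_ESTIMATE = 10 ** 80

print(f"3^300 is about 10^{log10(orderings):.1f}")
print("larger than the baryon estimate:", orderings > BARYONS_ESTIMATE)
```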

Only 1.5% of the genome codes for amino acids, but nearly all of it is transcribed, so proteins are only a small part of the story. Molecular biologists are fixed on proteins (they know lots about them, and the technology to study them has been developing for decades). But there is far more to the story. For just how protein-centric molecular biologists are see the current post about Autism Spectrum Disorder.

****

Since then we’ve had an example of the good and evil to which religion can be put, an example so perfect that I could never have made it up — the slaughter of 10 medical workers in Afghanistan (in the name of religion of course).

Daen is right. I thought we had already made headway into addressing the combinatorial arguments against protein structure and function. Once we accept the co-operative nature of self-assembly, things begin to look much more reasonable. Even a computer program like Rosetta (which is considered state-of-the-art as far as predicting protein folding is concerned) can pare down the vast space of possible protein folding intermediates to a manageable few by using well-established motifs from known protein structures. If this can be done in a few hours by a computer program for a decent-sized protein, I don’t see why it would require an act of faith to believe that nature could implement such a strategy over billions of years.

#47 Wavefunction: Of course Rosetta can do this. It starts with proteins which already are known to fold into one shape, to find how another protein (which is known to have one shape) folds into it. Rosetta is basically starting with the answers in hand, and a question which is known to have an answer.

@retread: You’re missing the elephant in the room, which is so often overlooked by those who invoke a purely combinatorial approach to questions of how functional biological systems arise. The elephant is that all proteins do something useful, which is non-random. A naive combinatorial approach based on pure random chance does not take into account the equally sound physical principles of natural selection, which is anything but random. An organism alive today exists in a state of extreme adaptation, from its gross morphology down to its molecular biology. Working backwards, at every step of the way, its ancestors survived. Mutations conferring an adaptive survival advantage upon those ancestors can be traced backward, generation by generation. Other mutations, which may have been deleterious or which did not confer sufficient advantage, have been lost. Surely you know this; it is at the heart of the modern evolutionary synthesis. So to invoke a pure random chance argument and express surprise at the vast numbers it throws up is incongruous and, worse, plain wrong. Your argument is utterly specious.

@retread: For a simple concrete example of how good solutions to problems that are seemingly infinite can be generated “randomly” simply Google genetic algorithm solutions to the travelling salesman problem.

Briefly, if a salesman has to travel between multiple cities and you want to know the best (i.e. shortest) way to do it, once you consider a rather trivial number of cities you are considering, if done exhaustively, more possible paths than there are atoms in the universe. Yet, using “random” selective methods (such as GAs) you can have Excel generate good solutions in a matter of minutes.

Perhaps this implies that God is somehow, in a divine, intelligent way, extending his mighty hand into Microsoft’s product. A more likely explanation is that invoking combinatoric arguments, without truly understanding combinatorics, is not the way to refute the conclusions of thousands of man-years of consistent evidence.
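For readers who want to see what the commenter's travelling salesman example looks like in practice, here is a minimal genetic algorithm sketch in Python (rather than Excel). Everything in it is an illustrative assumption: the 30 random cities, the population size, the number of generations, and the particular crossover and mutation operators. Note that the selection step depends on an explicit scoring function (tour length), which is the point in dispute in the thread.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def ordered_crossover(parent_a, parent_b, rng):
    """Keep a slice of parent_a in place; fill the rest in parent_b's order."""
    i, j = sorted(rng.sample(range(len(parent_a)), 2))
    kept = parent_a[i:j]
    kept_set = set(kept)
    rest = [city for city in parent_b if city not in kept_set]
    return rest[:i] + kept + rest[i:]


def mutate(tour, rng):
    """Reverse a randomly chosen segment of the tour."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def evolve(cities, population_size=100, generations=300, seed=0):
    """Return (best length in the random starting population, best tour found)."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(population_size)]
    initial_best = min(tour_length(t, cities) for t in population)
    for _ in range(generations):
        # Selection: the shorter half of the population survives unchanged.
        population.sort(key=lambda t: tour_length(t, cities))
        survivors = population[: population_size // 2]
        # Variation: children are recombined and mutated copies of survivors.
        children = []
        while len(survivors) + len(children) < population_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(ordered_crossover(a, b, rng), rng))
        population = survivors + children
    best = min(population, key=lambda t: tour_length(t, cities))
    return initial_best, best


city_rng = random.Random(1)
cities = [(city_rng.random(), city_rng.random()) for _ in range(30)]
initial_best, best = evolve(cities)

print("distinct closed tours:", math.factorial(len(cities) - 1) // 2)
print("best random starting tour:", round(initial_best, 3))
print("best tour after evolution:", round(tour_length(best, cities), 3))
```

Because the best survivors are carried over unchanged, the best tour never gets worse from one generation to the next; the sketch makes no claim to find the optimal tour.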

@Retread: The point was that Rosetta uses a mix-and-match strategy which makes the conformational space required to be searched much smaller than what would result from random search alone. Nature proceeds in a similar way, non-randomly accumulating pre-existing fragments from known protein structures. It would indeed be miraculous if it were purely random. But it’s really not, and this argument is quite well-trodden.

Using another guy’s blog for the back and forth about this question didn’t seem quite kosher, so I’ve put up two of the posts I wrote for the Skeptical Chymist (they are the previous two on this blog) which explain my thinking behind my original comment. How life came into being is one of the most profound questions we can ask. Even though presumably scientific, there is no way it can be disentangled from its theological and philosophic implications. Aren’t we fortunate to live when we live, know the chemistry and physics that we know, and possess some of the data needed to address it on a nonintuitive basis?

So start your engines and comment away (either on the previous two posts or this one). I’ll eventually respond to all of them, but it may be a while, as on the 15th I leave for “Band Camp for Adults” for a week.

We have 3 RNA polymerases which transcribe DNA into RNA. Transcription starts at the 3′ end of one of the strands of the DNA helix and proceeds toward the 5′ end. However, the RNA produced starts at the 5′ end and proceeds toward the 3′ end. Why transcribe you might ask? Because the chemical language is the same: DNA and RNA are both polynucleotides. The Guanine in DNA codes for Cytosine in RNA, etc. etc.
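The base-for-base copying can be sketched as a simple lookup. The function name and the example template below are made up for illustration; the template is written 3′ to 5′, the direction in which the polymerase reads it, so the RNA comes out 5′ to 3′.

```python
# DNA template base -> RNA base laid down opposite it.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Read a DNA template strand 3'->5' and return the RNA written 5'->3'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5)


# Hypothetical nine-base template, written 3' -> 5'.
print(transcribe("TACGGCATT"))  # AUGCCGUAA
```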

RNA polymerase I (Pol I to you) transcribes the genes for the RNA found in the ribosome (ribosomal RNA also known as rRNA), RNA polymerase II (Pol II) transcribes the genes for proteins into messenger RNA (mRNA), while RNA polymerase III (Pol III) transcribes the genes for transfer RNA (tRNA) and a lot more. Med students love mnemonics, so here’s one — I makes rRNA, II makes mRNA, III makes tRNA — so the polymerases and the products are in (semi) alphabetical order.

The ribosome is an incredible molecular machine: it contains several RNAs (called rRNAs) containing in total about 4,500 nucleotides and about 50 proteins. The molecular mass is about 2,500,000 Daltons. Its job, and its only job as far as we know, is to translate the mRNA into protein. Why translate? Because polynucleotides and proteins are chemically quite different. So information is being translated from one language to another. Transfer RNAs (tRNAs) are involved. Each different tRNA brings just one specific amino acid to the ribosome, which then stitches the amino acid to the growing protein. Since 61 of the 64 possible codons (that’s 4^3) code for amino acids, we have an abundance of tRNA genes in our DNA, well over 400.
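The codon arithmetic can be checked against the standard genetic code. The sketch below builds the table from the usual compact 64-letter listing (bases in the order U, C, A, G, with `*` marking a stop codon) and counts how many codons each amino acid gets; the variable names are just for illustration.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' means stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in codon_table.items() if aa == "*")
codons_per_amino_acid = Counter(aa for aa in codon_table.values() if aa != "*")

print("codons in total:", len(codon_table))
print("stop codons:", stop_codons)
print("codons for amino acids:", sum(codons_per_amino_acid.values()))
print("amino acids:", len(codons_per_amino_acid))
print("most codons for one amino acid:", max(codons_per_amino_acid.values()))
print("fewest codons for one amino acid:", min(codons_per_amino_acid.values()))
```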

Now it’s time to speak of mRNA or, actually, pre-mRNA. The previous post noted that most genes come in pieces, parts coding for amino acids (called exons) and parts between the exons, called the introns. Pol II knows nothing of them, just as the CPU knows nothing of the series of bits it is fed in a program. It just starts transcribing DNA at a certain point, making mRNA willy nilly, intron and exon, and finally quitting.

As mentioned in the previous post, dystrophin has over 2 million nucleotides in its DNA, all of which are transcribed into RNA. The part of the RNA actually coding for amino acids is under 15,000 nucleotides long (less than 1% of the transcript), so all the introns must be spliced out. This is the function of the spliceosome, another huge molecular machine. It contains 5 RNAs (called small nuclear RNAs, aka snRNAs), along with 50 or so proteins with a total molecular mass again of around 2,500,000 Daltons. Splicing out introns is a tricky process which is still being worked on. Mistakes are easy to make, and different tissues will splice the same pre-mRNA in different ways. All this happens in the nucleus before the mRNA is shipped outside where the ribosome can get at it.
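Stripped of all the chemistry, splicing amounts to cutting the exons out of a long string and joining them. The sketch below uses a made-up toy transcript with three exons and two introns (the introns begin with GU and end with AG, as most real ones do); the sequences and the `splice` helper are hypothetical. Passing a different list of exons mimics the way different tissues can splice the same pre-mRNA differently.

```python
def splice(pre_mrna, exons):
    """Join the exon intervals (start, end), end exclusive, in the order given."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: Pol II produces one flat string, introns and all.
segments = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUUUUCAG"),
    ("exon", "CCCGGG"),
    ("intron", "GUGAGUCCUUAG"),
    ("exon", "GAAUAA"),
]
pre_mrna = "".join(sequence for _, sequence in segments)

# Work out where each exon sits in the flat transcript.
exons = []
position = 0
for kind, sequence in segments:
    if kind == "exon":
        exons.append((position, position + len(sequence)))
    position += len(sequence)

print(splice(pre_mrna, exons))                  # all three exons
print(splice(pre_mrna, [exons[0], exons[2]]))   # middle exon skipped
```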

There are some incredible fail-safe mechanisms here. The spliceosome leaves a few proteins at each spliced together exon/exon junction. If, when the mRNA is read (translated) by the ribosome, a termination codon occurs too early in the gene (which would truncate the protein prematurely), a process called nonsense mediated decay destroys the defective mRNA.

The mature mRNA just before it is ready to leave the nucleus has several parts. From the 5′ end it has a bunch of nucleotides prior to the first codon for the protein (always an AUG which codes for methionine). This is called the 5′ UnTranslated Region (5′ UTR). The U in AUG, by the way, stands for Uridine, which is the nucleotide in RNA corresponding to thymine in DNA. Then there is the protein coding part, then there is the 3′ part which is not translated into protein (called the 3′ UnTranslated Region, 3′ UTR). When Pol II is finished transcribing the gene, a long stretch of adenines (polyAdenine aka polyA) is added somewhere in the 3′ UTR. It is added about 30 nucleotides downstream of (3′ to) an AAUAAA sequence found in the 3′ UTRs of most protein coding genes. There are some 20 – 260 adenines in a row in the polyA tract. Addition is important, as polyA protects the mRNA from degradation; very few things in the cell hang around forever. Each time the ribosome translates the mRNA into protein some adenines are lost, so for those of you familiar with computer programming, you can regard the polyA as a loop counter.
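The layout just described can be sketched as a small parser. It is a deliberate simplification: it assumes translation starts at the first AUG in the message and runs in frame to the first stop codon, and the example sequence and the `parse_mrna` helper are invented for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
POLYA_SIGNAL = "AAUAAA"


def parse_mrna(mrna):
    """Split an mRNA (written 5'->3') into 5' UTR, coding region and 3' UTR."""
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    end = None
    # Walk codon by codon from the AUG until the first in-frame stop codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            break
    if end is None:
        raise ValueError("no in-frame stop codon found")
    three_utr = mrna[end:]
    return {
        "five_utr": mrna[:start],
        "coding": mrna[start:end],
        "three_utr": three_utr,
        # Offset of AAUAAA within the 3' UTR, or -1 if it is absent.
        "polya_signal_offset": three_utr.find(POLYA_SIGNAL),
    }


# Hypothetical message: 5' UTR + coding region + 3' UTR containing AAUAAA.
example = "GGCACC" + "AUGGCCGAAUAA" + "CUUGAAUAAAGCUUGC"
parts = parse_mrna(example)
for name, value in parts.items():
    print(name, value)
```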

The 3′ UTR also contains sites where yet another type of RNA (called microRNA) binds. Genes for microRNA are also transcribed by Pol II. Their precursor (pre-microRNA) is then extensively processed (I’ll spare you the gory details) to form mature microRNAs, which, as the name implies, are rather short — only 20 – 22 nucleotides. MicroRNAs represent one of the many forms of control on the amount of a given protein that a cell contains. They basepair with complementary sequences in the 3′ UTR of mRNAs and either (1) inhibit protein synthesis of the mRNA by the ribosome or (2) cause degradation of the mRNA. It’s important to note that a given microRNA can control the levels of many different proteins, if the complementary region is present in their 3′ UTRs. Also the 3′ UTR of a given mRNA can have regions complementary to many different microRNAs.
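The many-to-many relationship between microRNAs and 3′ UTRs can be sketched as a string search. This is a toy model under stated assumptions: it demands perfect complementarity over the whole (unrealistically short, 8 nucleotide) microRNA, and the sequences, gene names and helper functions are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence that base-pairs with rna, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def target_sites(micro_rna, three_utr):
    """Positions in the 3' UTR where the microRNA could pair perfectly."""
    site = reverse_complement(micro_rna)
    hits = []
    i = three_utr.find(site)
    while i != -1:
        hits.append(i)
        i = three_utr.find(site, i + 1)
    return hits


# Toy data: one microRNA checked against the 3' UTRs of three made-up genes.
micro_rna = "UGAGGUAG"
three_utrs = {
    "gene_a": "AAACUACCUCAGG",
    "gene_b": "CUACCUCAUUUCUACCUCA",
    "gene_c": "GGGGCCCC",
}
sites = {gene: target_sites(micro_rna, utr) for gene, utr in three_utrs.items()}
print(sites)
```

Here the same microRNA finds sites in two of the three UTRs, and twice in one of them, which is the sense in which one microRNA can influence many proteins.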

That’s quite a bit to throw at you. I’ve omitted a lot of the complexity, to make the goings on as simple and clear as possible. Hopefully, I haven’t violated Einstein’s dictum “Everything should be made as simple as possible, but not simpler”. I think what I’ve said is quite accurate, but comments and corrections are always welcome.

The more I know about the goings on inside our cells, the more impressed I become, and the greater the leap of faith I must make to accept that this all arose by chance.