2
o Biological codes are information channels or maps with a natural ‘fitness’ measure.
o Codes are evolved and selected according to their fitness or ‘smoothness’.
o The emergence of a code is a phase transition in an information channel.
o The topology of errors (noise) governs the emergent code.

8
Biological codes evolve(d) to cope with inherent noise
Messages are written in molecular words that are read and interpreted by other molecules, which compute the response, etc.
Typical energy scale: a few $k_B T$. Thermal noise → errors.
Information channels adapt to errors through evolutionary selection-mutation dynamics.
Some errors (mutations) are essential to evolution …
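A rough illustration of why an energy scale of a few $k_B T$ makes errors unavoidable. It assumes simple equilibrium discrimination: a wrong partner whose binding is weaker by $\Delta E$ is picked with a Boltzmann weight of about $e^{-\Delta E/k_B T}$ relative to the correct one. The function name `boltzmann_error_ratio` is hypothetical, and this sketch is not the model used in the talk.

```python
import math

def boltzmann_error_ratio(delta_e_kt: float) -> float:
    """Relative weight of a wrong vs. correct molecular match (hypothetical helper).

    Assumes equilibrium discrimination: the wrong partner is penalised by a free
    energy gap delta_e_kt (in units of k_B T), so its Boltzmann weight relative to
    the correct partner is exp(-delta_e_kt).
    """
    return math.exp(-delta_e_kt)

for gap in (1.0, 3.0, 5.0):
    print(f"gap = {gap:.0f} k_B T -> error ratio ~ {boltzmann_error_ratio(gap):.3f}")
```

With a gap of 3 $k_B T$ the ratio is about 0.05, roughly one wrong match per twenty correct ones. The estimate ignores kinetic proofreading and other error-correcting mechanisms.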

10
A fitter code is one with less distortion
The ‘error-load’ H measures the difference between the desired and the reproduced amino-acids. H is a natural measure of the fitness of the code. In better codes, the encoding U and the decoding V are optimized with respect to the reading W. The decoded amino-acids must be diverse enough to map diverse chemical properties. However, to minimize the impact of errors it is preferable to decode fewer amino-acids.
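The slide does not spell out the formula for H, so the sketch below uses one plausible form under assumed conventions: the channel from required to reproduced amino-acid is $P = V W U$, and $H = \sum_{\alpha\beta} r_\alpha\, P_{\beta\alpha}\, c_{\alpha\beta}$, with $r_\alpha$ the required frequency of amino-acid $\alpha$ and $c_{\alpha\beta}$ the cost of producing $\beta$ instead of $\alpha$. All names, the 4-codon ring and the two toy codes are hypothetical. The sketch compares a "smooth" code, where neighbouring codons share an amino-acid, with a "scrambled" one.

```python
import numpy as np

def error_load(U, W, V, r, c):
    """Error-load H of a noisy code (assumed conventions, not from the slides).

    U[i, a]  probability that amino-acid a is encoded by codon i (columns sum to 1)
    W[j, i]  probability that codon i is read as codon j (columns sum to 1)
    V[b, j]  probability that read codon j is decoded as amino-acid b
    r[a]     required frequency of amino-acid a
    c[a, b]  cost of producing b when a was required (c[a, a] = 0)
    """
    P = V @ W @ U  # P[b, a]: probability that a is reproduced as b
    # Element [b, a] of the product is r[a] * P[b, a] * c[a, b]
    return float(np.sum(r[None, :] * P * c.T))

def deterministic_code(assignment, n_acids):
    """Build encoder U and decoder V from a codon -> amino-acid assignment list."""
    V = np.zeros((n_acids, len(assignment)))
    for j, a in enumerate(assignment):
        V[a, j] = 1.0
    U = V.T / V.sum(axis=1)  # each amino-acid uses its codons uniformly
    return U, V

# Toy channel: 4 codons on a ring, each misread as either neighbour with prob eps/2.
n_codons, eps = 4, 0.1
W = (1 - eps) * np.eye(n_codons)
for i in range(n_codons):
    W[(i + 1) % n_codons, i] += eps / 2
    W[(i - 1) % n_codons, i] += eps / 2

r = np.array([0.5, 0.5])                # two amino-acids, equally required
c = np.array([[0.0, 1.0], [1.0, 0.0]])  # unit cost for any substitution

U_s, V_s = deterministic_code([0, 0, 1, 1], 2)  # smooth: neighbours mostly agree
U_x, V_x = deterministic_code([0, 1, 0, 1], 2)  # scrambled: neighbours always differ
H_smooth = error_load(U_s, W, V_s, r, c)
H_scrambled = error_load(U_x, W, V_x, r, c)
print(f"H(smooth) = {H_smooth:.3f}, H(scrambled) = {H_scrambled:.3f}")
```

In this toy setting the smooth code has H = 0.05 and the scrambled code H = 0.1. The error-load halves when misreadings land on codons that decode to the same amino-acid.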

11
Theories on the origin of the code: frozen accident or optimization?
Frozen accident hypothesis: any change in the code affects all the proteins in the cell and would therefore be too harmful. Life began with very few amino-acids; new amino-acids were added until eventually the code became frozen in its present form. [Crick 1968]
Load minimization hypothesis: Darwinian dynamics optimize the code to minimize errors in information flow (due to mutations and misreading). [Sonneborn, Zuckerkandl & Pauling… 1965]

12
Variant codes: evidence for ongoing optimization of the code
Variants of the “universal” genetic code exist in many organisms [Osawa, Jukes 1992].
All variants use the same twenty amino-acids (a universal invariant?).
Continuity: most changes are to a neighboring amino-acid (‘hydrodynamic’ flow?).

14
Codes compete by their error-load
A one-letter change in DNA can change one amino acid in one protein. If the new amino acid is similar to the original, the upset is minimal. The organism with the smallest error-load takes over the population.
- relatively small population
- high noise levels in protein synthesis
- weak selection forces ≪ random drift
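The following is a minimal sketch of the "smallest error-load takes over" statement. It uses a hypothetical deterministic selection model: code k has fitness $e^{-s H_k}$, and each generation the population fractions are reweighted by fitness. The function name and the parameters `s` and `generations` are illustrative assumptions. The sketch deliberately leaves out the random drift listed on the slide, so it shows only the selection side of the dynamics.

```python
import numpy as np

def select_codes(H, s=5.0, generations=200):
    """Deterministic selection among competing codes (no random drift).

    Hypothetical model: code k has fitness exp(-s * H[k]); every generation the
    population fractions are multiplied by fitness and renormalised.
    """
    H = np.asarray(H, dtype=float)
    x = np.full(H.size, 1.0 / H.size)  # start with equal fractions
    fitness = np.exp(-s * H)
    for _ in range(generations):
        x = x * fitness
        x /= x.sum()
    return x

fractions = select_codes([0.10, 0.05, 0.20])
print(np.round(fractions, 3))
```

After enough generations the fraction of the code with the lowest error-load (the second entry) approaches 1. Adding drift, for example by resampling a finite population each generation, would make the outcome stochastic.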

19
Why twenty amino-acids?
The code is the mode $u_{\alpha i}$ that minimizes the free energy. This mode corresponds to the maximal eigenvalue of $w$. Knowledge of $w$ at the phase transition yields the code. What can we say without such knowledge? (Why 20?)
More amino-acids → more sensitivity to errors. Fewer amino-acids → reduced functionality of proteins.
Historical mechanisms: freezing, biosynthetic, etc.
Is twenty a topological feature of a generic evolutionary phase transition?

24
The genetic code has a spectrum
$u_{\alpha i}$ is the average preference of codon $i$ to encode amino-acid $\alpha$.
Every mode corresponds to an amino-acid → number of modes = number of amino-acids.
The misreading $w$ is actually a graph Laplacian: $w = -(\Delta - \Delta_{\text{random}})$, where $\Delta_{ij} = -W_{ij}$ for $i \neq j$ and $\Delta_{ii} = \sum_{j \neq i} W_{ij}$.
$\Delta$ measures the difference between codons and their neighbors, a natural measure of error load.
The maximal mode of $w$ is the 2nd eigenmode of $\Delta$.
Courant's theorem: each $u_{\alpha i}$ has a single maximum → a single contiguous domain for each amino-acid.
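To make $\Delta$ concrete, this sketch builds it for the 64 codons under a simplifying assumption the slide does not make: every single-letter misreading is equally likely. Under that assumption, $W_{ij} = 1$ when codons $i$ and $j$ differ at exactly one position, and $W_{ij} = 0$ otherwise. The variable names are illustrative. The sketch computes the spectrum of $\Delta$ with `numpy.linalg.eigh`.

```python
from collections import Counter
import itertools
import numpy as np

BASES = "UCAG"
codons = ["".join(t) for t in itertools.product(BASES, repeat=3)]

def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))

# Assumption: uniform misreading, W_ij = 1 iff codons i and j differ at exactly one position.
W = np.array([[1.0 if hamming(ci, cj) == 1 else 0.0 for cj in codons] for ci in codons])

# Graph Laplacian as on the slide: Delta_ij = -W_ij (i != j), Delta_ii = sum_{j != i} W_ij.
Delta = np.diag(W.sum(axis=1)) - W

# Delta is symmetric, so eigh returns real eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Delta)
spectrum = Counter(int(round(lam)) for lam in eigvals)
print(sorted(spectrum.items()))
```

With uniform weights, the spectrum is that of the Hamming graph on 3-letter words over 4 symbols. The eigenvalues are 0, 4, 8 and 12, with multiplicities 1, 9, 27 and 27. The lowest nonzero eigenvalue is nine-fold degenerate, so this uniform $W$ alone does not single out one 2nd eigenmode. A non-uniform misreading matrix would be needed to break the degeneracy.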

26
The coloring number of the code graph is an upper limit on the number of amino-acids
What is the minimal number of colors required in a map so that no two adjacent regions share a color?
The coloring number of a surface is a topological invariant, a function of its genus alone.
Heawood's conjecture: for genus $g \ge 1$ the coloring number is $\lfloor (7 + \sqrt{1 + 48g})/2 \rfloor$ [Ringel & Youngs, Appel & Haken]
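The Heawood formula is easy to evaluate. The helper below is a hypothetical name that computes it exactly with integer arithmetic. For $g = 0$ the same expression gives 4, in agreement with the four-colour theorem (Appel & Haken). The sketch makes no claim about the genus of the actual codon graph.

```python
import math

def heawood_number(genus: int) -> int:
    """Heawood coloring number floor((7 + sqrt(1 + 48 g)) / 2) for an orientable surface of genus g."""
    if genus < 0:
        raise ValueError("genus must be non-negative")
    # floor((7 + sqrt(n)) / 2) == (7 + isqrt(n)) // 2, which avoids floating-point rounding
    return (7 + math.isqrt(1 + 48 * genus)) // 2

for g in (0, 1, 2, 10, 24):
    print(g, heawood_number(g))
```

For example, the torus ($g = 1$) needs up to 7 colors. A quick scan shows that genera 23, 24 and 25 are exactly the ones whose coloring number is 20.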

28
Summary
The 64-codon triplet code is patterned and degenerate: it maps only 20 amino acids.
The governing evolutionary dynamics is an interplay between protein diversity and error penalty, described by a stochastic diffusion equation.
The first excited state of this diffusive mapping dynamics on the high-genus surface of the code yields an ordered pattern of 20 amino acids (20 = the coloring number of the graph).
Topology + dynamics → coloring (?)

29
Transcription network is a code that relates DNA sites and binding proteins
Reading DNA to synthesize proteins is controlled by a system of protein-DNA interactions (the transcription network). The presence or absence of a transcription factor may repress or enhance synthesis of protein from a nearby gene. The transcription network is thus a code that relates proteins to their DNA targets. Like the genetic code, transcription is subject to evolutionary forces and adapts to minimize errors.
[Figure: Pol, TF, DNA]

32
????
Why does the code exhaust the coloring limit?
Other population dynamics models (‘quasi-species’)?
Glassy, ‘almost-frozen’ dynamics?
The necessity of the wobble (64/48)? 25 amino-acids?
A generic phase transition scenario that does not depend finely on missing details of the evolutionary pathway: although little is known about the primordial environment, minimal assumptions about the topology of probable errors can yield characteristics of biological codes. In particular, the number of twenty amino-acids in the present picture is reminiscent of a ‘shell magic number’.