Back around 2002, I noted that the genetic code appears to funnel one of the most common base pair substitutions: the C-to-T transition caused by deamination of cytosine. Put simply, codons containing a C specify a wide range of amino acids, but when that C is converted to T, the resulting set of codons converges on the most hydrophobic amino acids. The original analysis is found here.
To see this for yourself, consider the figure below, which represents a hydrophobicity scale for the 20 amino acids based on 47 published attempts to quantify hydrophobicity:
Now consider the effect of cytosine deamination using this scale:
The scale on the left shows the amino acids coded for by C-containing codons; deamination of those cytosines converts it to the scale on the right.
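The basic effect can be reproduced with a short script. The sketch below, assuming the standard codon table and using the Kyte-Doolittle hydropathy scale as a stand-in for the 47-scale consensus above, compares the average hydropathy of the amino acids coded by C-containing codons before and after an in-silico C→T substitution:

```python
# Standard genetic code, bases ordered T, C, A, G ("*" marks stop codons).
BASES = "TCAG"
AA_STRING = ("FFLLSSSSYY**CC*W"   # TTT..TGG
             "LLLLPPPPHHQQRRRR"   # CTT..CGG
             "IIIMTTTTNNKKSSRR"   # ATT..AGG
             "VVVVAAAADDEEGGGG")  # GTT..GGG
CODE = {b1 + b2 + b3: AA_STRING[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}

# Kyte-Doolittle hydropathy values (a stand-in for the consensus scale).
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9,
      "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3,
      "P": -1.6, "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5,
      "K": -3.9, "R": -4.5}

before, after = [], []
for codon, aa in CODE.items():
    if "C" not in codon or aa == "*":
        continue
    mutated_aa = CODE[codon.replace("C", "T")]  # deaminate every C in the codon
    if mutated_aa == "*":
        continue  # mutation created a stop codon, not an amino acid substitution
    before.append(KD[aa])
    after.append(KD[mutated_aa])

before_mean = sum(before) / len(before)
after_mean = sum(after) / len(after)
print(f"mean hydropathy before: {before_mean:+.2f}")
print(f"mean hydropathy after:  {after_mean:+.2f}")
```

On the standard code the mean shifts sharply upward (toward the hydrophobic end of the scale) after the substitutions, which is the Increasing Hydrophobicity Effect in miniature.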
But what if we do the same analysis, this time restricting our focus to the cytosines that are followed by guanines – the CpG sequences discussed here – given that such sequences are the most prone to the effects of deamination?
I’m not sure how I missed this one. Recall that only one of a million randomly generated codes was more error-proof than the genetic code used by life. Well, it turns out the frequency of amino acid usage is much the same across all three domains of life. And when you factor in this frequency of amino acid use, the genetic code is actually much better than “one in a million”:
We found that taking the amino-acid frequency into account decreases the fraction of random codes that beat the natural code. This effect is particularly pronounced when more refined measures of the amino-acid substitution cost are used than hydrophobicity. To show this, we devised a new cost function by evaluating in silico the change in folding free energy caused by all possible point mutations in a set of protein structures. With this function, which measures protein stability while being unrelated to the code’s structure, we estimated that around two random codes in a billion (10^9) are fitter than the natural code. When alternative codes are restricted to those that interchange biosynthetically related amino acids, the genetic code appears even more optimal.
[Gilis D, Massar S, Cerf NJ, Rooman M. 2001. Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biol. 2(11):RESEARCH0049]
I have argued that the genetic code may have been designed to exploit the mutagenic bias that exists as a consequence of cytosine deamination. In my original analysis, I noted that the genetic code uses cytosine deamination to channel mutations such that they sample from a pool of amino acids that is almost exclusively hydrophobic (IHE, or Increasing Hydrophobicity Effect). Furthermore, this pool is biased toward facilitating secondary structure formation. If coupled with carefully chosen initial proteomes, the potential exists that the first major evolutionary steps subsequent to the originally designed state were rigged such that the mutational bias could unlock secondary designs that were front-loaded into the original state. For example, this might mean that something like the evolution of multicellularity (and perhaps more) was designed through this biased mutagenic effect.
The first step in testing this hypothesis is to determine whether the IHE plays out in evolution. Since the genetic code is universal, we might expect to see residual traces of this effect even if the sole intention of the design was to guide the first major evolutionary steps. Two extensive analyses have indeed uncovered the IHE in action (although neither paper makes the connection I did in my original analysis).
If you have been paying attention, you might have noticed a striking contrast in several of my previous postings about evolution and the genetic code. Let me see if I can make it clearer.
The genetic code employs three stop codons – UGA, UAA, and UAG. We have already seen that these codons are perfectly immune to the effects of cytosine deamination. In other words, the code buffers against mutations that will mistakenly produce elongated proteins by turning a stop codon into a sense codon (a codon that codes for an amino acid).
But another question arises – why are there three stop codons? Since one stop codon would be sufficient for the purposes of signaling termination during protein synthesis, why the extra two? What’s more, by having three stop codons instead of one, we increase the chance of having a nonsense mutation, where a sense codon is mutated into a stop codon. Nonsense mutations would thus produce truncated proteins. Such nonsense mutations are a problem for the cell, as evidenced by the need for an RNA surveillance system known as nonsense-mediated decay. So again, why not just use one stop codon?
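The cost side of this trade-off is easy to quantify. A minimal sketch, assuming the standard codon table: count how many sense codons sit a single point mutation away from a stop codon, under the real code’s three stops versus a hypothetical code that uses only UAA:

```python
# Standard genetic code, bases ordered T, C, A, G ("*" marks stop codons).
BASES = "TCAG"
AA_STRING = ("FFLLSSSSYY**CC*W"
             "LLLLPPPPHHQQRRRR"
             "IIIMTTTTNNKKSSRR"
             "VVVVAAAADDEEGGGG")
CODE = {b1 + b2 + b3: AA_STRING[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]

def neighbors(codon):
    """All nine single-point mutants of a codon."""
    return {codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]}

def nonsense_prone(stops):
    """Sense codons that can become a stop via one point mutation."""
    return {c for c in SENSE if neighbors(c) & stops}

print(len(nonsense_prone({"TAA", "TAG", "TGA"})))  # three stops (DNA form)
print(len(nonsense_prone({"TAA"})))                # hypothetical single stop
```

With three stop codons, 18 sense codons are one mutation away from nonsense; with a single stop codon, only 7 are. The extra stops really do enlarge the nonsense-mutation target, which sharpens the question of why they are there.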
We’ve seen that the genetic code channels cytosine deamination (one of the most common mutations) such that a rather random pool of amino acids is converted into a cluster of hydrophobic residues, while at the same time the code is quite exceptional at buffering against deleterious mutations. But let’s consider the three termination codons that function as stop signals during the synthesis of proteins.
The three stop codons are UGA, UAA, and UAG.
The first thing to note about all three stop codons is that none of them contain cytosine (C). In other words, these three are perfectly immune to cytosine deamination.
But remember that DNA is double-stranded. Is the complementary sequence on the other strand of DNA likewise immune to cytosine deamination?
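One can check this directly. Deamination of a C on the complementary strand shows up as a G→A change on the coding strand, so the question reduces to: what do the stop codons become under single G→A substitutions? A minimal sketch:

```python
STOPS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA

def g_to_a_mutants(codon):
    """Single G-to-A substitutions: the coding-strand signature of
    cytosine deamination on the complementary strand."""
    return [codon[:i] + "A" + codon[i + 1:]
            for i, base in enumerate(codon) if base == "G"]

for stop in sorted(STOPS):
    for mutant in g_to_a_mutants(stop):
        print(f"{stop} -> {mutant}  (still a stop: {mutant in STOPS})")
```

Running this shows that TAA contains no G at all, while TAG and TGA can only become TAA: every such mutation turns a stop codon into another stop codon, so the stop signals are buffered on both strands.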
The genetic code is universal. While there are some variants of the code, these variants, which tweak at the periphery, arose after the universal code was established. The universality of the code means that all of evolution has been under the constraint and influence of the genetic code.
The genetic code was originally thought to be a frozen accident. Murray Gell-Mann explains the concept as follows:
Now, most single accidents make very little difference to the future, but others may have widespread ramifications, many diverse consequences all traceable to one chance event that could have turned out differently. Those we call frozen accidents. I give as an example the right-handed character of some of the molecules that play important roles in all life on Earth though the corresponding left-handed ones do not. People tried for a long time to explain this phenomenon by invoking the left-handedness of the weak interaction for matter as opposed to antimatter, but they concluded that such an explanation wouldn’t work. Let’s suppose that this conclusion is correct and that the right-handedness of the biological molecules is purely an accident. Then the ancestral organism from which all life on this planet is descended happened to have right-handed molecules, and life could perfectly well have come out the other way, with left-handed molecules playing the important roles.
Yet this original explanation has been effectively falsified as scientists analyzed the code in more depth.
It has been argued that no engineer would have used cytosine as part of the genetic material because of its predisposition for deamination. But it’s exactly this predisposition that might cause an engineer of evolution to include it.
Life itself appears to have been designed to minimize errors. The universal nature of the proof-reading/repair machinery, the optimized genetic code, and the G/C:A/T parity code all converge on this point. Yet despite this design logic, there is the interesting fact that cytosine is especially prone to deamination, where the removal of its exocyclic amino group converts it into uracil (a base normally found in RNA). Uracil does not belong in DNA, thus it can be effectively detected and removed by repair enzymes. However, if not detected and repaired, it can base pair with adenine, meaning that it would specify adenine during DNA replication. In a subsequent round of replication, the adenine in turn would specify thymine. The bottom line is that spontaneous deamination of cytosine can lead to a base substitution known as a transition, where C is replaced by T (and G is replaced by A on the other strand of DNA). We might expect such mutations to be quite common, as the rate constant for cytosine deamination at 37 °C in single-stranded DNA translates into a half-life for any specific cytosine of about 200 years. In fact, such high rates of deamination led researchers Poole et al. to complain of “confounded cytosine!”
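To put that half-life in perspective, here is a back-of-the-envelope calculation assuming simple first-order decay kinetics. The 200-year half-life and 37 °C condition come from the text above; the cytosine count is an illustrative round number, not a measured genome value:

```python
import math

HALF_LIFE_YEARS = 200.0              # per-cytosine half-life, ssDNA at 37 °C
k = math.log(2) / HALF_LIFE_YEARS    # first-order rate constant, per year
p_year = 1.0 - math.exp(-k)          # chance a given cytosine deaminates in a year

N_CYTOSINES = 1_000_000              # illustrative count of exposed cytosines
expected_events = N_CYTOSINES * p_year

print(f"rate constant k = {k:.5f} per year")
print(f"per-cytosine yearly probability = {p_year:.5f}")
print(f"expected deaminations per year in {N_CYTOSINES:,} cytosines = {expected_events:.0f}")
```

Even though any single cytosine is stable for centuries, a large pool of exposed cytosines would be expected to throw off thousands of deamination events per year, which is why the repair machinery matters.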
We would thus seem to have two contradictory lines of evidence. On one hand, there is the growing list of evidence to support the hypothesis that error correction was an important principle guiding the design of life. Yet the incorporation of cytosine works against such efforts, given its predisposition to spark a mutation. In fact, Poole et al. go so far as to argue, “Any engineer would have replaced cytosine, but evolution is a tinkerer not an engineer.” From a design perspective, how might these contrary dynamics be reconciled? That is, given the emphasis on error correction, why would an engineer include cytosine?
Seth Shostak is the Senior Astronomer at the SETI Institute. We have seen that his description of SETI contains several points that converge with the approach that I take in The Design Matrix. Let me now make this even more clear with a posting by Steven Novella, who is a neurologist at Yale University School of Medicine.