I have argued that the genetic code may have been designed to exploit the mutagenic bias that exists as a consequence of cytosine deamination. In my original analysis, I noted that the genetic code uses cytosine deamination to channel mutations such that they sample from a pool of amino acids that is almost exclusively hydrophobic (the IHE, or Increasing Hydrophobicity Effect). Furthermore, this pool is biased toward facilitating secondary structure formation. If coupled with carefully chosen initial proteomes, the potential exists that the first major evolutionary steps subsequent to the originally designed state were rigged such that the mutational bias could tap secondary designs that were front-loaded into the original state. For example, this might mean that something like the evolution of multicellularity (and perhaps more) was designed through this biased mutagenic effect.
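To make the mechanism concrete, here is a minimal Python sketch that enumerates every single C-to-T change in the standard genetic code and tallies the amino acid substitutions that result. It rests on assumptions made only for illustration: it counts coding-strand C-to-T changes only (deamination on the opposite strand, which reads as G-to-A, is ignored), it weights every codon equally, and the HYDROPHOBIC set is one common but debatable grouping. All names in the code are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: a conservative hydrophobic grouping, chosen for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def c_to_t_substitutions():
    """Yield (codon, mutant, position, aa_before, aa_after) for each single C->T change."""
    for codon, aa in CODON_TABLE.items():
        for i, base in enumerate(codon):
            if base == "C":
                mutant = codon[:i] + "T" + codon[i + 1:]
                yield codon, mutant, i + 1, aa, CODON_TABLE[mutant]


# Keep only missense changes: the amino acid changes and no stop codon is created.
missense = Counter()
for _codon, _mutant, _pos, before, after in c_to_t_substitutions():
    if before != after and after != "*":
        missense[(before, after)] += 1

total = sum(missense.values())
to_hydrophobic = sum(n for (_, after), n in missense.items() if after in HYDROPHOBIC)

for (before, after), n in sorted(missense.items()):
    print(f"{before} -> {after}: {n} codon change(s)")
print(f"fraction of missense changes ending in a hydrophobic residue: {to_hydrophobic / total:.2f}")
```

Changing the HYDROPHOBIC set (for example, adding tyrosine or proline) shifts the reported fraction, so the number should be read as illustrative rather than as a measurement.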
The first step in testing this hypothesis is to determine whether the IHE plays out in evolution. Since the genetic code is universal, we might expect to see residual traces of this effect even if the sole intention of the design was to guide the first major evolutionary steps. Two extensive analyses have indeed uncovered the IHE in action (although neither paper makes the connection I did in my original analysis).
I’ll begin by discussing the first one: Gregory A. C. Singer and Donal A. Hickey. 2000. Nucleotide Bias Causes a Genomewide Bias in the Amino Acid Composition of Proteins. Molecular Biology and Evolution 17:1581-1588
Since evolution through cytosine deamination essentially entails replacing guanine and cytosine (G and C) with adenine and thymine (A and T), it would be interesting to compare the proteomes of GC-rich and AT-rich genomes. Luckily, this analysis has already been done by Singer and Hickey.
They begin by noting:
Some organisms, for example, have genomes that are disproportionately rich in guanine and cytosine (G and C), while others have DNA that is rich in adenine and thymine (A and T). Variation in nucleotide composition is usually most pronounced at the synonymous codon positions of genes, and, because of the redundancy in the genetic code, these variations in DNA content may have little effect on the amino acid content of the encoded proteins.
Singer and Hickey then partitioned the genetic code into GC-rich and AT-rich codons. They noted that the AT-rich codons encode phenylalanine, tyrosine, methionine, isoleucine, asparagine, and lysine (FYMINK), while the GC-rich codons encode glycine, alanine, arginine, and proline (GARP). These amino acid pools are not the same ones I identified as the pre- and post-cytosine deamination codons (given that the codons they looked at were enriched with AT or GC), but there is an overlap, in that the AT-rich codons encode mostly hydrophobic residues.
Singer and Hickey then looked at 22 completely sequenced genomes to determine whether GC-rich genomes have proteins enriched in GARP amino acids and AT-rich genomes have proteins enriched in FYMINK amino acids. This is exactly what they found.
They took a closer look at Borrelia burgdorferi and Mycobacterium tuberculosis, which have a 25.5% and 65.9% GC content, respectively. These thus represented the two extremes of the range. They compared 305 genes common to both organisms and measured the synonymous nucleotide frequencies and amino acid contents of each one. They found, “For every gene, the GARP/FYMINK ratio in the M. tuberculosis homolog was higher than that of the corresponding gene in B. burgdorferi.”
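The GARP/FYMINK ratio is simple enough to sketch in a few lines of Python. What follows is not Singer and Hickey's code, only an illustration of the idea: the function names are hypothetical, and the two toy sequences are made up for demonstration and do not come from either organism.

```python
GARP = set("GARP")      # amino acids encoded by GC-rich codons
FYMINK = set("FYMINK")  # amino acids encoded by AT-rich codons


def garp_fymink_ratio(protein: str) -> float:
    """Return (# of G, A, R, P residues) / (# of F, Y, M, I, N, K residues)."""
    seq = protein.upper()
    garp = sum(aa in GARP for aa in seq)
    fymink = sum(aa in FYMINK for aa in seq)
    if fymink == 0:
        raise ValueError("no FYMINK residues; ratio is undefined")
    return garp / fymink


def all_pairs_higher(homolog_pairs) -> bool:
    """True if the first member of every (gc_rich, at_rich) pair has the higher ratio."""
    return all(garp_fymink_ratio(gc) > garp_fymink_ratio(at) for gc, at in homolog_pairs)


# Made-up toy sequences, for illustration only.
gc_rich_like = "MAGRPAALGRPG"
at_rich_like = "MKNIFYKKINLM"

print(garp_fymink_ratio(gc_rich_like))
print(garp_fymink_ratio(at_rich_like))
print(all_pairs_higher([(gc_rich_like, at_rich_like)]))
```

With real data, each pair would hold the two homologs of one shared gene, and the per-gene result quoted above corresponds to all_pairs_higher returning True across all 305 pairs.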
The authors conclude, “Our main finding is not just that protein composition is affected by nucleotide bias, but also that this effect is both very large and very widespread.” In fact, they observe:
When we plotted the relationship between nucleotide bias and amino acid content for the entire set of genomes examined, we were surprised to see that there was no “clumping” of major phylogenetic groups in these graphs and that the archaeal and eubacterial genomes behaved as a single homogenous data set. Moreover, the yeast genome data also fell at the predicted point on these graphs. This suggests that the effects of nucleotide bias on protein composition are operating in all major lineages.
Thus, if nucleotide bias does in large part determine protein composition, and nucleotide bias can be tweaked by cytosine deamination, it becomes clear the Increasing Hydrophobicity Effect I described could very well play out in evolution and thus be a component of the design mechanism.
They end their article with two very interesting observations.
The most parsimonious explanation of the observed patterns of amino acid composition in these genomes is an underlying mutational bias that varies between lineages. The resulting amino acid sequence changes are nonrandom, since the mutational bias is strongly directional, and yet they are not caused by natural selection acting directly on protein function. Consequently, their evolutionary dynamics cannot be described in terms of either Darwinian selection or random genetic drift. They may, however, result in secondary selective changes in the protein sequence. For example, amino acid bias could result in a change of the charge distribution within a protein, as well as an alteration of the protein’s secondary and tertiary structures. Such proteins may then undergo positive selection at other sites to counter the potentially deleterious effects of these nucleotide bias-induced changes. The long-term result might be a cascade of compensatory changes to reduce the impact of amino acid bias on protein structure and function. The problem of distinguishing between functional constraint in protein sequences and mutation-driven biases in the composition of these same sequences will provide a future challenge for molecular evolutionists.
They also note:
In conclusion, we recognize that other factors, such as selective constraint, adaptive change, and genetic drift, all play important roles in protein sequence evolution. The results presented here, however, demonstrate that mutational pressure on DNA composition can also be a very powerful and pervasive force in long-term protein evolution.
Such mutational pressure could very well help to unlock buried designs in front-loaded states.
Another implication of all this concerns convergent evolution, which is well situated in the hypothesis of front-loaded evolution. The authors offer their own take:
This result has implications for many studies that are based on the interpretation of amino acid sequence data. For instance, it has already been shown that nucleotide bias can affect the functional properties of proteins and that convergent amino acid composition can affect the construction of phylogenetic trees based on protein. For instance, Foster and Hickey showed that unrelated taxa were grouped together in a phylogenetic tree due to convergent amino acid sequences. Although this problem in phylogenetic reconstruction has been identified, a satisfactory method of dealing with the problem has not yet been found. Because of the relationship between primary amino acid sequence and secondary protein structure, nucleotide bias may also affect the evolution of protein structure.
This raises the fascinating question of whether any examples of convergent molecular evolution involved C-to-T transitions at key points.
Another way to detect the IHE would be to analyze the effects of RNA editing that exploited cytosine deamination. During such editing processes, the synthesized RNA molecule is altered such that specific bases are changed through the use of cellular machinery. As a result, the RNA that is used by the cell is not directly encoded in the genome. A focus on RNA editing would allow us to see the effects of cytosine deamination all at once, rather than being spread out across time through incremental evolution.
So let me now discuss the second article: Philippe Giegé and Axel Brennicke. 1999. RNA editing in Arabidopsis mitochondria effects 441 C to U changes in ORFs. PNAS 96, 15324-15329
Giegé and Brennicke (G&B) identified 456 instances of RNA editing. All of the edits were C-to-U changes, and 441 occurred in open reading frames. The editing was rather extensive: one of every 15 cytosines was changed to uracil. Yet the editing was not evenly distributed, as genes coding for complex I (of the electron transport chain) and cytochrome c biogenesis were edited at a higher frequency than others. The effects of the editing nicely illustrate the IHE. According to G&B:
RNA Editing Increases the Overall Hydrophobicity of Mitochondrial Proteins.
One of the potential consequences of RNA editing in mRNAs and the corresponding change of the specified amino acid could be a modification of the overall biochemical nature of the affected mitochondrial proteins. The general tendency of the effect of RNA editing in Arabidopsis mitochondria is to increase the proportion of hydrophobic amino acid codons. As an example, the three most frequent amino acid transitions (93 S to L, 80 P to L, and 47 S to F) all result in codons for hydrophobic amino acids. In the overall analysis of RNA editing in Arabidopsis mitochondria, 35% of the modifications are hydrophilic to hydrophobic, and 35% are hydrophobic to hydrophobic codon alterations. Only the 27 P to S codon transitions reverse the tendency by creating codons for hydrophilic amino acids from those for hydrophobic ones. In the 425 modified codons detected, 41.5% specify hydrophobic amino acids before editing and 84.9%, after editing (Fig. 1). Thus RNA editing increases the hydrophobicity of mitochondrial proteins.
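The four transitions named in that passage can be checked against the codon table directly. The Python sketch below lists, for each one, the single C-to-U edits that would produce it and the codon position involved. It assumes the standard genetic code applies to these transcripts, and its names are hypothetical; it is not part of G&B's analysis. The counts in REPORTED are the ones quoted above.

```python
from itertools import product

# Standard genetic code, written in DNA letters (T stands in for U).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_c_to_u_routes(before: str, after: str):
    """List (codon, edited_codon, position) for single C->U edits turning `before` into `after`."""
    routes = []
    for codon, aa in CODON_TABLE.items():
        if aa != before:
            continue
        for i, base in enumerate(codon):
            if base == "C":
                edited = codon[:i] + "T" + codon[i + 1:]
                if CODON_TABLE[edited] == after:
                    # Report in RNA letters, since the edit happens on the transcript.
                    routes.append((codon.replace("T", "U"), edited.replace("T", "U"), i + 1))
    return routes


# Transition counts as quoted from G&B above.
REPORTED = {("S", "L"): 93, ("P", "L"): 80, ("S", "F"): 47, ("P", "S"): 27}

for (before, after), count in REPORTED.items():
    routes = single_c_to_u_routes(before, after)
    described = ", ".join(f"{c}->{e} (position {p})" for c, e, p in routes)
    print(f"{before} to {after} ({count} reported): {described}")
```

Under these assumptions, the three transitions that yield hydrophobic residues all come from edits at the second codon position, while the P to S reversal comes from edits at the first.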
Figure 1 from their study is shown below: