Back around 2002, I noted that the genetic code appears to funnel one of the most common base pair substitutions: the C-to-T transition caused by deamination of cytosine. Put simply, codons containing a C specify a wide range of amino acids, but when that C is converted to T, the new set of codons converges on the most hydrophobic amino acids. The original analysis is found here.
To see this for yourself, consider the figure below, which represents a hydrophobicity scale for the 20 amino acids based on 47 published attempts to quantify hydrophobicity:
Now consider the effect of cytosine deamination using this scale:
The scale on the left shows the amino acids coded for by C-containing codons; deamination of those cytosines converts it to the scale on the right.
But what if we repeat the analysis, this time restricting our focus to the cytosines that are followed by guanines (the CpG sequences discussed here), given that such sequences are the most likely to exploit the effects of deamination?
The scale on the left shows the amino acids coded for by CG-containing codons; deamination of those cytosines converts it to the scale on the right. Codons that end with a C followed by a G in the next codon ((–C)G) are excluded, since all such base pair substitutions are silent.
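If you want to check the codon bookkeeping yourself, here is a minimal Python sketch. It assumes the standard genetic code and only considers deamination on the coding strand (CG becomes TG); the function names are my own, made up for illustration. It does not reproduce the hydrophobicity scale from the figure, only the before-and-after amino acids.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (third position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def cpg_deamination_changes():
    """Map each CG-containing codon to (old amino acid, new codon, new amino acid).

    Assumption: only the coding-strand C of the CG pair is deaminated (CG -> TG).
    """
    changes = {}
    for codon, aa in CODON_TABLE.items():
        if "CG" in codon:
            mutant = codon.replace("CG", "TG")
            changes[codon] = (aa, mutant, CODON_TABLE[mutant])
    return changes


def third_position_c_to_t_is_silent():
    """Check that every NNC -> NNT change leaves the amino acid unchanged."""
    return all(
        CODON_TABLE[a + b + "C"] == CODON_TABLE[a + b + "T"]
        for a, b in product(BASES, repeat=2)
    )


if __name__ == "__main__":
    for codon, (aa, mutant, new_aa) in sorted(cpg_deamination_changes().items()):
        print(f"{codon} ({aa}) -> {mutant} ({new_aa})")
    print("NNC -> NNT always silent:", third_position_c_to_t_is_silent())
```

In the output, `*` marks a stop codon. The second function is why the (–C)G codons can be left out: a C-to-T change at the third position never alters the amino acid in the standard code.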
Whoa. The IHE effect has been tightened up. Is this just a coincidental relationship?
Consider what happens if we look at the same changes using something known as the OMH scale. As this paper explains (see Figure A.1C.3), “the OMH scale (Sweet and Eisenberg, 1983) is a measure of how likely a given amino acid will be replaced by a different hydrophobic or “buried” amino acid in a protein. In effect, this scale is how evolution views the hydrophobicity of an amino acid.”
The green box shows the amino acids coded for by CG-containing codons; deamination of those cytosines converts them to the set in the red box.
Is this just a coincidental relationship? Remember, as this paper notes, “Methylated cytosines can easily be converted to thymine residues via deamination and this mutational process has the highest rate among all base substitutions.”