An Exceedingly Exceptional Code

I’m not sure how I missed this one. Recall that only one in a million randomly generated codes was more error-proof than the genetic code used by life. Well, it turns out the frequency of amino acids used by all three domains of life is much the same. And when you factor in this frequency of amino acid use, the genetic code is actually much better than “one in a million”:

We found that taking the amino-acid frequency into account decreases the fraction of random codes that beat the natural code. This effect is particularly pronounced when more refined measures of the amino-acid substitution cost are used than hydrophobicity. To show this, we devised a new cost function by evaluating in silico the change in folding free energy caused by all possible point mutations in a set of protein structures. With this function, which measures protein stability while being unrelated to the code’s structure, we estimated that around two random codes in a billion (10^9) are fitter than the natural code. When alternative codes are restricted to those that interchange biosynthetically related amino acids, the genetic code appears even more optimal.

[Gilis D, Massar S, Cerf NJ, Rooman M. 2001. Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biol. 2(11):RESEARCH0049]
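To make the idea of "random codes that beat the natural code" concrete, here is a rough sketch of that kind of comparison. It is not the authors' method. The cost used here is the squared difference in Kyte-Doolittle hydropathy between the original and the mutated amino acid, swapped in for the paper's folding free energy function, which is not reproduced here. The per-codon weighting (amino-acid frequency divided by the number of synonymous codons) is my assumption about how frequency enters the cost. The names `code_cost`, `random_code`, and `fraction_better` are made up for illustration, and the frequencies must be supplied by the caller.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
NATURAL = dict(zip(CODONS, AA_STRING))

# Kyte-Doolittle hydropathy scale, used only as a stand-in cost measure.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def code_cost(code, freq):
    """Frequency-weighted cost of all single-base substitutions in a code."""
    n_syn = {}
    for aa in code.values():
        n_syn[aa] = n_syn.get(aa, 0) + 1
    total = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        # Assumption: spread each amino acid's frequency over its codons.
        weight = freq[aa] / n_syn[aa]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = codon[:pos] + base + codon[pos + 1:]
                aa2 = code[neighbor]
                if aa2 == "*":
                    continue  # substitutions to stop codons are ignored here
                total += weight * (HYDROPATHY[aa] - HYDROPATHY[aa2]) ** 2
    return total


def random_code(rng):
    """Shuffle the 20 amino acids among the existing codon blocks."""
    aas = sorted(set(AA_STRING) - {"*"})
    shuffled = aas[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(aas, shuffled))
    relabel["*"] = "*"  # stop codons stay where they are
    return {codon: relabel[aa] for codon, aa in NATURAL.items()}


def fraction_better(freq, trials=2000, seed=0):
    """Fraction of sampled random codes with a lower cost than the natural code."""
    rng = random.Random(seed)
    natural = code_cost(NATURAL, freq)
    better = sum(code_cost(random_code(rng), freq) < natural for _ in range(trials))
    return better / trials


if __name__ == "__main__":
    # Placeholder: uniform frequencies. Replace with measured frequencies.
    uniform = {aa: 1 / 20 for aa in HYDROPATHY}
    print(fraction_better(uniform))
```

A few thousand samples can only resolve fractions down to about one in a thousand, so a figure like "two in a billion" needs far more sampling or an extrapolation. The point of the sketch is only to show how the frequency weighting enters the comparison.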

However, there is a crucial caveat:

But we have to keep in mind that there exist 20! ≈ 2 × 10^18 possible codes preserving the codon block structure, which means that we can expect about 10^9 better codes overall [47]. Moreover, if the codon block structure is not preserved [46], the number of possible codes is larger by orders of magnitude, and therefore the number of codes better than the natural one will certainly be much larger.
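The arithmetic behind this caveat is easy to check. The sketch below simply multiplies the number of block-preserving codes (20!) by the "two in a billion" fraction quoted earlier; the variable names are mine.

```python
import math

# Number of ways to assign 20 amino acids to the 20 existing codon blocks.
n_codes = math.factorial(20)

# Fraction of random codes estimated to be fitter than the natural code
# ("around two random codes in a billion").
fraction_better = 2e-9

expected_better = n_codes * fraction_better

print(f"20! = {n_codes:.2e}")
print(f"expected number of better codes = {expected_better:.1e}")
```

The product comes out at a few times 10^9, the same order of magnitude as the paper's "about 10^9". So a code can be rarer than one in a hundred million and still have billions of superior alternatives, simply because the space of possible codes is so large.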

We’ll talk about this some more, but for now, pay close attention to the following argument from the paper:

Although the code still evolves today, as reflected by its departure from universality in some organisms, its evolution is very limited and concerns only the reassignment of a few codons [2]. As the same change sometimes recurs in different lineages, the code seems to have reached the bottom of a funnel in the evolutionary landscape that contains several roughly equivalent optimal codes. But apart from such restricted modifications, the code no longer evolves significantly, and has not undergone important modifications since an early stage in the development of life. This stability probably arose because even small modifications in the code would have entailed loss of functionality of genes that were already being expressed. Moreover, the advent of more sophisticated transcription/translation control mechanisms, which involve huge protein systems, could have decreased the evolutionary pressure on the genetic code. Even though our present information on the genetic code is insufficient to discriminate between evolutionary scenarios, our analysis enables us to put some constraints on the situation at the time when evolution of the code was pretty much frozen. In particular, it appears that the frequencies of the amino acids that were used in proteins synthesized at that time were similar to the present frequencies. We do not know what determines the present amino-acid frequencies, but presumably they result, at least in part, from the amino acids’ physicochemical properties. For instance, the ratio of hydrophobic to hydrophilic amino acids is intrinsically related to the globular structure of proteins and certainly contributes to the pressure on amino-acid frequencies. Also, amino acids that are easily synthesized may be used more often. Thus, we can assert that some of the pressures that determine the present amino-acid frequencies were already present at the time the code took on its definitive form. 
In addition, the increased optimality of the genetic code with respect to gmutate [the stability-based cost function described above] implies that the three-dimensional structure of proteins probably played an equally important role in fixing the structure of the code. As the three-dimensional structure of a protein essentially determines its function, this suggests, more generally, that the protein function acted as a main evolutionary pressure on the code structure. Consequently, at the time when the genetic code took its present form, primitive life was presumably already synthesizing complex proteins. This provides a tentative picture of primitive life at that time: the translation apparatus was similar to the present one, and organisms were made of complex proteins whose amino-acid frequencies were comparable to the present ones.

If you are a good investigator with an eye for subtle clues, you will have picked up two important points.

1. Our hunt for LUCA continues to turn up entities not much different from what exists today: “a tentative picture of primitive life at that time: the translation apparatus was similar to the present one, and organisms were made of complex proteins whose amino-acid frequencies were comparable to the present ones.”

Of course, if you think about it, it does seem a little odd to propose that the genetic code evolved around organisms composed of complex proteins whose amino-acid frequencies were comparable to the present ones. One might think that the genetic code would be a needed precondition for organisms composed of complex proteins whose amino-acid frequencies were comparable to the present ones.

2. The pattern entailed by this evolution is quite curious: “the code seems to have reached the bottom of a funnel in the evolutionary landscape that contains several roughly equivalent optimal codes. But apart from such restricted modifications, the code no longer evolves significantly, and has not undergone important modifications since an early stage in the development of life.” As I commented earlier:

Why did this bottleneck event occur after the Code became “chilled”, as the subsequent adaptive radiation (and all this involves) shows no solid trace of continuing with the evolution of the Code or adding more amino acids? It’s quite the coincidence to have the most extreme bottleneck in history occurring just after the genetic code finished evolving.

Let’s add some things in the next posting.

One response to “An Exceedingly Exceptional Code”

  1. Mike,
    There is the possibility that DNA is not a code. That could explain this issue of error-proofing.

    If you line up billions of human beings, do you have a code? No, you have a billion copies, with variations, derived from a single system. That is what DNA is.
    Each horizontal pair of nucleotides is a working system; DNA is merely a pile of derivative copies of the system-matrix.
    If you want to see how nucleotides work as a system, the model is at Matrix/DNA Theory.
