The Rational Essence of Proteins and DNA

In my previous essay about proteins-as-design-material, I noted:

This all raises some interesting questions. For example, without proteins, and their manufacturing process, what becomes of the blind watchmaker? Without proteins, and the latent functions contained within, might not the blind watchmaker exist as the impotent, crippled, blind watchmaker with no one to notice its existence? If so, how much credit does the blind watchmaker really deserve?

The vast and immense Tree of Life is a protein-dependent output. Point to some evidence of evolution and I’ll point to the proteins that underlie it. Without proteins, would there be a Tree of Life 3.5 billion years after the RNA world took root? How do we know? If we believe so, would the Tree be as immense and vast as it is today? A life form composed of nucleic acids, carbohydrates, and lipids would suffice for the purposes of the blind watchmaker. But could the blind watchmaker turn this material into something that is analogous to an Ash tree filled with squirrels, beetles, and birds?

Look at it this way. What do we need for the blind watchmaker to exist? A finite, changing world, something that replicates, and imperfect replication. The first and the third are givens due to the fabric of Nature. The second is more iffy. In living cells, proteins play the key role in replicating things (they replicate the DNA, they divide the cell, and coordinate both). But if we entertain the notion of an RNA world, the proteins are not needed for replication (then again, proteins are not needed for chemical reactions to take place). But what the proteins do is amplify and enhance this replication property, and thus enhance the blind watchmaker’s abilities. What’s more, the same molecule that enhances replication also opens up a whole vast world of phenotypes not available to the blind watchmaker earlier. You can almost think of proteins as a form of tech material designed to exploit and prop up the blind watchmaker. And maybe even give the blind watchmaker a little guidance.


To what degree is the design of a designer constrained by his/her building material? For example, imagine that we enlisted the service of the world’s most creative and brilliant engineers and tasked them to design a spacecraft that will carry men to Mars and back. Now, let’s add one constraint – the only material available to the designers is concrete. Would these brilliant designers be able to meet the design objective?

Or consider the computer. Today’s computers are more sophisticated than computers from the 1950s, allowing people to design programs that allow you and me to communicate with great ease and little cost. Why is it that programmers seem to be able to do more with computers today than they could in the 1950s? Is it because today’s designers are smarter than yesterday’s? Have new laws of nature been discovered? Or does it have something to do with an observation from Hartwell et al.?

An early stored-program computer (left), built around 1950, used vacuum tubes in logic circuits, whereas modern computers use transistors and silicon wafers (right), but both are based on the same principles.

While I myself am not an engineer, I do know that without the right building materials, I cannot design a tree house. I do know that without the right seeds, I cannot design a garden. Designers are limited and constrained by the building material (and tools) that are available to them.

Since natural selection can act as a designer-mimic, it too would share this feature and be subject to similar limitations.


Let me now summarize some of the observations I have made with my recent focus on proteins and their role in the success of evolution. Consider the following:

1. The entire Tree of Life is a protein-dependent output. Evidence for evolutionary processes is evidence for a protein-dependent phenomenon. This calls into question any attempt to extrapolate evidence of this protein-dependent phenomenon to protein-less evolution.

2. Proteins are amazingly diverse building material, capable of performing an immense array of functions. We know of no other building material that is as versatile.

3. The immense versatility of proteins is coupled to a single manufacturing process known as translation. When you couple this with point #2, this speaks of an astounding elegance.

4. There is no evidence to support the notion that protein-less evolution would be as successful as protein-dependent evolution.

5. Since designers are limited by their building material, evolution (as designer-mimic) is likewise limited by its building material. This consideration reinforces the importance of taking a closer look at evolution’s dependence on protein activity.


Since we have been talking about proteins, let’s back up to say a few things about their building blocks – the amino acids. Below is a figure of an amino acid.

Note the central carbon atom and how it is covalently bonded with four different groups. Three of these four groups are always the same in every amino acid used by life: the amino group (orange box), the carboxyl group (blue box), and the hydrogen atom. The R signifies the side chain, which differs for each amino acid.

Life uses 20 different amino acids, as shown in the table below:

In essence, this is the palette of amino acids that are available to the blind watchmaker. The different amino acids are broken into different groups because of the chemical properties of the side chains. Those in orange are the hydrophobic (oily) amino acids. They will be found mostly inside the core of a globular protein as they are most useful in determining the basic shape of the protein surrounded by water. The others are hydrophilic and can be used to decorate the surface of the protein, allowing for specific interactions with other proteins and molecules inside the cell. The purple group is negatively charged, the blue is positively charged, and the green is uncharged.

Now I have just argued that the blind watchmaker, like all designers, is limited by the building material that is available. So we need only ask ourselves a simple question – what if we cut down on the diversity of this palette? What if the blind watchmaker only had one member from each color-coded group to work with? What if the blind watchmaker had only the amino acids found in any particular color-coding (only hydrophobics or uncharged hydrophilics, for example)? What if the blind watchmaker only had three amino acids to work with? Say arginine, valine, and glycine? Or leucine, proline, and aspartate? Etc.

Could the blind watchmaker still produce a biosphere as diverse and resilient as that which exists with such scaled down palettes? I doubt this very much. After all, we could continue the thought experiments down to a single amino acid. Say that only glycine is available. If all proteins were simply chains of glycine, functional diversity would be gone, as the only thing that would differentiate the polypeptide chains is their length.
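To put rough numbers on this thought experiment, here is a minimal sketch that simply counts how many distinct sequences a palette allows for a chain of a given length. It says nothing about how many of those sequences would fold or function. The function name and the chain length of 100 are my own illustrative choices.

```python
def sequence_count(palette_size: int, chain_length: int) -> int:
    """Number of distinct linear sequences of a given length."""
    return palette_size ** chain_length


chain_length = 100  # arbitrary illustrative length, not a measured value
for palette_size in (20, 3, 1):
    count = sequence_count(palette_size, chain_length)
    # len(str(count)) - 1 is the order of magnitude of the count
    print(f"{palette_size:>2} amino acids: about 10^{len(str(count)) - 1} sequences")
```

With a single amino acid there is exactly one sequence per length, which is the glycine case described above.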

Those who would deny that the blind watchmaker is limited by its available building material would be in the absurd position of arguing that a palette with one single amino acid would be just as useful as the current pool of 20 amino acids.

Of course, this raises an equally interesting consideration, namely, what if the palette contained many more amino acids, say 30 or 40? Would the blind watchmaker be any more successful? Would life contain adaptations and structures that have never been seen on this planet?

Let’s now proceed from amino acids to proteins.

To make a protein, we simply covalently link individual amino acids together via a peptide bond. The figure below shows the formation of a peptide bond.

I’d like to draw your attention to two things. First, note that the carboxyl group of one amino acid reacts with the amino group of the second amino acid to form the peptide bond (highlighted in the orange box). This creates a dipeptide with differing ends. At the N-terminal end, there is a free amino group and the C-terminal end has a free carboxyl group. This simply means we can attach a third amino acid to the C-terminal end of the dipeptide with the very same reaction. And if we can add a third, we can add a fourth. Etc. Thus, the structure of the amino acid is perfectly poised to create a growing chain whose length would be determined by factors other than amino acid structure. We can thus begin to catch a glimpse of one reason why proteins are so versatile, as the relative ease of construction is coupled to an ability to vary the length.

But let us now consider something that is even more interesting. Notice the two R groups, R1 and R2. These represent the side chains of each respective amino acid that we have already briefly discussed above. What you should notice is that side chains do not participate in the linking of amino acids. On the contrary, they simply stick out as appendages on the backbone chain of amino and carboxyl groups. And because they do not participate in this linkage, it means that the sequence of side chains is not chemically determined by the process of polymerization. In essence, it is programmable.
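A toy model may make the point concrete. In the sketch below, adding a residue is the same operation whatever the side chain is, so the order of side chains is left entirely to whoever supplies the sequence. The class and method names are hypothetical, chosen only for illustration, and the model ignores all real chemistry.

```python
from dataclasses import dataclass, field


@dataclass
class Polypeptide:
    """A chain listed from the N-terminal end to the C-terminal end."""
    residues: list = field(default_factory=list)

    def add_residue(self, side_chain: str) -> None:
        # The same backbone linkage is used for every residue, so the
        # identity of the side chain places no limit on what comes next.
        self.residues.append(side_chain)

    def sequence(self) -> str:
        return "-".join(self.residues)


chain = Polypeptide()
for name in ["Arg", "Val", "Gly", "Gly", "Asp"]:  # arbitrary example order
    chain.add_residue(name)
print(chain.sequence())
```

Any other ordering of the same residues would be built by exactly the same steps, which is what I mean by programmable.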

These exact same themes are represented in nucleic acids, RNA and DNA. In these cases, the building blocks are more complex and known as nucleotides.

Above is a figure of a nucleotide, showing its three parts: the pentose sugar, a phosphate group, and the nitrogenous base. The nitrogenous base is the analog of the amino acid’s side chain, only this time there are four types: uracil (U), guanine (G), cytosine (C), and adenine (A) in RNA and thymine (T), guanine (G), cytosine (C), and adenine (A) in DNA.

The nucleotides are linked together into a chain through interactions between the phosphate group of one nucleotide and the sugar of an adjacent nucleotide. We’ll call this covalent bond the sugar-phosphate bond and the resulting chain is shown in the following figure:

Notice that as with proteins, the ends are different. The top end contains a free phosphate group and is called the 5’ end (because that phosphate group is bonded to the 5’ carbon of the sugar). The bottom end with the exposed sugar is called the 3’ end. To grow the strand, we need to attach a fourth nucleotide to the 3’ end. Thus, the structure of the more complex nucleotide is perfectly poised to create a growing chain whose length would be determined by factors other than nucleotide structure.

But notice also the way the nitrogenous bases mimic the R groups of amino acids: they do not participate in the formation of the sugar-phosphate bond. On the contrary, they simply stick out as appendages on the sugar-phosphate backbone. And because they do not participate in this linkage, it means that the sequence of nitrogenous bases is not chemically determined by the process of polymerization. In essence, it is programmable.

Thus, what we have is a profound conceptual similarity between two unrelated molecules. It is this similarity in basic format that will allow the nucleic acids to encode the amino acid sequence of a protein.
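As a rough sketch of what this encoding looks like, the snippet below reads an RNA sequence three bases at a time and looks each triplet up in a table. The table holds only a handful of entries from the standard genetic code, and the function name and example sequence are my own illustrative assumptions.

```python
# A few entries from the standard genetic code (RNA alphabet).
# This is deliberately partial: the full table has 64 codons.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala",
    "AAA": "Lys", "GAU": "Asp", "UAA": "STOP",
}


def translate(mrna: str) -> list:
    """Read codons three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons not listed
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGUUUGGCGCUAAAGAUUAA"))
```

Changing the order of the bases changes the order of the amino acids, and nothing in the chemistry of either backbone has to change to allow it.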

We’ve seen that a protein is formed by covalently linking amino acids, yet in a fashion where the diverse side chains do not participate in this binding. This frees them to function elsewhere. So what do the side-chains do? In short, they interact with each other. Through electrostatic interactions, they fold most proteins into a compact, globular shape and it is the shape that is at the very heart of protein function (if you disrupt the shape, you disrupt the function).

What I’d like to do now is impress upon you the very brilliance of this design, as it goes a very long way in explaining why proteins have been so useful for evolution.

What you have here is a strategy that links subunits by covalent bonds, but the folding, and thus function, is determined by forces much weaker than covalent bonds.

As Joachim Pietzsch notes,

the folding of a protein is not a chemical reaction, with a bond breaking here and a new one forming there. It is more like the weaving of an intertwined molecular pattern, the stability of which is defined by innumerable forces between atoms.

In his classic book, Chance and Necessity, Jacques Monod explores the implications in more detail as he explores the difference in activation energy when forming covalent bonds and noncovalent bonds:

Simplifying somewhat, and specifying that we are now considering only those reactions occurring in aqueous phase, we may say that the average amount of energy absorbed or liberated by a reaction involving covalent bonds is on the order of 5 to 20 Kcal per bond. For a reaction involving noncovalent bonds only, the average amount of energy would be between 1 and 2 Kcal.

This considerable difference partially accounts for the difference in stability between covalent and noncovalent chemical constructs. The essential, however, lies not there but in the differences in the so-called activation energies brought into play in the two types of interactions.


Now – and this is the crucial point – in general:

a. The activation energy of covalent reactions is high; their speed is therefore very slow or zero at low temperatures and in the absence of catalysts; while

b. The activation energy of noncovalent reactions is very low if not zero; they therefore occur spontaneously and very rapidly, at low temperature, and in the absence of catalysts.

The result is that structures defined by noncovalent interactions can attain a certain stability only if they entail multiple interactions. Furthermore, noncovalent interactions acquire a notable amount of energy only when atoms lie a very short distance apart, practically “touching” one another. Consequently two molecules (or areas of molecules) will be able to contract a noncovalent association only if the surfaces of both include complementary sites permitting several atoms of one another to enter into contact with several atoms of the other.

If we now add that the complexes formed between enzyme and substrate are of noncovalent nature it will be seen why these complexes are necessarily stereospecific: they can form only if the enzyme molecule has a site “complementary” to the shape of the substrate molecule.
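To get a feel for why the activation energy matters so much, here is a small sketch using the exponential term of the Arrhenius rate expression, exp(-Ea/RT). The two barrier heights below are values I have assumed purely for illustration; they are not figures given by Monod, and real reactions also differ in their pre-exponential factors.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def boltzmann_factor(activation_energy_kcal: float, temperature_k: float = 300.0) -> float:
    """exp(-Ea / RT), the exponential term in the Arrhenius rate expression."""
    return math.exp(-activation_energy_kcal / (R_KCAL * temperature_k))


covalent = boltzmann_factor(20.0)    # assumed illustrative barrier, kcal/mol
noncovalent = boltzmann_factor(2.0)  # assumed illustrative barrier, kcal/mol
print(f"covalent: {covalent:.2e}, noncovalent: {noncovalent:.2e}")
print(f"ratio: {covalent / noncovalent:.2e}")
```

Under these assumptions the high-barrier process is slower by more than twelve orders of magnitude at the same temperature, which is the kind of gap that makes one dependent on catalysts and the other spontaneous.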

So what does all this mean? We link amino acids together in a process that will depend on catalysts (explaining why proteins depend on the molecular machine known as the ribosome for their origin). This speaks to stability. But what is stably linked together? A pattern of side chains that has the potential to spontaneously adopt a three-dimensional shape that is, in essence, programmed by the sequence. The one-dimensional “virtual world” codes for the emergence of a three-dimensional world, where form becomes function. And it does so in a way that imparts specificity (the need for multiple interactions) without determinism (the whole system is dynamic, thus flexible, thus responsive). With building material like this, the blind watchmaker could not help but be a success!

But let’s next turn back to DNA, the other biological molecule that shares some rather deep conceptual similarities with proteins. Could proteins and DNA be “a match made in heaven”?

We’ve seen that the logic of protein structure entails the covalent linkage of a pattern of noncovalent interactions. This is how we encode a three-dimensional reality in one-dimensional terms. And all of this was made possible by the fact that amino acids are linked together in a way where their side chains were not involved in the linkage and thus served more like appendages.

But we have also seen this very logic is at play when it comes to the formation of a chain of nucleotides. As with the side chains of amino acids, the nitrogenous bases can interact with each other through noncovalent forces causing the nucleotide chain to fold into a three-dimensional structure. This is what happens with a lot of RNA and explains its ability to function as a catalyst. But let’s turn to DNA.

With DNA, two nucleotide chains, running in opposite directions, form the well-known double-helix. The thymines on one strand hydrogen bond with the adenines on the other strand, while the guanines hydrogen bond with the cytosines. But it’s more than this. If we look down the end of the DNA double helix, we’d see something similar to the picture shown below.


You can notice one set of base pairs highlighted in white. But as the two strands wind around each other, note how the base pairs stack on each other (the inner circular structure) while the sugar and phosphate groups surround them. This is because the bases are hydrophobic and are thus shielded from the water by the surface sugar/phosphates. And what this means is that it is hydrophobic forces that drive the two strands together, where the hydrogen bonds simply add an additional layer of stabilization coupled to specificity.

The same logic thus applies when forming the double helix of DNA and the folded protein. Both are linear molecules, where a particular sequence is connected together by covalent bonds. Both have appendages that interact with each other via a pattern of noncovalent forces. Hydrophobic forces collapse a protein into a compact structure and other electrostatic forces impart further stability and specificity. Hydrophobic forces drive the double helix together and other electrostatic forces impart further stability and specificity. Globular proteins have a hydrophobic core and a hydrophilic surface. The DNA double helix has a hydrophobic core and a hydrophilic surface. And perhaps most remarkable is that Monod’s observation equally applies to both:

The result is that structures defined by noncovalent interactions can attain a certain stability only if they entail multiple interactions. Furthermore, noncovalent interactions acquire a notable amount of energy only when atoms lie a very short distance apart, practically “touching” one another. Consequently two molecules (or areas of molecules) will be able to contract a noncovalent association only if the surfaces of both include complementary sites permitting several atoms of one another to enter into contact with several atoms of the other.

In essence, this is a variation on the same logical theme. Same story; different play.

The main differences are twofold: 1) the pattern of “touching” in the DNA molecule is what codes for the pattern of “touching” in a protein; 2) the sequence of bases on one strand of DNA is complementary to the other strand, meaning the pattern of “touching” in the DNA molecule also “codes” for efficient replication. As I write in my book, The Design Matrix:

The fact that DNA exists as a double helix of two nucleotide chains foreshadows the manner in which the structure of DNA is perfectly suited for replication. To replicate DNA, all you need to do is unwind the two strands and then use each strand as a template for the synthesis of a new, complementary strand. In one molecule, there are two perfect solutions for two design problems—coding the machinery of life and perpetuating the information across time. A more beautiful molecular expression of the form-function relationship would be hard to imagine. Seen from this vantage point, the very structure of DNA is evidence that indicates life was designed to reproduce.
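The template idea in that passage can be sketched in a few lines. The snippet below applies the A-T and G-C pairing rules to build the partner of a strand, then "replicates" a duplex by using each parent strand as a template. The function names and the six-base example are hypothetical, and the model leaves out the enzymes that do the actual work.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Build the antiparallel partner strand, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(template))


def replicate(strand_a: str, strand_b: str):
    """Each parent strand serves as the template for one new strand."""
    daughter_1 = (strand_a, complementary_strand(strand_a))
    daughter_2 = (strand_b, complementary_strand(strand_b))
    return daughter_1, daughter_2


top = "ATGCGT"  # arbitrary example sequence
bottom = complementary_strand(top)
print(bottom)
print(replicate(top, bottom))
```

Both daughter duplexes carry the same pair of sequences as the parent, each made of one old strand and one new strand.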

The differences thus complement the similarities and become one.

But what happens when the protein theme meets the DNA theme? What happens when the proteins “touch” the DNA?

Let’s look more closely at the building blocks of DNA – the nucleotides.

Notice that it is more complex than an amino acid, where three complex chemical groups are covalently linked together. And unlike amino acids, nucleotides are not recovered in Miller-Urey type experiments. In fact, Robert Shapiro, professor emeritus of chemistry and senior research scientist at New York University, notes:

And no sample of a nucleotide, the building block of RNA or DNA, has ever been discovered in a natural source apart from Earth life. Or even take off the phosphate, one of the three parts, and no nucleoside has ever been put together. Nature has no inclination whatsoever to build nucleosides or nucleotides that we can detect, and the pharmaceutical industry has discovered this.

What is also remarkable about nucleotides is that it is possible to connect the sugar, phosphate group, and nitrogenous base together to form different structures.

In their book, The Mystery of Life’s Origin, Thaxton et al. calculate there are 45 different isomers of the above nucleotide. And when it comes to forming a dinucleotide, shown in the figure below, Thaxton et al. calculate 720 different ways of connecting things together.

Yet life only uses the particular arrangement shown in the above figure.

This particular “arbitrary” arrangement that is not readily produced by Nature has a remarkable implication when it comes to forming the double helix and the pairing of bases on opposite strands. Once again, let’s look down the double helix, where we can see one base pair highlighted in white.


As a consequence of the asymmetric arrangement of groups in the two nucleotides, note that the distance from one edge of the double helix to the other edge is greater just above the base pairs than it is just below the base pairs. It might be easier to see this if we just focused on one base pair, as in the figure below.

Note that the greater distance is called the “major groove face.” Because the two strands of DNA wind around each other, we can think of these base pairs as rotating as we move downward. This means that the double helix will have asymmetrical grooves on opposite sides – a wider one known as the major groove and a more narrow one known as the minor groove.

We cannot see the grooves by looking down the axis of the DNA but we can see them when looking at DNA along its length as seen in the figures below.

All this means that the sequence of nitrogenous bases is more accessible in the major groove. In fact, very common DNA-binding protein motifs, known as zinc fingers, leucine zippers, and helix-turn-helix motifs, bind to the DNA by “touching” the bases within the major groove.

So let us now turn to the proteins. There are two basic folds that occur – the alpha helix and the beta sheet. Let’s focus on the alpha helix, since it is the fold that the various DNA-binding proteins listed above use to bind the major groove. A figure of an alpha helix is shown below:

This helical structure is stabilized by hydrogen bonds (the dotted lines) between the C=O group of one amino acid and the N-H group of the amino acid four subunits downstream (as more easily seen in the figure below):
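The bonding pattern is regular enough to write down as a simple rule. The sketch below lists which residues would be paired in an idealized helix, matching the C=O group of residue i with the N-H group of residue i + 4. The function name and the numbering from 1 are my own conventions, and real helices fray at the ends.

```python
def helix_hydrogen_bonds(n_residues: int):
    """Pair the C=O of residue i with the N-H of residue i + 4."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


print(helix_hydrogen_bonds(10))
```

A chain of four residues or fewer yields no pairs at all under this rule, since there is no residue four positions downstream to bond with.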

There are a few things to note about the alpha helix. Although side chain identity can influence whether or not an alpha helix is formed, it forms largely by the same groups that are involved in the peptide bond. Second, short peptides do not normally form alpha helices, which means this is a fold that will effectively emerge from larger chains of amino acids. Third, note that the side chains extend outward from the helix, in contrast to the way the nitrogenous bases point inward in the DNA helix. Below shows a figure where the outward pointing side chains (in green) are more easily seen (especially c and d):

Thus, as a consequence of amino acid structure, proteins will not only form folded structures using the same rules that form the double helix of DNA, but they will also form a cylindrical structure whose appendages (side chains) seem to be well-matched for scanning and binding to the winding major groove along the double helix. That is, the pattern of outreaching side chains can reach into the major groove and interact with the pattern of base pairs inside the wide crevice of the major groove. But how well matched are the alpha helix and the major groove?

According to this article:

alpha helices have particular significance in DNA binding motifs, including helix-turn-helix motifs, leucine zipper motifs and zinc finger motifs. This is because of the structural coincidence of the alpha helix diameter of 12Å being the same as the width of the major groove in B-form DNA.

Why think it is merely a coincidence that the alpha helix diameter and the width of the major groove are the same? On the contrary, it simply enhances and extends the inherent rationality and complementarity that lies behind these two crucial biological molecules – a match made in heaven.

2 responses to “The Rational Essence of Proteins and DNA”

  1. What about the orientation of amino acids? Virtually all amino acids on earth are left handed. If there was a mixture of both, which theoretically there should be, proteins would not be able to function. I have not heard a really convincing reason for the absence of right handed amino acids. Is it possible that the watchmaker needed to eliminate them to make life possible?

  2. Hi Alaninnont,

    Yes, chirality is another problem for abiogenesis. There are all kinds of speculations about its origin, each with some circumstantial evidence, but nothing that is powerfully persuasive.

    My interest in this essay is different, however. Even if abiogenesis did occur, the trace of design would not disappear, as the uncanny conceptual similarities between proteins and DNA would remain.
