RecA – The Evolution Gene


In his book Endless Forms Most Beautiful, Sean Carroll explains the role of tool kit genes in the development of organisms. Tool kit genes express products that in turn regulate whether or not other genes are turned on during embryological development. As such, most of them are transcription factors that bind to regulatory regions of a gene, regions Carroll refers to as switches. What thus determines whether or not a particular gene is expressed during development is the combination of activated and repressed switches as a consequence of the composition of the tool kit gene products available.

The teleological echo of all this can be seen from more than one angle. For example, Carroll writes:

The distribution of the genes in the tool kit tells us that the tool kit is ancient and was in place prior to the evolution of most types of animals. (p. 79)

Such observations clearly fit into the hypothesis of front-loading evolution I’ve been discussing for some time, something I hope to further explore for years to come. But there is also something more subtle.

For example, when talking about the development of the fly wing, Carroll notes:

The position of these veins and the spaces between the veins are marked out by the tool kit genes long before the veins actually form, about a week before the bug actually flies. (p. 97)

And the same theme is repeated when describing the development of the vertebrate brain:

Before these subdivisions are evident, and long before their functions are established and integrated, tool kit genes mark out the regions of the neural tube that are fated to become part of the brain. (p. 100)

Of course, this all makes sense as part of a developmental program. But what if we shift our focus just a bit and ask a simple question – are there tool kit genes for evolution itself? Evolution does not have to run as a tightly integrated program, mind you, but are there genes that mimic such developmental tool kit genes, by acting on genomes in ways that thus “mark out” trajectories that subsequent evolution can travel?

How would one go about identifying the contents of such a tool kit? We would need to assign the functional role of ‘evolution’ to specific genes. Typically, biologists classify genes according to their functional roles. But they don’t assign evolution itself as a functional role.

Perhaps this is because a non-teleological viewpoint does not see evolution as a biological function, but instead views it as an unintended side product of reproduction and other biological functions. Or perhaps it is because such functional roles are determined in the lab using genetic and biochemical assays. Such tests can help us determine, for example, whether a gene is involved in protein folding and stabilization or the transport of anions. Yet because evolution is a process that occurs over great spans of time, such assays cannot detect a bona fide ‘evolution’ gene. It would be like trying to detect the arrangement of organs in a body by using a microscope.

If evolution is a process that occurs through deep time, any evolution gene would also require a more immediate function that allowed it to be maintained generation to generation. This would mean evolution genes will be biochemically and genetically detected through their immediate functions. In other words, when Sean Carroll identifies a tool kit of developmental genes, might he really be detecting a tool kit of evolution genes? Not quite, as the developmental tool kit genes simply turn on switches. As evo-devo shows, it is the switches, and not the developmental tool kit genes, that evolve (the tool kit genes typically show strong conservation over deep time).

Of course, this all points us in the direction of the evolution genes.

The Evolution Gene

To sum up thus far, if we view evolution as a function, it stands to reason that life would be endowed with a tool kit of evolution genes. Such genes would interface with life’s architecture to facilitate evolution. While evolution is inevitable in a population of imperfectly replicating cells, the evolution genes would function to effectively catalyze evolution.

But what part of life’s architecture might be targeted by these evolution genes? An obvious candidate is the DNA itself, as it is the DNA that codes for the machinery of life. For example, when it comes to the evolution of body plans, evo-devo teaches us that changing the pattern of switches in front of a gene is an integral part of such evolution. The switch sets, in turn, are altered over time through the process of genetic recombination. Recombination can remove switches, add switches, or swap different versions of a switch in or out. Afterwards, natural selection behaves merely as the editor to weigh whether or not such alterations are acceptable.

The process of recombination has long been known to be very important in generating variation for evolution. As one scientist notes, “The general feeling would probably be that in some undefined way recombination allows organisms to more effectively evolve to adapt to changing environmental conditions.”

So, as our first candidate for an evolution gene, meet RecA.

RecA is a protein that is ubiquitous among bacteria. At first glance, there doesn’t seem to be anything special about it. It’s a typically sized protein of around 350 amino acids and it carries out three very basic functions: it binds to DNA, it binds to other proteins, and it binds to ATP. So why consider this a candidate for an evolution gene?

When I say that genetic recombination is a powerful evolutionary force, it’s easy to forget that recombination is not an abstract process that happens all by itself. It is a process that happens because various proteins interact with the DNA to recombine it. And in this regard, RecA is the star. RecA has the ability to bind to a single strand of DNA (generated by abiotic or biotic forces) and hold on to it as it simultaneously scans another region of the double-stranded DNA in search of nucleotide sequences that are complementary. This is a process known as homology search. Once a complementary region is found, RecA then promotes the actual process of recombination known as strand-exchange (or crossing-over).
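To make the idea of a homology search concrete, here is a toy sketch in Python. It says nothing about how RecA actually does the job (no filament, no ATP, no kinetics); the function names and the example sequences are made up for illustration. It only shows the logic: a single-stranded probe can pair with the bottom strand of a duplex wherever the bottom strand carries the probe's complement.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def homology_search(probe, duplex_top):
    """Return top-strand start positions where the probe can pair with the duplex.

    Hypothetical helper for illustration only. The bottom strand is the
    reverse complement of the top strand, so we look on the bottom strand
    for the sequence that would base-pair with the probe.
    """
    bottom = reverse_complement(duplex_top)
    target = reverse_complement(probe)  # the stretch the probe base-pairs with
    hits = []
    start = bottom.find(target)
    while start != -1:
        # Convert the bottom-strand coordinate back to a top-strand coordinate.
        hits.append(len(duplex_top) - start - len(probe))
        start = bottom.find(target, start + 1)
    return sorted(hits)


# Made-up sequences: the probe matches the duplex at top-strand position 6.
print(homology_search("ATGCAA", "GGATCCATGCAAGTTCGA"))
```

A real search tolerates mismatches and works along a stretched filament rather than by exact string matching, so treat this as a cartoon of the pairing rule and nothing more.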

But it gets better. In addition to being the core protein responsible for recombination, RecA also interacts with another protein, LexA, to initiate the SOS response in bacteria. In fact, this process itself suggests that the ability to generate variation for evolution is built into the fabric of life, as I previously explained.

Since RecA facilitates the retooling of the DNA molecule, it is a good candidate for an evolution gene. Yet another reason for thinking it is a good candidate is its ubiquitous distribution across all life. Not only is it widely spread (and conserved) among bacteria, but RecA versions are found in life’s other two major domains: Rad51 in eukarya and RadA in archaea. These proteins carry out the same basic functions. For example, while the process of recombination during the formation of gametes (meiosis) is more complex in eukaryotes, Rad51 carries out the core process. That “RecA” is so widely distributed suggests it could have been present in the first cells and has been facilitating evolution ever since.

RecA is truly a remarkable protein. Even though it is only about 350 amino acids in length, it carries out the multiple functions of binding multiple DNA strands, coordinating their exchange, binding ATP and hydrolyzing it, and interacting with other proteins. In fact, according to one review, the functional domains responsible for these activities closely map together and may even overlap. How is all this carried out?

I’ve left out one very important part of the story – RecA is not functional as a monomer; it only becomes functional when it forms protein fibers that wrap around the DNA.

In other words, recombination occurs because tubulin-like proteins stretch the DNA by forming a dynamically lengthening tube around it. In this way, the growing protein tube can hold onto the single stranded DNA with one “hand” while using its other “hands” to unravel double stranded DNA such that the single-stranded DNA can be used to probe the unraveled DNA for regions that are complementary.

You might have noticed I said “tubulin-like.” Is this simply because RecA forms a semi-hollow protein tube? No. There are several other features that have led one reviewer, for example, to note:

The dynamic behavior [of RecA] protein under conditions of ATP hydrolysis is thus conceptually similar to that of other NTP-hydrolyzing, self-assembling proteins, such as actin and tubulin.

Like tubulin, RecA filament formation starts slowly with a nucleation step, where a small number of monomers must form a seed that can then be extended. Once formed, like tubulin, the RecA filament then grows at one end by the incorporation of RecA monomers bound to ATP (tubulin dimers add to one end and must be bound to GTP). Like tubulin, the NTP hydrolysis is not needed for assembly, but instead is needed for disassembly. This means that RecA, like tubulin, assembles at one end and disassembles at the other end, forming something like a treadmill. According to one team of researchers:
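The treadmill picture can be sketched in a few lines of Python. This is a deliberately crude, deterministic toy: the seed size and the per-step rates are hypothetical numbers I picked for illustration, not measured values, and it ignores nucleation kinetics and the random fluctuations that real filaments show. It tracks only where the two ends of the filament sit.

```python
def treadmill(steps, seed_size=5, add_per_step=1, remove_per_step=1):
    """Toy filament: monomers join at the front end and leave from the back end.

    All parameters are illustrative assumptions. The filament occupies
    positions [back, front) along the DNA.
    """
    back, front = 0, seed_size  # start from an already-formed seed
    history = []
    for _ in range(steps):
        front += add_per_step  # ATP-bound monomers add to the growing end
        # Monomers leave from the opposite end; the length cannot go below zero.
        back = min(back + remove_per_step, front)
        history.append((back, front, front - back))
    return history


for back, front, length in treadmill(3):
    print(f"back={back} front={front} length={length}")
```

With equal rates at both ends, the length stays fixed while the whole filament shifts along the DNA, which is the treadmill. Setting add_per_step higher than remove_per_step makes the toy filament grow instead.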

We argue that RecA can “proofread” the ssDNA by its own binding fluctuations. These fluctuations are similar to microtubule dynamic instability. The assembly dynamics constitute a kinetic proofreading cascade that is a “hair-trigger” sensor of DNA length. Enhancing biomolecular precision by fluctuations, which may seem somewhat counter-intuitive in a deterministic world, is presented as a natural design principle in the noisy realm of the living cell.

A microtubule-like structure is thus in charge of genetic recombination.

Finally, if RecA is an evolution gene, this would lead to an obvious prediction – removal of RecA should compromise an organism’s ability to evolve.

RecA Becomes Increasingly Important

If the recA gene is an evolution gene, we would predict that its removal would somehow negatively impact the ability to evolve. So let’s see what happens when RecA function is removed by mutation.

To start off, it does not appear RecA function is essential to unicellular life or reproduction. For example, a commonly used lab strain of E. coli is known as DH5alpha. This strain has a mutation in its recA gene such that it cannot carry out recombination. Researchers exploit this inability to make their genetic transformations more efficient. Yet these bacteria clearly survive and reproduce. The same theme holds true for yeast, a single-celled eukaryote. While yeast without Rad51 function (the eukaryotic version of recA) are quite sensitive to agents that cause DNA damage, they remain viable.

Yet if we focus on eubacteria as a group, we’d find that the distribution of recA is nearly universal. And when we compare the encoded amino acid sequence of various recA genes from distantly related species, it becomes clear this is a highly conserved gene. For example, RecA sequence from E. coli (a Gram-negative, enteric bacterium) shows 65% identity and 83% similarity with sequence from B. subtilis (a Gram-positive, soil bacterium). If we couple the way RecA is dispensable for life and reproduction in single-celled organisms to the widespread distribution of the gene and strong conservation of sequence, this suggests RecA’s functions are more fully realized across generations, something we would expect from an evolution gene.
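For readers unfamiliar with the terms, identity counts aligned positions where the two amino acids are the same, while similarity also counts positions where they differ but are chemically alike. The Python sketch below shows the arithmetic on two short, made-up peptides (they are not RecA sequences). The grouping of "similar" amino acids is my own simplified assumption; real alignment tools use substitution matrices and may choose a different denominator, so their numbers will not match a hand count like this exactly.

```python
# Simplified, assumed groups of chemically similar amino acids (illustrative only).
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"),
    set("ST"), set("NQ"), set("AG"),
]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two pre-aligned sequences.

    Gap columns ('-') are skipped, so percentages are over compared columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # ignore positions where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Made-up peptides: 4 of 8 positions identical, 6 of 8 identical or similar.
print(identity_and_similarity("MKVLDEGS", "MRILDPGW"))
```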

But then something changes in the multi-cellular context. Mice that have both copies of their rad51 gene removed show the embryonic lethal phenotype. In other words, such embryos die and are unable to properly develop. Thus, Rad51 function is essential in the complex process of development, at least in mammals.

Could it be that RecA’s role in evolution is to facilitate the generation of complexity? I just noted the distribution of recA is nearly universal among eubacteria. But nearly universal is not universal. So you might be asking yourself about these bacteria that lack recA. It turns out there is a theme that is shared by these few examples of recA-less bacteria – they are endosymbionts (for example, Buchnera sp.). And what characterizes endosymbiotic bacteria is extreme genome reduction and degeneration. In fact, one comparative study indicates that recA (and other DNA repair genes) are lost early in the symbiotic relationship. In other words, it is the removal of recA that might serve as the trigger for genome reduction and thus the glue that establishes the symbiotic existence.

And it is thus from this angle that the hypothesis of front-loading evolution returns, as our evolution gene could be front-loading the appearance and maintenance of multi-cellular life.

RecA Unfolds Over Time

So far, I have offered some support for viewing RecA as an evolution gene: it is ubiquitous, ancient, and plays a key role in the important and evolutionarily significant process of recombination. Also, the endosymbionts suggest that it can act like a switch when it comes to genomic integrity over time.

I’ve raised this all as an alternative perspective, where “RecA’s functions are more fully realized across generations, something we would expect from an evolution gene.” In other words, it’s a question of observational scale. If, for some reason, we were restricted to making observations on the scale of milliseconds, we might be under the impression that RecA’s function is to bind ATP, because the DNA repair functions of RecA are dependent on more time and other machinery. Thus, I’m raising a perspective that expands time even further, noting that DNA repair and recombination (what we measure in the lab) is part of the evolution-function of the gene (how an observer with a larger time frame might see it).

I have noted that RecA is universal among free-living bacteria and that its amino acid sequence is strongly conserved. It’s now time to consider another aspect.

RecA exists as a single copy gene in most eubacteria. There are some exceptions where eubacteria have two copies, such as in Myxococcus xanthus, but since these cases are typically restricted to specific species, it is likely that these few exceptions represent recent gene duplications.

That most bacteria have only one copy of the RecA gene is actually quite interesting, as it means that billions of years of gene duplication have failed to expand RecA into a family of RecA proteins among bacteria. In essence, there is some kind of resistance to expansion by gene duplication. This does not mean gene duplication itself has failed to occur, as I’m sure this gene has been duplicated countless times throughout microbial evolution. But it would seem that when this occurs, it is almost always the case that the duplicate either decays away or is selected against. What’s more, this pattern is consistent with the last common ancestor of all bacteria also possessing this “resistance to expansion by gene duplication” as far as RecA is concerned, as the genome of this last common ancestor apparently had only one copy. Thus, whatever the cellular or genomic reasons for preventing RecA from blossoming into a family of RecA proteins, it has effectively defined eubacteria from the start and then billions of years afterward.

When we turn to Archaea, even though their cellular and genomic complexity is not really that different from eubacteria, it looks as if the constraint has been partially lifted. Many distantly related species have two versions of RecA, known as RadA and RadB. Although some species have lost RadB, the twin versions are distributed widely enough to indicate that the last common ancestor of Archaea possessed these two copies. The RadA and RadB proteins are also somewhat different from the RecA protein, in that they lack a C-terminal domain that is present in RecA. In addition, the RadA protein also has an N-terminal domain that is missing in RecA. Whether or not these changes helped to partially lift the constraint on expansion by gene duplication, we’re still left with no more than two copies throughout the billions of years archaea have evolved.

When we turn to Eukarya, the picture changes dramatically. Eukarya possess a RecA version that is more similar to the archaeal protein. But when we survey vertebrates and flowering plants, we find seven different versions: Rad51, Rad51B, Rad51C, Rad51D, DMC1, XRCC2, and XRCC3. These different versions carry out various specialized, but related, roles in DNA repair and recombination. What’s more, a survey of many eukaryotic genomes indicates that these seven versions were present before the split of animals, plants, and fungi. In other words, they likely existed in a unicellular, eukaryotic state.

Thus it would appear the eukaryotic genome quickly escaped this constraint and that very early in eukaryotic evolution, both the RadA and RadB versions underwent successive rounds of gene duplication that were quickly followed by subfunctionalization. This appears to then have set the stage for the further evolution of more complex interactions that are associated with multi-cellular life. In other words, a distinct teleological perspective emerges, one where RecA had sat there for hundreds of millions of years carrying out its role in a simple genome and then when it found itself in some ancient eukaryotic context, very early on, it unfolded into several different specialized versions that would then go on to further facilitate the evolution of multi-cellular life. How lucky.

And that brings us to one of the versions known as DMC1.

Let There Be Sex

We have seen that the eukaryotic versions of RecA (Rad51, Rad51B, Rad51C, Rad51D, DMC1, XRCC2, and XRCC3) were spawned when the domain Eukarya probably existed in a unicellular state. We’ve also seen that removal of Rad51 from the mouse genome was lethal. The same holds true for Rad51B, Rad51C, Rad51D, and XRCC2 (such knock-out experiments with XRCC3 have apparently not been done yet). But removal of DMC1 is not lethal. In fact, the phenotype for such a knock-out mouse is as follows:

Homozygotes for targeted mutations are sterile with failure of homologous pairing in meiotic prophase in males and disrupted oogenesis in embryonic females with absence of germ cells in the adult ovary.

DMC1, which stands for Disrupted Meiotic cDNA, was originally identified in yeast in the early 1990s. Meiosis is the process by which eukaryotic diploid cells form haploid cells that in turn become gametes. In plants and animals, it is the process that generates ova and sperm/pollen. And in the single-celled yeast, this gene also plays a necessary role in facilitating recombination, guiding homologous chromosomes to cross over during the very early stages of meiosis. Removal of DMC1 leads to arrest in the early stages of meiosis.

Thus, like Rad51, DMC1 is required for meiosis in plants, animals, and fungi. But unlike Rad51, DMC1 function is restricted to meiosis, as indicated not only by gene disruption experiments but also by expression studies that find it to be synthesized only during meiosis. In essence then, DMC1 is a marker for meiosis. It is not universally necessary for meiosis, as fruit flies and the nematode C. elegans have lost their copy. But when it is present, meiosis (or the recent ability to carry out meiosis) is strongly indicated.

So that means we can reasonably estimate when meiosis originated simply by surveying the distribution of DMC1 (at least as a first step in our analysis). Thus, I took DMC1 sequence from fungi and used it to search the databases, pulling out several examples from distantly related protozoa. For example, here it is from an amoeba. Here it is from Trypanosoma. Here it is from a ciliate (YER179W). And most interestingly, here it is from Giardia, thought to be the most “primitive” eukaryote based on the way its genes so deeply branch in the eukaryotic tree. It seems to be rather ubiquitous among the single-celled eukaryotes.

In other words, DMC1 was spawned very early during the evolution of eukaryotes and its birth may very well have coincided with the birth of Eukarya, thus defining Eukarya. Not only may the unfolding of RecA have facilitated the evolution of the complex genomes seen in metazoans, but it may likewise have spawned meiosis. The echo of the evolution genes repeats itself since it is this evolution gene that gave sex to the biotic world.

The very essence of sex is meiotic recombination – Anne Villeneuve and Ken Hillers


4 responses to “RecA – The Evolution Gene”

  1. Pingback: links for 2009-03-01 « Blarney Fellow

  2. Could RecA from microorganism A, such as E. coli, complement the recA mutant of microorganism B, such as B. subtilis, with equal function? I.e., does RecA have the ability of interspecies complementation?

  3. Hi cheng,

    Yes. Proteus mirabilis is very closely related to E. coli and its RecA was able to replace B. subtilis’ RecA (also known as RecE):

    Functional substitution of the recE gene of Bacillus subtilis by the recA gene of Proteus mirabilis.
    Eitner G, Manteuffel R, Hofemeister J.
    Mol Gen Genet. 1984;195(3):516-22.

    Rec mutants of Bacillus subtilis have been tested for complementation by the recA gene of Proteus mirabilis (recApm) which was introduced into B. subtilis via the plasmid pHP334. In the recE4 mutant of B. subtilis the plasmid pHP334 restored significantly the defects in RecE functions tested: UV-sensitivity, homologous recombination (transduction and transformation) and prophage induction. Although serological methods to detect the presence of RecApm protein in B. subtilis have been unsuccessful, our results strongly indicate that the recE function of B. subtilis is analogous to the recA function of P. mirabilis.

  4. great great article. Needs more readers, like 7 billion more!
