[I’ve combined all the previous RNAP entries together to make it easier to read. However, I did not have the time to thoroughly edit, so some parts might seem a little repetitive.]
It is well known that eukaryotic cells are more complex than prokaryotic cells. For example, while the typical eukaryotic cell is 10-100 micrometers in diameter, contains numerous membranous organelles, has an elaborate cytoskeleton, and reproduces through mitosis, the typical bacterial cell is only 0.2-2.0 micrometers in diameter, lacks organelles, and reproduces through binary fission. Clearly, the cytological complexity of the eukaryotic cell is not needed in order to be alive.
Yet the theme of needless complexity repeats itself at increasingly smaller scales like a fractal image.
Consider, for example, the three basic universal processes of information transfer: DNA replication, transcription, and translation. In both bacteria and eukaryotes, the same building blocks are used, the same macromolecules are synthesized, and the processes are essentially the same. Yet in each case, the process is more complex among eukarya than in bacteria. For example, while bacteria replicate their single chromosome from a single origin point and possess five different DNA polymerases, eukaryotes initiate replication from multiple points on their multiple chromosomes (involving a process known as licensing) and contain at least 19 DNA polymerases.
If we turn to transcription, bacteria employ a small set of transcription (sigma) factors and use an RNA polymerase (RNAP) built from four subunits. Among eukaryotes, we find hundreds of different transcription factors and the single RNAP has been expanded into three versions: RNAP I, RNAP II, and RNAP III. RNAP II is most similar to the bacterial version, yet if we focus just on this protein complex, we again find enhanced complexity, where the eukaryotic version contains up to 15 subunits. And when we compare the shared core subunits, the eukaryotic versions even have additional domains (Cramer, Patrick. 2002 Multisubunit RNA polymerases. Current Opinion in Structural Biology 12:89-97).
And then there is the classic example of the bacterial and eukaryotic ribosomes. As can be seen from the table below, the eukaryotic ribosome has many more proteins (for both subunits) and longer ribosomal RNAs in each subunit.
|Comparison of Ribosome Structure in Bacteria and Eukaryotes|
| ||Bacterial (70S)||Eukaryotic (80S)|
|Large subunit rRNA (1 of each)||23S (2904 nts)||28S (4700 nts)|
| ||5S (120 nts)||5S (120 nts)|
| || ||5.8S (160 nts)|
|Small subunit rRNA||16S (1542 nts)||18S (1900 nts)|
Finally, if we consider the entire proteome from Eukarya, Bacteria, and Archaea, the theme of needless complexity is ubiquitous (Brocchieri, L and Karlin, S. Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Research 33: 3390–3400). The median length of the proteins annotated among Eukaryotes is 361 amino acids while it is only 267 amino acids in Bacteria and 247 amino acids in Archaea. This is a theme that is seen among all the various functional classes of proteins, as seen from some examples in the table below.
|Median Length of Proteins (amino acids)|
|DNA replication and processing proteins||315||723|
|Cell division and chromosome partitioning||346||439|
|Inorganic ion transport and metabolism||314||538|
|Signal transduction mechanisms||323||605|
(modified from Brocchieri and Karlin)
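To put the overall medians quoted above on a common scale, here is a minimal Python sketch that expresses them as fold differences. The three values are the ones cited from Brocchieri and Karlin; the dictionary and function names are my own illustrative choices.

```python
# Median annotated protein lengths (amino acids), as quoted in the text.
MEDIAN_LENGTH = {"Eukaryotes": 361, "Bacteria": 267, "Archaea": 247}


def fold_difference(domain_a, domain_b):
    """Return how many times longer the median protein of domain_a is
    than the median protein of domain_b (hypothetical helper)."""
    return MEDIAN_LENGTH[domain_a] / MEDIAN_LENGTH[domain_b]


euk_vs_bacteria = round(fold_difference("Eukaryotes", "Bacteria"), 2)
euk_vs_archaea = round(fold_difference("Eukaryotes", "Archaea"), 2)

print(f"Eukaryotes vs Bacteria: {euk_vs_bacteria}x")
print(f"Eukaryotes vs Archaea:  {euk_vs_archaea}x")
```

On these figures the median eukaryotic protein is roughly a third longer than the bacterial median and nearly half again as long as the archaeal median.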
The theme of needless complexity among eukarya is seen from many different perspectives: the global architecture of the cell, the number of steps involved in many basic processes, the number of components in any machine, and the size of proteins regardless of function. Needless complexity thus permeates the eukaryotic cell.
And this leaves us with some tantalizing questions. Why is the eukaryotic cell plan so much more complex than the bacterial cell plan? What does this increased complexity tell us about the eukaryotic cell plan relative to the bacterial version? Why does the theme of needless complexity reach into every aspect of the cell plan?
We could try to explain this by invoking the large population sizes of bacteria and hypothesize that this difference is the consequence of purifying selection. After all, it is well known that natural selection streamlines bacteria for efficient replication. Yet while this may be part of the explanation, it leaves too many stones unturned. For example, does this mean that life originated from complex, rather than simple, beginnings, and natural selection has pruned away much of this ancient complexity? And how did the eukaryotic cell plan emerge in such a way as to escape the pruning shears of purifying selection? And why hasn’t purifying selection streamlined the machinery inside the yeast cell, an organism which exists as large populations?
To show you how deep this mystery goes, let’s focus on one example of needless complexity – the RNA polymerase. As I mentioned before:
If we turn to transcription, bacteria employ a small set of transcription (sigma) factors and use an RNA polymerase (RNAP) built from four subunits. Among eukaryotes, we find hundreds of different transcription factors and the single RNAP has been expanded into three versions: RNAP I, RNAP II, and RNAP III. RNAP II is most similar to the bacterial version, yet if we focus just on this protein complex, we again find enhanced complexity, where the eukaryotic version contains up to 15 subunits.
In bacteria, the RNAP is built from four different subunits, but in eukaryotes, it is built from up to 15 subunits. Yet both versions of the protein machine carry out the same essential function – linking the DNA world to the amazing world of proteins through the process of transcription (synthesizing an RNA molecule from a DNA template). But what if we also surveyed the third domain, the Archaea? Archaebacteria were long thought to be little more than exotic bacteria, given they are the same size and possess the same level of cytoplasmic complexity as eubacteria. Surely the streamlining hypothesis would predict that the archaeal RNAP would be about as simple as the eubacterial RNAP.
But that is not what we see. Here is a figure that documents the RNAP subunits from the three domains, where homologs are identified by using the same color:
Goodness me. The archaeal RNAP is not only much more complex than the bacterial version, but it is very similar to the complex eukaryotic RNAP. Like I said, while streamlining may be part of the explanation, it leaves too many stones unturned. Why in the world do Archaea have such complex RNAPs? Eubacteria teach us that such complexity is not needed for the bacterial way of life. Yet there it is, as one of the defining features of the archaeal domain.
To summarize, the bacterial RNAP contains four subunits, the yeast RNAP contains 12 subunits, and the archaeal RNAP contains 11 subunits. So when it comes to complexity (the number of parts), we would group the archaeal and eukaryotic versions together.
Yet when it comes to size (HT to Guts), the archaeal and bacterial versions group together, both being around 400 kD. In contrast, the yeast RNAP is around 600 kD. So we do see evidence of streamlining in archaea – it’s simply in the size and not the complexity of the RNAP. And this makes the needless complexity of the archaeal RNAP even more perplexing.
The bacterial RNAP does not contain any subunits that are specific to bacteria, meaning their four subunits are universal. In yeast, these are known as Rpb1, 2, 3, and 11 and it is known these four comprise the assembly and catalytic core of this machine.
This then means that the yeast RNAP contains 8 subunits that are not found in bacteria. And of these eight subunits, six of them (Rpb 4, 5, 7, 10, 11, and 12) are homologous to the archaeal RNAP. Do these subunits also interact in a similar fashion? Yes, as here is a schematic of their respective interaction network (where homologous subunits are coded with the same color):
Holy smokes, it’s looking more and more like the archaeal RNAP is a miniaturized version of the eukaryotic RNAP. The major difference is that the eukaryotic RNAP has two components not seen in the archaeal version – Rpb 8 and 9. But wait!
Early evolution of eukaryotic DNA-dependent RNA polymerases.
Kwapisz M, Beckouët F, Thuriaux P.
Trends Genet. 2008 May;24(5):211-5.
Eukaryotic DNA-dependent RNA polymerases (Pol I-III) share a conserved core of 12 subunits, which is closely related to archaeal RNA polymerases. Rpb8, a subunit found in Pol I, II and III, was thought to be restricted to eukaryotes. We show here that Rpb8 closely resembles an archaeal protein called G, found only in Crenarchaea, which identifies a last missing link between the core structure of archaeal and eukaryotic RNA polymerases.
So Rpb8 has likely been lost in various lineages of archaea, meaning that the archaeal RNAP contains 7 of the 8 yeast subunits that are not seen in the bacterial RNAP.
We are still left with the following question – Why in the world do archaea have such complex RNAPs?
The non-teleological perspective would “explain” this disparity by simply informing us that there are many ways to transcribe DNA into RNA and these two RNAPs would merely reflect the many roads to Rome. At that point, we might remind people that this non-teleological perspective led biologists astray, as it prevented them from anticipating the widespread phenomenon of deep homology. But worse than that, this explanation doesn’t work with these cellular RNAPs.
Consider again the interaction maps of the archaeal RNAP (left) and eukaryal RNAP (right):
What’s not shown is the map of bacterial RNAP. But it is there in both maps. In the eukaryal map, it’s just 1, 2, 3, 6 and 11 (where 3 and 11 are fused). And biochemical evidence from all three versions of the RNAP shows these subunits to form the functional core. So archaebacteria and bacteria have the same RNAP, except that the archaeal version comes with added bells and whistles.
So why does the archaeal version of the RNAP have all these needless bells and whistles? I would propose that we are witnessing yet another clue that front-loading was in play. That is, the archaeal RNAP, not needed to fill bacteria-like niches, is a preadaptation to facilitate the evolution of the eukaryotic cell plan, and thus complex multicellular life. In other words, the same hypothesis that explains the existence of protein-coding introns, and explains the odd features of the signal recognition particle, also explains the unusual aspects of the archaeal RNAP.
And this front-loading hypothesis, under the guidance of PREPA, allows us to formulate a testable hypothesis – the “bells and whistles” of the archaeal RNAP – Rpb 4, 5, 7, 10, 11, and 12 – will play crucial roles in the emergence of a) the eukaryotic cell and/or b) complex, metazoan life.
Shall we now put this hypothesis to the test?
To test this hypothesis, let’s go back to that assembly map. The first thing that stood out to me was Rpb4 and 7 (which are F and E, respectively, in archaea). These two look like they interact with the core machine as a dimer (a combination of both proteins) and in cell biology, dimerization is a useful regulatory node. In other words, the bells and whistles of the archaeal RNAP might represent preadaptations that would nudge the ability to regulate the RNAP in ways that would assist the emerging complexity entailed in the appearance of eukarya, then metazoa. So the telic perspective would allow us to predict these two proteins play some form of regulatory role. With this hypothesis in hand, it was once again time to probe the literature. Over the next few days, let me share some of the things I found.
First, let’s begin with E and F in archaea. It turns out the PREPA pattern begins to take hold, as subunit F, and probably E, are not required for archaebacteria to survive:
The results reported demonstrate that the F subunit of RNAP is not essential for T. kodakarensis viability and as RNAP isolated from the T. kodakarensis ΔrpoF mutant also lacks subunit E, it seems possible that subunit E is also not essential for archaeal transcription in vivo. Consistent with this, the core enzyme purified from T. kodakarensis KUWLFB, which lacks both subunits E and F, was fully active in transcription initiation, elongation and termination in vitro.
Yet E and F are not useless, as these RNAP parts are needed for the archaebacteria to grow efficiently at high temperatures. Apparently, E and F are used to activate genes needed to thrive at such temperatures, meaning they do have some form of regulatory role.
When we turn to baker’s yeast, we see the same theme:
The relative levels of Rpb4 and Rpb7 in yeasts affect the differential gene expression and stress response. Rpb4 is nonessential in S. cerevisiae and affects expression of a small number of genes under normal growth conditions. 
So Rpb4 (F in archaea) is not needed to live, but instead is used to facilitate existence under stressful environmental conditions. Sounds like a regulatory node that will one day come in very handy when it is time to unfold a eukaryotic cell or metazoan body plan. In fact, now consider this:
The pol II enzyme consists of 12 subunits, which are highly conserved during evolution. The specific functions of the smaller subunits such as Rpb4 are relatively poorly understood, and they can show differences between organisms. For example, Rpb4 is essential in mammals and fission yeast (Schizosaccharomyces pombe), but not in budding yeast (Saccharomyces cerevisiae). Here we use fission yeast as a model to learn more about particular roles of Rpb4 in genome-wide transcription. 
Did you see that? Rpb4 is not essential in archaea or single-celled baker’s yeast, but is essential in multicellular fungi and animals. It looks like one of the gadgets on the needlessly complex archaeal RNAP has been recruited to play a key role in the existence of multicellularity. Can you say pre adap ta tion?
So let’s pause here and let you take this in. Next, let’s look more closely at what Rpb4 is doing.
As we are about to explore possible mechanisms by which the Rpb4 protein could nudge the evolutionary emergence of eukarya and/or metazoa, I want to set the stage by briefly reviewing two fundamental themes.
First, one of the primary ways in which eukaryotic cells differ from prokaryotic cells revolves around the nucleus.
The membrane-bound nucleus represents a very significant example of one of the pillars of PICERAS – compartmentalization. This figure should help you visualize why the nucleus is so significant:
The nuclear membrane makes it possible to sequester the DNA from the ribosomes. This means the process of transcription (RNA synthesis) is decoupled from the process of translation (protein synthesis). The decoupling opens up a huge regulatory window, as RNA can be massively processed before it is sent out of the nucleus to be translated. One example of such processing is alternative splicing, something that probably facilitated the emergence of complex metazoan life. We’ve seen that bacteria lack protein-coding introns and this is not surprising given that bacteria usually couple the process of transcription and translation, such that the ribosome begins translating the mRNA before the RNA polymerase is finished making the mRNA:
Second, in The Design Matrix, I lay out the logic of multifunctional, moonlighting proteins as mechanisms to front-load evolution. Recently, I successfully employed this logic to predict that most ribosomal proteins would have functions apart from their role in the ribosome. You can read about this here (check out section 4).
Got it? The emergence of the nucleus poses a radical change, as a process built around the coupling of transcription and translation must now be decoupled. And the front-loading hypothesis has already been successful when it comes to predicting moonlighting proteins. Let’s tie these together in the context of our needlessly complex archaeal RNA polymerase.
With the emergence of the nucleus as part of the eukaryotic cell plan comes the decoupling of transcription and translation. But something else is decoupled – the two processes that control the levels of messenger RNA in the cell. These two processes are mRNA synthesis, which occurs within the nucleus, and mRNA decay, which occurs in the cytoplasm. By physically separating the two processes, you run into a potential control problem where an increase in the synthesis of mRNA, for example, might not translate as an increase in mRNA levels because elevated mRNA decay rates in the cytoplasm might cancel out any increase in synthesis rates. And given that mRNA levels play important roles in embryological development, this could pose a serious problem for the evolutionary emergence of metazoa.
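The control problem described here can be made concrete with a toy calculation. The sketch below assumes the simplest textbook model, in which mRNA is made at a constant rate and decays in proportion to its abundance, so the steady-state level is synthesis rate divided by decay rate. The model, the function name, and the rate values are illustrative assumptions of mine, not figures from any of the studies discussed.

```python
def steady_state_mrna(synthesis_rate, decay_rate):
    """Steady-state mRNA level for dm/dt = synthesis_rate - decay_rate * m.

    Assumes constant (zero-order) synthesis and first-order decay,
    a deliberately simplified model used only for illustration.
    """
    if decay_rate <= 0:
        raise ValueError("decay_rate must be positive")
    return synthesis_rate / decay_rate


# Illustrative rates in arbitrary units (not measured values).
baseline = steady_state_mrna(synthesis_rate=10.0, decay_rate=0.5)

# Synthesis doubles in the nucleus, but decay in the cytoplasm
# happens to double too: the mRNA level does not change.
uncoordinated = steady_state_mrna(synthesis_rate=20.0, decay_rate=1.0)

# Synthesis doubles while decay is held at its original rate:
# the mRNA level doubles as intended.
coordinated = steady_state_mrna(synthesis_rate=20.0, decay_rate=0.5)

print(baseline, uncoordinated, coordinated)
```

Under these assumptions, a change in synthesis only shows up in mRNA levels if the decay side is kept in step, which is why some physical link between the two compartments would be useful.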
So is there a way to keep mRNA synthesis and decay rates coupled while allowing for the enhanced compartmentalization of the eukaryotic cells? Consider some recent research findings.
First of all, even though Rpb4/7 are part of the RNA polymerase, whose job is to synthesize RNA in the nucleus, it was determined that Rpb4/7 shuttle back and forth between the nucleus and the cytoplasm. Why in the world is an RNAP component spending time in the cytoplasm?
Secondly, it was determined that Rpb4/7 can bind the mRNA. However, it can only bind the mRNA if it is first part of the RNAP. Thus, the mRNA leaves the nucleus with Rpb4/7 stuck to it.
Third, out in the cytoplasm, Rpb4/7 moonlight and attract components of the RNA decay machinery.
In other words, the Rpb4/7 components that are not necessary for archaebacterial survival appear to play a key role in linking the processes of transcription and RNA decay in a eukaryotic context. For it is the location of Rpb4/7, whether in the nucleus or outside the nucleus, that determines their function. As researchers note:
This study provides an example for “conditional interaction” between two interacting partners that occurs only within specific molecular context. Such kind of interaction might be the basis for other cases of coupling between two processes. Summarily, conditional interaction between Rpb4/7 and mRNAs allows Pol II to impact not only transcription but also the fate of its products after they left the nucleus. This is the first indication that Pol II can affect mRNA decay in the cytoplasm and the first evidence for a direct mechanistic coupling between transcription in the nucleus and the two major mRNA decay processes in the cytoplasm. (emphasis added). 
And as I wrote in The Design Matrix:
The multifunctional nature of many proteins can be unlocked across time. A designer could implement one of the protein’s functions in service of unicellular life, while the other functions remain “in-waiting” for the appearance of the proper context for their expression. We know, for example, the multiple functions of various proteins are often unleashed as a function of location. A different function can be unlocked by localizing the protein in a different place in the cell, by localizing the protein outside of the cell, or by localizing the protein in a different type of cell.
1. Goler-Baron V, Selitrennik M, Barkai O, et al. 2008. Transcription in the nucleus and mRNA decay in the cytoplasm are coupled processes. Genes Dev. 22:2022-2027.
As we have seen, the bacterial and archaeal RNA polymerases (RNAPs) differ in complexity. Despite the fact that the cell plan of both life forms is small, relatively simple, and streamlined, the RNAPs differ remarkably in terms of complexity, where the bacterial version is built from four parts, while the archaeal version is built from 11 parts. The archaeal version has homologs of the four bacterial components needed to carry out the core process of transcription, meaning the remaining parts are “bells and whistles.”
As far as I have been able to determine, no one has thought to ask why the archaeal RNAP is so much more needlessly complex than the bacterial version. I would expect the non-teleological perspective would “explain” this disparity by insisting that there are many ways to transcribe DNA into RNA and these two RNAPs would merely reflect the many roads to Rome. But that is not a very satisfying speculation. So let me be the first to ask the question and the first to propose an answer.
From the hypothesis of front-loading, allow me to formulate a testable hypothesis – the “bells and whistles” of the archaeal RNAP – Rpb 4, 5, 7, 10, 11, and 12 – will play crucial roles in the emergence of a) the eukaryotic cell and/or b) complex, metazoan life.
If we begin our analysis by focusing on Rpb4 and 7, which function together as a dimer, we have already seen some clues to support this hypothesis. First, Rpb4 and probably 7 are not needed in order for archaebacteria or single-celled yeast cells to survive, but are essential for the survival of multicellular fungi. Second, Rpb4/7 appear to be preadapted to facilitate the emergence of the complex eukaryotic cell plan given they not only function in transcription, but also moonlight to control RNA decay outside of the nucleus. Let’s now add some more clues.
It turns out that Rpb4 is essential for embryological development in flies. A recent experiment determined that RNA levels of Rpb4 were high early in development and then decreased later in development. And when the Rpb4 gene is removed by genetic manipulation, death occurs during the early stages of larval development. So while Rpb4 is not needed for archaebacterial or single-celled eukaryotic survival, it is needed for embryological development.
If we go back to baker’s yeast, Rpb4 and 7 can function as a switch to help cells “decide” how to proceed under the stress of starvation. The cell essentially has two choices – to activate a program that will generate spores (go dormant and wait for things to improve) or to activate a program that will generate pseudohyphae which, as you can see from “b” in the figure below, are multicellular filaments/“feelers” that spread out to improve the odds of nutrient retrieval.
According to the study that documented this switch behavior:
The Rpb4/7 subcomplex of RNA polymerase II in Saccharomyces cerevisiae is known to play an important role in stress response and stress survival. These two proteins perform overlapping functions ensuring an appropriate cellular response through transcriptional regulation of gene expression. Rpb4 and Rpb7 also perform many cellular functions either together or independent of one another. Here, we show that Rpb4 and Rpb7 differently affect during the nutritional starvation response pathways of sporulation and pseudohyphae formation. Rpb4 enhances the cells’ proficiency to sporulate but suppresses pseudohyphal growth. On the other hand, Rpb7 promotes pseudohyphal growth and suppresses sporulation in a dose-dependent manner. We present a model whereby the stoichiometry of Rpb4 and Rpb7 and their relative levels in the cell play a switch like role in establishing either sporulation or pseudohyphal gene expression. 
So two of the archaeal ‘bells and whistles’ are behaving as a switch, where one option is to proceed along lines of developing a simple multicellular state. But it gets better.
Another study looked at fission yeast and found that Rpb4 controls the expression levels of genes involved in cell separation after mitosis:
To learn more about the roles of Rpb4, we expressed the rpb4 gene under the control of regulatable promoters of different strength in fission yeast. We demonstrate that below a critical level of transcription, Rpb4 affects cellular growth proportional to its expression levels: cells expressing lower levels of rpb4 grew slower compared to cells expressing higher levels. Lowered rpb4 expression did not affect cell survival under several stress conditions, but it caused specific defects in cell separation similar to sep mutants. Microarray analysis revealed that lowered rpb4 expression causes a global reduction in gene expression, but the transcript levels of a distinct subset of genes were particularly responsive to changes in rpb4 expression.
They also noted:
Fission yeast cells grown under low expression of rpb4 were elongated and showed defects in cell separation as indicated by the accumulation of division septa, some of them highly aberrant (Fig. 3).
And they interpret this as follows:
in abundant nutrients, global transcription is efficient and growth is best as single cells, while in limiting nutrients, global transcription and cell separation are compromised, and cells grow as multicellular pseudohyphae, which may allow a more efficient grazing for new nutrients as growth is directed.
So what began as a needless component of the archaeal RNAP has taken on a role involved in the generation of multicellular structures. In fact, with the failure to complete cell separation, the whole system seems poised to generate true hyphae or even a syncytium (which plays a key role in early development in flies). As far as Rpb4 and 7 go, the data do indeed support my front-loading hypothesis. These two components can be viewed as part of a choice architecture that would eventually nudge a multicellular state into existence.
1. Pankotai T, Ujfaludi Z, Vámos E, Suri K, Boros IM. 2009. The dissociable RPB4 subunit of RNA Pol II has vital functions in Drosophila. Mol Genet Genomics. 283(1):89-97.
2. Singh SR, Pillai B, Balakrishnan B, Naorem A, Sadhale PP. 2007. Relative levels of RNA polII subunits differentially affect starvation response in budding yeast. Biochem Biophys Res Commun. 356(1):266-72.
3. Sharma N, Marguerat S, Mehta S, Watt S, Bähler J. 2006. The fission yeast Rpb4 subunit of RNA polymerase II plays a specialized role in cell separation. Mol Genet Genomics. 276(6):545-54.