Buried Code

We have already seen that most of the universal small subunit ribosomal proteins have alternative functions. If ribosomal proteins can be used as a vehicle for front-loading, given that a designer can count on the ribosome being perpetuated far into the future with minimal changes, why not also use the ribosomal RNA (rRNA) itself?

rRNA forms the functional part of the ribosome where, with the help of the ribosomal proteins, it folds into a complex 3D structure that interacts with the messenger RNA (mRNA) and transfer RNAs to carry out the core processes of protein synthesis. While rRNA, which is synthesized by RNA polymerase I, is typically the end product, natural genetic engineering processes could copy and transplant rRNA sequence so that it came under the control of an RNA polymerase II promoter. The rRNA sequence would then suddenly find itself being transcribed as mRNA and thus translated into a protein.

A clever front-loader might encode proteins-for-the-future in the rRNA sequence itself. In other words, while rRNA sequence is not normally used to code for proteins, it could be used to store code for some proteins. Of course, the coding potential is limited, as rRNA sequence plays a crucial, conserved role in the process of protein synthesis. The ability to code amino acid sequence would thus be limited by the sequence needed for the rRNA to carry out its function. Nevertheless, the opportunity for some degree of front-loading exists.

With this in mind, I decided to take an unusual approach and search for protein sequence encoded in rRNA. Let’s begin with the following sequence:

ACTAGTTACGCGACCCCCGAGCGGTCGGCGTCCCCCAACTTCTTAGAGG GACAAGTGGCGTTCAGCCACCCGAGATTGAGCAATAACAGGTCTGTGAT GCCCTTAGATGTCCGGGGCTGCACGCGCGCTACACTGACTGGCTCAGCGT GTGCCTACCCTGCGCCGGCAGGCGCGGGTAACCCGTTGAACCCCATTCGT GATGGGGATCGGGGATTGCAATTATTCCCCATGAACGAGGAATTCCCAGT AAGTGCGGGTCATAAGCTTGCGTTGATTAAGTCCCTGCCCTTTGTACACA CCGCCCGTCGCTACTACCGATTGGATGGTTTAGTGAGGCCCTCGGATCGG CCCCGCCGGGG

This sequence is from mouse 18S rRNA, the core RNA component of the small ribosomal subunit, which binds all the moonlighting ribosomal proteins we discussed earlier. It corresponds to positions 1402-1759 of the 18S rRNA and shows more than 80% sequence identity with the corresponding region in various sponges and protozoa.
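Conceptually translating a stretch like this is mechanical: strip the whitespace, read the bases three at a time in a chosen frame, and look each codon up in the genetic code. Here is a minimal sketch, assuming the standard genetic code. The function name `translate`, the `frame` argument, and the use of `X` for unrecognized codons are illustrative choices, not part of any particular tool.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=1):
    """Translate a nucleotide string in forward frame 1, 2, or 3.

    Whitespace is ignored and U is treated as T, so pasted rRNA or
    DNA text works either way. Trailing bases that do not fill a
    complete codon are dropped; unknown codons become "X".
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    cleaned = "".join(seq.split()).upper().replace("U", "T")
    start = frame - 1
    codons = (cleaned[i:i + 3] for i in range(start, len(cleaned) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    # First 49 bases of the mouse 18S rRNA stretch shown above.
    print(translate("ACTAGTTACGCGACCCCCGAGCGGTCGGCGTCCCCCAACTTCTTAGAGG"))
```

Read in frame 1, the first 49 bases of the sequence above come out as TSYATPERSASPNFLE.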

What happens if we were to translate this sequence (recall, rRNA sequence is not normally translated) and use that amino acid sequence to probe a database of proteins? Well, we find this:

>dbj|BAE89989.1|  unnamed protein product [Macaca fascicularis]
Length=130

 Score =  208 bits (529),  Expect = 2e-52
 Identities = 118/119 (99%), Positives = 118/119 (99%), Gaps = 0/119 (0%)
 Frame = +1

Query  1    TSYATPERSASPNFLEGQVAFSHPRLSNNRSVMPLDVRGCTRATLTgsacaypapagagn  180
            TSYATPERSASPNFLEGQVAFSHPRLSNNRSVMPLDVRGCTRATLTGSACAYP PAGAGN
Sbjct  3    TSYATPERSASPNFLEGQVAFSHPRLSNNRSVMPLDVRGCTRATLTGSACAYPTPAGAGN  62

Query  181  pLNPIRDGDRGLQLFPMNEEFPVSAGHKLALIKSLPFVHTARRYYRLDGLVRPSDRPRR  357
            PLNPIRDGDRGLQLFPMNEEFPVSAGHKLALIKSLPFVHTARRYYRLDGLVRPSDRPRR
Sbjct  63   PLNPIRDGDRGLQLFPMNEEFPVSAGHKLALIKSLPFVHTARRYYRLDGLVRPSDRPRR  121

Whoa! A chunk of rRNA sequence from mouse 18S rRNA appears to code for an unnamed protein product in the crab-eating macaque (Macaca fascicularis).

But it is not just this monkey. This same rRNA sequence also encodes a “conserved hypothetical protein” from a wide range of eukaryotic organisms (with E values below 1e-10), including:

Rattus

Gallus

Danio

Diaphorina

Aspergillus

Branchiostoma

Homo sapiens

Schistosoma

Talaromyces

Nematostella

Perkinsus marinus

Monodelphis domestica

So let’s look more closely at the Macaca protein. The amino acid sequence comes from a translated cDNA (cDNA is DNA sequence derived from protein-coding mRNA sequence). Here is how this cDNA is described:

GenBank: AB172927.1

Macaca fascicularis brain cDNA clone: QflA-20247, similar to human stathmin-like 2 (STMN2), mRNA, RefSeq: NM_007029.2
gi|90081017|dbj|AB172927.1|[90081017]

Similar to STMN2?! Here’s the expression profile of STMN2:

If you squint hard enough at the x-axis, you’ll see that STMN2 is expressed only in the nervous system.

In fact, this page describes STMN2 as follows:

Recommended name:
Stathmin-2
Alternative name(s):
Superior cervical ganglion-10 protein
Short name=Protein SCG10

May play a role in neuronal differentiation, and in modulating membrane interaction with the cytoskeleton during neurite outgrowth.

STMN2 is one version of stathmin protein, which is described as follows:

Involved in the regulation of the microtubule (MT) filament system by destabilizing microtubules. Prevents assembly and promotes disassembly of microtubules. Phosphorylation at Ser-16 may be required for axon formation during neurogenesis. Involved in the control of the learned and innate fear.

So let’s use a program known as CLUSTALW to align STMN2, the unnamed protein from Macaca, and stathmin from Macaca:

Stathmin_[Macaca_fasciculari      MASS----------------------------------DIQVKELEKRAS
STMN2_[Homo_sapiens]              MAKTAMAYKEKMKELSMLSLICSCFYPEPRNINIYTYDDMEVKQINKRAS
unnamed_protein[Macaca_fasci      MLTS-------------------------------------YATPERSAS
                                  * .:                                         :: **

Stathmin_[Macaca_fasciculari      GQAFELILSPRSKESVPEFPLSPPKKKDLSLEEIQKKLEAAEERRKSHEA
STMN2_[Homo_sapiens]              GQAFELILKPPSPISEAPRTLASPKKKDLSLEEIQKKLEAAEERRKSQEA
unnamed_protein[Macaca_fasci      PNFLEGQVAFSHPRLSNNRSVMP-----LDVRGCTRATLTGSACAYPTPA
                                   : :*  :           .: .     *.:.   :   :..    .  *

Stathmin_[Macaca_fasciculari      EVLKQLAEKREHEKEVLQKAIEENNNFSKMAEEKLTHKMEANKENREAQM
STMN2_[Homo_sapiens]              QVLKQLAEKREHEREVLQKALEENNNFSKMAEEKLILKMEQIKENREANL
unnamed_protein[Macaca_fasci      GAGNPLNPIRDGDRGLQLFPMNEEFPVSAGHKLALIKSLPFVHTAR----
                                   . : *   *: :: :   .::*:  .*   :  *  .:   :  *    

Stathmin_[Macaca_fasciculari      AAKLERLREKDKHIEEVRKNKESKDPADETEAD
STMN2_[Homo_sapiens]              AAIIERLQEKERHAAEVRRNKELQVELSG----
unnamed_protein[Macaca_fasci      --RYYRLDGLVRPSDRPRRGRPTALAER-----
                                       **    :   . *:.:            

The positions marked by * have the same amino acid, the positions marked by : have very similar amino acids, and the positions marked by . have somewhat similar amino acids.

Since roughly 30% of the positions appear to contain the same or highly similar amino acids, it is plausible that the unnamed Macaca protein is homologous to stathmin. And if this is the case, a brain protein is effectively encoded by a portion of the 18S rRNA sequence.
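The tally behind a figure like "roughly 30%" can be sketched as a count of the marks in the consensus rows of the alignment. This is only a minimal sketch with assumptions: the consensus rows are passed in as plain strings, the total number of alignment columns is supplied by the caller, and the function name `conserved_fraction` is hypothetical.

```python
def conserved_fraction(consensus_rows, n_columns):
    """Summarize CLUSTALW consensus marks over an alignment.

    consensus_rows: the lines made of '*', ':', '.' and spaces that sit
                    under each alignment block.
    n_columns:      total number of aligned columns, gaps included.
    """
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    marks = "".join(consensus_rows)
    identical = marks.count("*")  # same amino acid in every sequence
    strong = marks.count(":")     # very similar amino acids
    weak = marks.count(".")       # somewhat similar amino acids
    return {
        "identical": identical,
        "strong": strong,
        "weak": weak,
        "identical_or_strong": (identical + strong) / n_columns,
    }


if __name__ == "__main__":
    # Toy consensus rows, not the real alignment above.
    print(conserved_fraction(["* .:", ":: **"], n_columns=20))
```

Keep in mind that a count like this measures conservation across all three aligned sequences at once, so it cannot say how much of the signal comes from the two stathmins resembling each other.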

Summary

The front-loading hypothesis led me to predict that rRNA sequence may actually contain code for proteins. When I translated a portion of the mouse 18S rRNA sequence and used that translated sequence to probe protein databases, an unnamed protein from Macaca, along with a conserved hypothetical protein from various distantly related eukaryotes, was retrieved. This protein might be homologous to stathmin, a protein that regulates microtubule assembly in neurons.


6 responses to “Buried Code”

  1. Very nice. Would this be considered an example of moonlighting?

  2. evolvingideas

    Very cool analysis!
    I have two questions:
    1. How did you choose the sequence from the 18S RNA (1402-1759)? Did you try any other sequences?
    2. Is a 30% similarity in nucleic acid sequence really high enough to predict protein homology? I ask as an outsider to the field. It sounds pretty low, especially if you don’t try to predict secondary or tertiary structures.

  3. Hi evolvingideas,

    1. I systematically used chunks of the entire 18S rRNA. There are other interesting hits I’ll write about.

    2. Yeah, I’m not confident of any homology, which is why I said they “might” be homologous. However, even if they are not, it is interesting that some protein(s) out there seem to be encoded by rRNA sequence. It would be even more interesting to determine what they are doing.

  4. evolvingideas

    I look forward to reading about your other findings!
    You might be able to actually test your hypothesis about front-loading. It appears from this paper that certain regions evolve faster in the rRNA. If you find the protein coding sequences mainly in the conserved part, that would support your hypothesis. But of course, you would have to do it rigorously and finish the search for homologies before looking where they are.

  5. Thanks for that paper, as it might indeed be helpful. When I get some ink, I’ll print it out and read it. I did quasi-test this hypothesis by checking how conserved this region was. I’m encouraged by the fact that it is so large and that the sequence conservation was >80%.

    As for homologies, I should point out that I am no expert at detecting homology. A big problem with the hypothesis of stathmin homology is that blastp with the Macaca sequence does not retrieve stathmins.

    In an ideal world, where I had the time, the money, and a little help, I would take the Macaca gene and express it in E. coli, purify the protein, raise antibodies against it, and then use the antibodies to probe Macaca brain tissue.

  6. evolvingideas

    Yes…
    That would certainly prove your point.
    I wonder how long it will take before somebody else gets interested in the same question and actually does the labwork.
