Digging Up More Buried Code with Ribosomal Protein S5

We have been focused on the ribosome as one plausible vehicle for front-loading evolution.  This hypothesis led us to correctly predict that ribosomal proteins would carry out additional functions and that some portions of the ribosomal RNA might actually code for protein.  So it’s time to dig a little deeper.

I recently noted that the small ribosomal subunit protein, S5, also moonlights in the differentiation of red blood cells.  One thing that is striking about S5 is its highly conserved amino acid sequence.

Below is the sequence of S5 from a rat:

>gi|165970894|gb|AAI58719.1| Rps5 protein [Rattus norvegicus]
MTEWETATPAVAETPDIKLFGKWSTDDVQINDISLQDYIAVKEKYAKYLPHSAGRYAAKRFRKAQCPIVE
RLTNSMMMHGRNNGKKLMTVRIVKHAFEIIHLLTGENPLQVLVNAIINSGPREDSTRIGRAGTVRRQAVD
VSPLRRVNQAIWLLCTGAREAAFRNIKTIAECLADELINAAKGSSNSYAIKKKDELERVAKSNR

Now let’s compare it to S5 from Trichoplax adhaerens:

ref|XP_002109401.1| 40S ribosomal protein S5 [Trichoplax adhaerens]
 gb|EDV27567.1| 40S ribosomal protein S5 [Trichoplax adhaerens]
Length=203

 Score =  371 bits (953),  Expect = 2e-101, Method: Compositional matrix adjust.
 Identities = 177/204 (86%), Positives = 191/204 (93%), Gaps = 1/204 (0%)

Query  1    MTEWETATPAVAETPDIKLFGKWSTDDVQINDISLQDYIAVKEKYAKYLPHSAGRYAAKR  60
            M + E   PA AE  ++KLFGKWSTDDVQI DISL DYIAVKE++A YLPH++GRY+AKR
Sbjct  1    MVDTEIVLPA-AEPQEVKLFGKWSTDDVQIGDISLNDYIAVKERHATYLPHTSGRYSAKR  59

Query  61   FRKAQCPIVERLTNSMMMHGRNNGKKLMTVRIVKHAFEIIHLLTGENPLQVLVNAIINSG  120
            FRKAQCPIVERLTNSMMMHGRNNGKKL+ VRIVKH+FEIIHLLT ENPLQVLVNAIINSG
Sbjct  60   FRKAQCPIVERLTNSMMMHGRNNGKKLLAVRIVKHSFEIIHLLTNENPLQVLVNAIINSG  119

Query  121  PREDSTRIGRAGTVRRQAVDVSPLRRVNQAIWLLCTGAREAAFRNIKTIAECLADELINA  180
            PREDSTRIGRAGTVRRQAVDVSPLRRVNQA+WLLCTGAREAAFRN+K+I+ECLADELINA
Sbjct  120  PREDSTRIGRAGTVRRQAVDVSPLRRVNQAMWLLCTGAREAAFRNLKSISECLADELINA  179

Query  181  AKGSSNSYAIKKKDELERVAKSNR  204
            AKGSSNSYAIKKKDELERVAKSNR
Sbjct  180  AKGSSNSYAIKKKDELERVAKSNR  203

In other words, S5 from this little fella (the rat) and this humble creature (Trichoplax adhaerens) are 86% identical and 93% similar.
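
For anyone who wants to check those percentages, here is a minimal sketch of the arithmetic. The function names are my own, and the assumption that BLAST truncates (rather than rounds) the displayed percentage is inferred only from the numbers in the report above, where 177/204 is about 86.8% but is shown as 86%.

```python
def identity_stats(query: str, subject: str) -> tuple[int, int]:
    """Return (identical positions, alignment length) for two aligned rows.

    Both rows must already be aligned, with '-' marking gaps.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return matches, len(query)


def blast_percent(numerator: int, denominator: int) -> int:
    """Whole-number percentage, truncated (assumed display convention)."""
    return numerator * 100 // denominator


# Toy aligned rows (not the full S5 alignment): 2 identities over 5 columns.
print(identity_stats("MTEWE", "MVD-E"))   # (2, 5)

# Counts taken from the BLAST report above.
print(blast_percent(177, 204))            # 86  (identities)
print(blast_percent(191, 204))            # 93  (positives)
```

Feeding the full Query and Sbjct rows from the report into identity_stats should give the 177 of 204 that BLAST lists, provided the rows are joined without the coordinate columns.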

That such distantly related eukaryotes have such similar amino acid sequences would seem to support the hypothesis that S5 moonlights, as multiple functions might be conserving the sequence at this extreme level.  After all, such great similarity is not seen when this sequence is compared to bacterial S5, yet one would think that S5 is doing the same thing in both bacterial and eukaryotic ribosomes.  But that’s another rabbit trail for another day.

Right now, this high sequence similarity takes on a new meaning in the context of front-loading evolution – if this sequence is so strongly conserved, it could be used to code for other proteins through a) a shift in the reading frame and/or b) the sequence on the complementary strand of DNA.

So here is the DNA sequence for the rat ribosomal S5 gene:

5'ATGACTGAATGGGAAACAGCCACACCCGCGGTGGCAGAGACCCCGGACATCAAGCTCTTTGGGAAATGGAGCACTGATGATGTGCAGATCAACGATATTTCTCTACAGGATTACATTGCTGTGAAGGAGAAGTATGCCAAGTACCTGCCCCACAGTGCAGGACGGTATGCTGCCAAGCGTTTCCGCAAAGCACAGTGTCCCATCGTGGAGCGCCTTACTAACTCCATGATGATGCACGGTCGTAACAACGGCAAGAAGCTCATGACTGTACGAATTGTCAAGCATGCCTTTGAGATCATCCACCTGCTCACTGGTGAGAACCCTCTGCAGGTCCTGGTGAATGCTATCATCAACAGTGGCCCCCGAGAAGACTCAACACGCATTGGGCGGGCTGGAACAGTGAGACGGCAGGCTGTGGATGTATCCCCACTTCGCCGAGTGAATCAGGCCATCTGGCTGCTGTGCACGGGGGCTCGTGAGGCTGCTTTCCGGAACATCAAGACCATCGCTGAGTGCCTTGCAGATGAGCTCATTAATGCTGCCAAGGGCTCCTCCAACTCCTATGCTATCAAGAAGAAAGATGAACTGGAGCGTGTGGCCAAGTCTAACCGCTGATTTCCCAGCTGCTGCCTAATAAATTGTGTGCCCTTTGGGACAGTT3'

Using other reading frames gets tricky, for recall that the genetic code appears to have been designed to minimize the damage of frameshift mutations by ensuring stop codons are more likely to appear after a frameshift.

And sure enough, if you were to translate this sequence by shifting the reading frame one or two nucleotides, stop codons pop up everywhere:

5'3' Frame 2
Stop L N G K Q P H P R W Q R P R T S S S L G N G A L Met Met C R S T I F L Y R I T L L Stop R R S Met P S T C P T V Q D G Met L P S V S A K H S V P S W S A L L T P Stop Stop C T V V T T A R S S Stop L Y E L S S Met P L R S S T C S L V R T L C R S W Stop Met L S S T V A P E K T Q H A L G G L E Q Stop D G R L W Met Y P H F A E Stop I R P S G C C A R G L V R L L S G T S R P S L S A L Q Met S S L Met L P R A P P T P Met L S R R K Met N W S V W P S L T A D F P A A A Stop Stop I V C P L G Q L Q K K K K K K K

5'3' Frame 3
D Stop Met G N S H T R G G R D P G H Q A L W E Met E H Stop Stop C A D Q R Y F S T G L H C C E G E V C Q V P A P Q C R T V C C Q A F P Q S T V S H R G A P Y Stop L H D D A R S Stop Q R Q E A H D C T N C Q A C L Stop D H P P A H W Stop E P S A G P G E C Y H Q Q W P P R R L N T H W A G W N S E T A G C G C I P T S P S E S G H L A A V H G G S Stop G C F P E H Q D H R Stop V P C R Stop A H Stop C C Q G L L Q L L C Y Q E E R Stop T G A C G Q V Stop P L I S Q L L P N K L C A L W D S Y K K K K K K K K
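
The frame-shifted translations above can be reproduced roughly as follows. This is only a sketch: it assumes the standard genetic code, prints stop codons as "*" rather than "Stop", and uses one-letter amino acid codes throughout. The names CODON_TABLE and translate are mine, not those of the tool that produced the output above.

```python
from itertools import product

# Standard genetic code, written in the usual T, C, A, G order
# (third base varies fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna starting at offset frame (0, 1 or 2).

    Trailing bases that do not fill a whole codon are ignored;
    codons containing anything other than A, C, G, T become 'X'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 21 nucleotides of the rat S5 sequence shown above.
start = "ATGACTGAATGGGAAACAGCC"
print(translate(start, 0))  # MTEWETA  (the real S5 reading frame)
print(translate(start, 1))  # *LNGKQ   (Frame 2 above)
print(translate(start, 2))  # D*MGNS   (Frame 3 above)
```

Note that the frame argument is zero-based, so frame=1 corresponds to what the output above labels Frame 2.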

But what if we took the DNA sequence that is complementary to the sequence of the S5 gene, read 5' to 3' (that is, the reverse complement)?  This is what we would get:

5'AACTGTCCCAAAGGGCACACAATTTATTAGGCAGCAGCTGGGAAATCAGCGGTTAGACTTGGCCACACGCTCCAGTTCATCTTTCTTCTTGATAGCATAGGAGTTGGAGGAGCCCTTGGCAGCATTAATGAGCTCATCTGCAAGGCACTCAGCGATGGTCTTGATGTTCCGGAAAGCAGCCTCACGAGCCCCCGTGCACAGCAGCCAGATGGCCTGATTCACTCGGCGAAGTGGGGATACATCCACAGCCTGCCGTCTCACTGTTCCAGCCCGCCCAATGCGTGTTGAGTCTTCTCGGGGGCCACTGTTGATGATAGCATTCACCAGGACCTGCAGAGGGTTCTCACCAGTGAGCAGGTGGATGATCTCAAAGGCATGCTTGACAATTCGTACAGTCATGAGCTTCTTGCCGTTGTTACGACCGTGCATCATCATGGAGTTAGTAAGGCGCTCCACGATGGGACACTGTGCTTTGCGGAAACGCTTGGCAGCATACCGTCCTGCACTGTGGGGCAGGTACTTGGCATACTTCTCCTTCACAGCAATGTAATCCTGTAGAGAAATATCGTTGATCTGCACATCATCAGTGCTCCATTTCCCAAAGAGCTTGATGTCCGGGGTCTCTGCCACCGCGGGTGTGGCTGTTTCCCATTCAGTCAT3'
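
Getting the complementary strand in its own 5' to 3' direction means complementing each base and then reversing the result. A minimal sketch, assuming a plain A/C/G/T sequence with no ambiguity codes (the helper name reverse_complement is mine):

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand of dna, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


# The first 12 bases of the S5 sequence become the last 12 bases
# of the complementary strand shown above.
print(reverse_complement("ATGACTGAATGG"))  # CCATTCAGTCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check on any pasted strand.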

And translated it?  We get this amino acid sequence:

N C P K G H T I Y Stop A A A G K S A V R L G H T L Q F I F L L D S I G V G G A L G S I N E L I C K A L S D G L D V P E S S L T S P R A Q Q P D G L I H S A K W G Y I H S L P S H C S S P P N A C Stop V F S G A T V D D S I H Q D L Q R V L T S E Q V D D L K G M L D N S Y S H E L L A V V T T V H H H G V S K A L H D G T L C F A E T L G S I P S C T V G Q V L G I L L L H S N V I L Stop R N I V D L H I I S A P F P K E L D V R G L C H R G C G C F P F S H

There are not as many stop codons.  In fact, if you consider the amino acids that are in bold font, that’s an unusual stretch of 174 amino acids with only one stop codon.  So let’s probe the protein databases with this bolded sequence:
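
One way to hunt for such a stretch without eyeballing it is a sliding window that tolerates a fixed number of stops. The sketch below is an illustration, not the method used for this post: it expects a one-letter translation with "*" for each stop, the function name longest_stretch is hypothetical, and the reported length counts the stop symbols inside the window.

```python
def longest_stretch(protein: str, max_stops: int = 1, stop: str = "*") -> tuple[int, int]:
    """Return (start, end) of the longest slice with at most max_stops stops.

    The slice is protein[start:end]; ties go to the earliest slice.
    """
    best = (0, 0)
    left = 0
    stops = 0
    for right, residue in enumerate(protein):
        if residue == stop:
            stops += 1
        # Shrink from the left until the window is allowed again.
        while stops > max_stops:
            if protein[left] == stop:
                stops -= 1
            left += 1
        if right + 1 - left > best[1] - best[0]:
            best = (left, right + 1)
    return best


# Toy translation with two stops (not the real S5 complement).
toy = "MK*ACDEFG*HI"
start, end = longest_stretch(toy, max_stops=1)
print(toy[start:end])   # MK*ACDEFG
start, end = longest_stretch(toy, max_stops=0)
print(toy[start:end])   # ACDEFG
```

To run it on the translation above, the spaced output would first need "Stop" replaced with "*" and the spaces removed.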

>dbj|BAH11619.1|  unnamed protein product [Homo sapiens]
Length=189

 Score =  206 bits (525),  Expect = 1e-51, Method: Compositional matrix adjust.
 Identities = 111/170 (65%), Positives = 129/170 (75%), Gaps = 5/170 (2%)

Query  4    KGHTIY-AAAGK-SAVRLGHTLQ--FIFLLDSIGVGGALGSINELICKALSDGLDVPESS  59
            KG  +Y A AGK S     H  +   + LL+ IGVGGALGSI+EL+C+ALS+GL+VPE S
Sbjct  5    KGRQVYWATAGKISGWTWPHAPRRPHVLLLNGIGVGGALGSIDELLCQALSNGLNVPEGS  64

Query  60   LTSPRAQQPDGLIHSAKWGYIHSLPSHCSSPPNAC-VFSGATVDDSIHQDLQRVLTSEQV  118
            LTS  AQQPDGLIH+AKWG++HSL SH    PNAC V  G TVDD +HQDLQRVL  EQV
Sbjct  65   LTSTCAQQPDGLIHTAKWGHVHSLSSHSPGAPNACGVLPGTTVDDGVHQDLQRVLACEQV  124

Query  119  DDLKGMLDNSYSHELLAVVTTVHHHGVSKALHDGTLCFAETLGSIPSCTV  168
             DL+GMLD+++SHELLAVV  VHHHGVSKALH+GTL F E  G IP CTV
Sbjct  125  YDLEGMLDDAHSHELLAVVAAVHHHGVSKALHNGTLSFVEAFGGIPPCTV  174


Whoa!

Let’s look more closely at the source of this protein:

FEATURES             Location/Qualifiers
     source          1..189
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="BRACE3021614"
                     /tissue_type="cerebellum"
                     /clone_lib="BRACE3"
                     /note="cloning vector: pME18SFL3"
     Protein         1..189
                     /name="unnamed protein product"
     CDS             1..189
                     /coded_by="AK293872.1:122..691"

It’s a cDNA sequence derived from the human brain!

Summary:  A highly conserved ribosomal protein (S5) not only appears to have a moonlighting function in the development of red blood cells, but the DNA sequence that is complementary to the S5 gene appears to code for a protein closely matching (65% identical to) an unnamed protein from a human cerebellum cDNA library.  We thus seem to have multiple layers of coding here: 1. S5 functions in protein synthesis; 2. S5 moonlights in red blood cell differentiation; and 3. the complement of the S5 gene sequence codes for a human cerebellar protein.

One response to “Digging Up More Buried Code with Ribosomal Protein S5”

  1. Mike,

    OT, Got a bit o’news. You may already be aware of it. I find it fascinating and not wholly unexpected from a Design POV.

    “From a genetic perspective, therapeutic implications aside, the observation that not all cells are the same is extremely important. That’s the bottom line,” he added. “Genome-wide association studies were introduced with enormous hype several years ago, and people expected tremendous breakthroughs. They were going to draw blood samples from thousands or hundreds of thousands of individuals, and find the genes responsible for disease.

    “Unfortunately, the reality of these studies has been very disappointing, and our discovery certainly could explain at least one of the reasons why.”

    Indeed, I’ve learned to be skeptical of scientific claims, so this did not surprise me. Especially after the junk DNA studies and the “98% same as chimps” mantra repeated ad nauseam.

    Although I’m not certain from the study how big the difference is. They mention the BAK gene, which seems right to me for pre-programmed cell death, if that’s the case.

    Link:

    Study reveals major genetic differences between blood and tissue cells

    I wonder how many other genes this could be true of down the line?

    This may yet turn out to be the greatest discovery in a long time. If heart tissue genes are different than blood cells then wow…. therapy may be missing some big areas of all types of cancer and disease. What about kidney, liver, brain tissue, etc.?

    My immediate question is not just about humans: do we see it across all mammals and so on?

    Are there natural barriers that require different programming choices?

    How does this tie in with your research across multi-programming S5, if at all?
