We have been focused on the ribosome as one plausible vehicle for front-loading evolution. This hypothesis led us to expect, correctly, that ribosomal proteins would carry out additional functions and that some portions of the ribosomal RNA might actually code for protein. So it’s time to dig a little deeper.
I recently noted that the small ribosomal subunit protein, S5, also moonlights in the differentiation of red blood cells. One thing that is striking about S5 is its highly conserved amino acid sequence.
Below is the sequence of S5 from a rat:
>gi|165970894|gb|AAI58719.1| Rps5 protein [Rattus norvegicus]
MTEWETATPAVAETPDIKLFGKWSTDDVQINDISLQDYIAVKEKYAKYLPHSAGRYAAKRFRKAQCPIVE
RLTNSMMMHGRNNGKKLMTVRIVKHAFEIIHLLTGENPLQVLVNAIINSGPREDSTRIGRAGTVRRQAVD
VSPLRRVNQAIWLLCTGAREAAFRNIKTIAECLADELINAAKGSSNSYAIKKKDELERVAKSNR
Now let’s compare it to S5 from Trichoplax adhaerens:
ref|XP_002109401.1| 40S ribosomal protein S5 [Trichoplax adhaerens]
gb|EDV27567.1| 40S ribosomal protein S5 [Trichoplax adhaerens]
Length=203

Score = 371 bits (953), Expect = 2e-101, Method: Compositional matrix adjust.
Identities = 177/204 (86%), Positives = 191/204 (93%), Gaps = 1/204 (0%)

Query  1    MTEWETATPAVAETPDIKLFGKWSTDDVQINDISLQDYIAVKEKYAKYLPHSAGRYAAKR  60
            M + E   PA AE  ++KLFGKWSTDDVQI DISL DYIAVKE++A YLPH++GRY+AKR
Sbjct  1    MVDTEIVLPA-AEPQEVKLFGKWSTDDVQIGDISLNDYIAVKERHATYLPHTSGRYSAKR  59

Query  61   FRKAQCPIVERLTNSMMMHGRNNGKKLMTVRIVKHAFEIIHLLTGENPLQVLVNAIINSG  120
            FRKAQCPIVERLTNSMMMHGRNNGKKL+ VRIVKH+FEIIHLLT ENPLQVLVNAIINSG
Sbjct  60   FRKAQCPIVERLTNSMMMHGRNNGKKLLAVRIVKHSFEIIHLLTNENPLQVLVNAIINSG  119

Query  121  PREDSTRIGRAGTVRRQAVDVSPLRRVNQAIWLLCTGAREAAFRNIKTIAECLADELINA  180
            PREDSTRIGRAGTVRRQAVDVSPLRRVNQA+WLLCTGAREAAFRN+K+I+ECLADELINA
Sbjct  120  PREDSTRIGRAGTVRRQAVDVSPLRRVNQAMWLLCTGAREAAFRNLKSISECLADELINA  179

Query  181  AKGSSNSYAIKKKDELERVAKSNR  204
            AKGSSNSYAIKKKDELERVAKSNR
Sbjct  180  AKGSSNSYAIKKKDELERVAKSNR  203
In other words, S5 from this little fella (the rat) and this humble creature (Trichoplax adhaerens) are 86% identical and 93% similar.
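For readers who want to check the identity figure themselves, here is a minimal sketch that counts matching columns in the alignment above. The two strings are copied from the Query and Sbjct rows of the BLAST output, with "-" marking the single gap; the function name `percent_identity` is just an illustrative choice. The "similar" (Positives) figure depends on a substitution matrix and is not reproduced here.

```python
# Aligned rows copied from the BLAST output above ("-" is the one gap).
QUERY = (
    "MTEWETATPAVAETPDIKLFGKWSTDDVQINDISLQDYIAVKEKYAKYLPHSAGRYAAKR"
    "FRKAQCPIVERLTNSMMMHGRNNGKKLMTVRIVKHAFEIIHLLTGENPLQVLVNAIINSG"
    "PREDSTRIGRAGTVRRQAVDVSPLRRVNQAIWLLCTGAREAAFRNIKTIAECLADELINA"
    "AKGSSNSYAIKKKDELERVAKSNR"
)
SBJCT = (
    "MVDTEIVLPA-AEPQEVKLFGKWSTDDVQIGDISLNDYIAVKERHATYLPHTSGRYSAKR"
    "FRKAQCPIVERLTNSMMMHGRNNGKKLLAVRIVKHSFEIIHLLTNENPLQVLVNAIINSG"
    "PREDSTRIGRAGTVRRQAVDVSPLRRVNQAMWLLCTGAREAAFRNLKSISECLADELINA"
    "AKGSSNSYAIKKKDELERVAKSNR"
)


def percent_identity(a, b):
    """Return (matches, columns, percent) for two equal-length aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same number of columns")
    # A column counts as identical only if both rows show the same residue.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, len(a), 100.0 * matches / len(a)


matches, columns, pct = percent_identity(QUERY, SBJCT)
print(f"{matches}/{columns} identical ({pct:.1f}%)")
```

With the rows as given, this should agree with the 177/204 in the BLAST summary; BLAST drops the fractional part when it prints 86%.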
That such distantly related eukaryotes have such similar amino acid sequences would seem to support the hypothesis that S5 moonlights, since multiple functions might be conserving the sequence at this extreme level. After all, such great similarity is not seen when this sequence is compared to bacterial S5, yet one would think that S5 does the same job in both bacterial and eukaryotic ribosomes. But that’s another rabbit trail for another day.
Right now, this high sequence similarity takes on a new meaning in the context of front-loading evolution – if this sequence is so strongly conserved, it could be used to code for other proteins through a) a shift in the reading frame and/or b) the sequence on the complementary strand of DNA.
So here is the DNA sequence for the rat ribosomal S5 gene:
Using other reading frames gets tricky, for recall that the genetic code appears to have been designed to minimize the damage of frameshift mutations by ensuring stop codons are more likely to appear after a frameshift.
And sure enough, if you were to translate this sequence by shifting the reading frame one or two nucleotides, stop codons pop up everywhere:
5′3′ Frame 2
Stop L N G K Q P H P R W Q R P R T S S S L G N G A L Met Met C R S T I F L Y R I T L L Stop R R S Met P S T C P T V Q D G Met L P S V S A K H S V P S W S A L L T P Stop Stop C T V V T T A R S S Stop L Y E L S S Met P L R S S T C S L V R T L C R S W Stop Met L S S T V A P E K T Q H A L G G L E Q Stop D G R L W Met Y P H F A E Stop I R P S G C C A R G L V R L L S G T S R P S L S A L Q Met S S L Met L P R A P P T P Met L S R R K Met N W S V W P S L T A D F P A A A Stop Stop I V C P L G Q L Q K K K K K K K
5′3′ Frame 3
D Stop Met G N S H T R G G R D P G H Q A L W E Met E H Stop Stop C A D Q R Y F S T G L H C C E G E V C Q V P A P Q C R T V C C Q A F P Q S T V S H R G A P Y Stop L H D D A R S Stop Q R Q E A H D C T N C Q A C L Stop D H P P A H W Stop E P S A G P G E C Y H Q Q W P P R R L N T H W A G W N S E T A G C G C I P T S P S E S G H L A A V H G G S Stop G C F P E H Q D H R Stop V P C R Stop A H Stop C C Q G L L Q L L C Y Q E E R Stop T G A C G Q V Stop P L I S Q L L P N K L C A L W D S Y K K K K K K K K
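The frame-shift exercise above is easy to repeat on any sequence. The following is a rough sketch, assuming the standard genetic code and a short made-up DNA string (`demo` is hypothetical, not the rat S5 gene); the function names are illustrative. It translates the three forward frames and counts the stop codons in each.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering; "*" is a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2); unknown codons -> X."""
    dna = dna.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is dropped.
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stops_per_frame(dna):
    """Map forward frame number (1-3) to the number of stop codons it contains."""
    return {frame + 1: translate(dna, frame).count("*") for frame in range(3)}


demo = "ATGGCCTAA"  # hypothetical toy sequence, for illustration only
for frame in range(3):
    print(f"Frame {frame + 1}: {translate(demo, frame)}")
print(stops_per_frame(demo))
```

Running `stops_per_frame` on the real gene sequence would give the per-frame stop counts that the translations above show by eye.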
But what if we took the DNA sequence that is complementary to the sequence of the S5 gene? This is what we would get:
And translated it? We get this amino acid sequence:
N C P K G H T I Y Stop A A A G K S A V R L G H T L Q F I F L L D S I G V G G A L G S I N E L I C K A L S D G L D V P E S S L T S P R A Q Q P D G L I H S A K W G Y I H S L P S H C S S P P N A C Stop V F S G A T V D D S I H Q D L Q R V L T S E Q V D D L K G M L D N S Y S H E L L A V V T T V H H H G V S K A L H D G T L C F A E T L G S I P S C T V G Q V L G I L L L H S N V I L Stop R N I V D L H I I S A P F P K E L D V R G L C H R G C G C F P F S H
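As a reminder of what "complementary" involves in practice: the opposite strand is read 5′ to 3′ in the reverse direction, so the usual step is to take the reverse complement before translating. A minimal sketch follows, using a made-up input; the post does not say which tool or which frame on the complementary strand produced the translation above, so this only illustrates the strand operation itself.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


demo = "ATGGCCTAA"  # hypothetical toy sequence, not the S5 gene
print(reverse_complement(demo))
```

Applying the function twice returns the original strand, which is a quick sanity check when working with a long sequence.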
There are not as many stop codons. In fact, if you consider the amino acids that are in bold font, that’s an unusual stretch of 174 amino acids with only one stop codon. So let’s probe the protein databases with this bolded sequence:
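Finding a long stretch with few stop codons can also be done mechanically. Here is a sketch, assuming the translation is given in the spaced format used above (single letters plus the words "Stop" and "Met"); the helper names are hypothetical. It converts that format to one-letter codes and then uses a sliding window to find the longest run containing at most a chosen number of stops.

```python
def to_one_letter(tokens):
    """Convert 'N C P Stop A Met' style output to a one-letter string ('*' = stop)."""
    special = {"Stop": "*", "Met": "M"}
    return "".join(special.get(tok, tok) for tok in tokens.split())


def longest_stretch(protein, max_stops=1):
    """Return (start, end) of the longest window with at most max_stops stops."""
    best = (0, 0)
    left = 0
    stops = 0
    for right, residue in enumerate(protein):
        if residue == "*":
            stops += 1
        # Shrink the window from the left until it is within the stop budget.
        while stops > max_stops:
            if protein[left] == "*":
                stops -= 1
            left += 1
        if right + 1 - left > best[1] - best[0]:
            best = (left, right + 1)
    return best


# The first few residues of the translation above, as a small demonstration.
protein = to_one_letter("N C P K G H T I Y Stop A A A G K")
start, end = longest_stretch(protein, max_stops=1)
print(protein[start:end], end - start)
```

Setting `max_stops=0` gives the longest fully open stretch instead.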
>dbj|BAH11619.1| unnamed protein product [Homo sapiens]
Length=189

Score = 206 bits (525), Expect = 1e-51, Method: Compositional matrix adjust.
Identities = 111/170 (65%), Positives = 129/170 (75%), Gaps = 5/170 (2%)

Query  4    KGHTIY-AAAGK-SAVRLGHTLQ--FIFLLDSIGVGGALGSINELICKALSDGLDVPESS  59
            KG  +Y A AGK S     H  +   + LL+ IGVGGALGSI+EL+C+ALS+GL+VPE S
Sbjct  5    KGRQVYWATAGKISGWTWPHAPRRPHVLLLNGIGVGGALGSIDELLCQALSNGLNVPEGS  64

Query  60   LTSPRAQQPDGLIHSAKWGYIHSLPSHCSSPPNAC-VFSGATVDDSIHQDLQRVLTSEQV  118
            LTS  AQQPDGLIH+AKWG++HSL SH    PNAC V  G TVDD +HQDLQRVL  EQV
Sbjct  65   LTSTCAQQPDGLIHTAKWGHVHSLSSHSPGAPNACGVLPGTTVDDGVHQDLQRVLACEQV  124

Query  119  DDLKGMLDNSYSHELLAVVTTVHHHGVSKALHDGTLCFAETLGSIPSCTV  168
             DL+GMLD+++SHELLAVV  VHHHGVSKALH+GTL F E  G IP CTV
Sbjct  125  YDLEGMLDDAHSHELLAVVAAVHHHGVSKALHNGTLSFVEAFGGIPPCTV  174
Let’s look more closely at the source of this protein:
FEATURES             Location/Qualifiers
     source          1..189
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="BRACE3021614"
                     /tissue_type="cerebellum"
                     /clone_lib="BRACE3"
                     /note="cloning vector: pME18SFL3"
     Protein         1..189
                     /name="unnamed protein product"
     CDS             1..189
                     /coded_by="AK293872.1:122..691"
It’s a cDNA sequence derived from the human brain!
Summary: A highly conserved ribosomal protein (S5) not only appears to have a moonlighting function in the development of red blood cells, but the DNA sequence complementary to the S5 gene also appears to code for a human brain protein. We thus seem to have multiple layers of coding here: 1. S5 functions in protein synthesis; 2. S5 moonlights in red blood cell differentiation; and 3. the complement of the S5 gene sequence codes for a human cerebellar protein.