Let me explain some more about the human brain protein that is encoded by the complementary strand of the ribosomal S5 protein gene.
As most of you probably know, DNA is a double-stranded chain of nucleotides. When it is time to express a gene, RNA polymerase unwinds the DNA and uses only one of the two strands as a template to make an RNA copy, as shown in the figure below:
What I showed with S5 is that while the bottom strand codes for this highly conserved ribosomal protein, the upper complementary strand, if it were to be transcribed and translated, would code for an unnamed protein product isolated from the human brain.
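For readers who want to see how the two strands relate, here is a minimal Python sketch of the reverse-complement operation. The fragment is a short stretch of the upper strand taken from the full sequence shown later in this post; the function name `reverse_complement` is just an illustrative choice, not part of any particular toolkit.

```python
# Map each base to its Watson-Crick partner (both cases handled).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

# Short stretch of the upper strand, copied from the sequence in this post.
upper = "cccaatgcgtgttgagtcttctcgggg"
lower = reverse_complement(upper)

print(lower)
# Applying the operation twice recovers the original strand.
print(reverse_complement(lower) == upper)
# The stop codon discussed in this post, as it reads on the opposite strand.
print(reverse_complement("tga"))  # tca
```

The same stretch of DNA therefore spells two different messages, depending on which strand is read.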
If we translate this upper-strand DNA sequence into an amino acid sequence and use that amino acid sequence to probe the databases, we find this match:
>dbj|BAH11619.1| unnamed protein product [Homo sapiens]
Score = 206 bits (525), Expect = 1e-51, Method: Compositional matrix adjust.
Identities = 111/170 (65%), Positives = 129/170 (75%), Gaps = 5/170 (2%)

Query  4    KGHTIY-AAAGK-SAVRLGHTLQ--FIFLLDSIGVGGALGSINELICKALSDGLDVPESS  59
            KG  +Y A AGK S     H  +   + LL+ IGVGGALGSI+EL+C+ALS+GL+VPE S
Sbjct  5    KGRQVYWATAGKISGWTWPHAPRRPHVLLLNGIGVGGALGSIDELLCQALSNGLNVPEGS  64

Query  60   LTSPRAQQPDGLIHSAKWGYIHSLPSHCSSPPNAC-VFSGATVDDSIHQDLQRVLTSEQV  118
            LTS  AQQPDGLIH+AKWG++HSL SH    PNAC V  G TVDD +HQDLQRVL  EQV
Sbjct  65   LTSTCAQQPDGLIHTAKWGHVHSLSSHSPGAPNACGVLPGTTVDDGVHQDLQRVLACEQV  124

Query  119  DDLKGMLDNSYSHELLAVVTTVHHHGVSKALHDGTLCFAETLGSIPSCTV  168
             DL+GMLD+++SHELLAVV  VHHHGVSKALH+GTL F E  G IP CTV
Sbjct  125  YDLEGMLDDAHSHELLAVVAAVHHHGVSKALHNGTLSFVEAFGGIPPCTV  174
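The Identities and Gaps figures in the BLAST header can be re-derived from the Query and Sbjct rows themselves. Here is a minimal Python sketch, assuming the three alignment blocks are joined end to end exactly as reported above. The variable names are mine, and the percentages are rounded down because that is what the reported values (for example 129/170 shown as 75%) suggest.

```python
# Query and Sbjct rows of the alignment, concatenated block by block.
query = (
    "KGHTIY-AAAGK-SAVRLGHTLQ--FIFLLDSIGVGGALGSINELICKALSDGLDVPESS"
    "LTSPRAQQPDGLIHSAKWGYIHSLPSHCSSPPNAC-VFSGATVDDSIHQDLQRVLTSEQV"
    "DDLKGMLDNSYSHELLAVVTTVHHHGVSKALHDGTLCFAETLGSIPSCTV"
)
sbjct = (
    "KGRQVYWATAGKISGWTWPHAPRRPHVLLLNGIGVGGALGSIDELLCQALSNGLNVPEGS"
    "LTSTCAQQPDGLIHTAKWGHVHSLSSHSPGAPNACGVLPGTTVDDGVHQDLQRVLACEQV"
    "YDLEGMLDDAHSHELLAVVAAVHHHGVSKALHNGTLSFVEAFGGIPPCTV"
)
assert len(query) == len(sbjct)

columns = len(query)
# A column is an identity when both rows carry the same residue (not a dash).
identities = sum(q == s for q, s in zip(query, sbjct) if q != "-")
# Dashes in either row are what BLAST counts as gaps.
gaps = query.count("-") + sbjct.count("-")

print(f"Identities = {identities}/{columns} ({100 * identities // columns}%)")
print(f"Gaps = {gaps}/{columns} ({100 * gaps // columns}%)")
```

Counting the Positives would additionally need the substitution matrix BLAST used, so it is left out of this sketch.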
Yet as I mentioned in the previous post, there is a single stop codon in the middle of the S5 complementary sequence. Where is it?
If you look on the line marked ‘Query 60’, you see the following sequence from the S5 complementary strand: PNAC-VFSG. The dash represents that stop codon. Notice that when this sequence is aligned with the brain protein sequence, the matching brain sequence is PNACGVLPG. The stop codon from the S5 complementary sequence lines up with a glycine (G) residue in the brain protein.
So what stop codon is used? As you can see from the figure below, which lines up the DNA sequence with the encoded amino acids, the stop codon sitting between the codons for C and V is tga (uga in the RNA).
aactgtcccaaagggcacacaatttattaggcagcagctgggaaatcagcggttagactt
 N  C  P  K  G  H  T  I  Y  -  A  A  A  G  K  S  A  V  R  L
ggccacacgctccagttcatctttcttcttgatagcataggagttggaggagcccttggc
 G  H  T  L  Q  F  I  F  L  L  D  S  I  G  V  G  G  A  L  G
agcattaatgagctcatctgcaaggcactcagcgatggtcttgatgttccggaaagcagc
 S  I  N  E  L  I  C  K  A  L  S  D  G  L  D  V  P  E  S  S
ctcacgagcccccgtgcacagcagccagatggcctgattcactcggcgaagtggggatac
 L  T  S  P  R  A  Q  Q  P  D  G  L  I  H  S  A  K  W  G  Y
atccacagcctgccgtctcactgttccagcccgcccaatgcgtgttgagtcttctcgggg
 I  H  S  L  P  S  H  C  S  S  P  P  N  A  C  -  V  F  S  G
gccactgttgatgatagcattcaccaggacctgcagagggttctcaccagtgagcaggtg
 A  T  V  D  D  S  I  H  Q  D  L  Q  R  V  L  T  S  E  Q  V
gatgatctcaaaggcatgcttgacaattcgtacagtcatgagcttcttgccgttgttacg
 D  D  L  K  G  M  L  D  N  S  Y  S  H  E  L  L  A  V  V  T
accgtgcatcatcatggagttagtaaggcgctccacgatgggacactgtgctttgcggaa
 T  V  H  H  H  G  V  S  K  A  L  H  D  G  T  L  C  F  A  E
acgcttggcagcataccgtcctgcactgtggggcaggtacttggcatacttctccttcac
 T  L  G  S  I  P  S  C  T  V  G  Q  V  L  G  I  L  L  L  H
agcaatgtaatcctgtagagaaatatcgttgatctgcacatcatcagtgctccatttccc
 S  N  V  I  L  -  R  N  I  V  D  L  H  I  I  S  A  P  F  P
aaagagcttgatgtccggggtctctgccaccgcgggtgtggctgtttcccattcagtcat
 K  E  L  D  V  R  G  L  C  H  R  G  C  G  C  F  P  F  S  H
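Anyone who wants to check the reading frame around that stop can do so with a few lines of Python. This is only a sketch: it builds the standard genetic code from the usual compact TCAG-ordered table, translates the nine codons surrounding the stop, and writes the stop as a dash to match the listing above. The names `CODON_TABLE` and `translate` are my own choices.

```python
from itertools import product

# Standard genetic code, codons ordered ttt, ttc, tta, ttg, tct, ... ggg.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna, stop_symbol="-"):
    """Translate DNA in frame 1, writing stop codons as stop_symbol."""
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    residues = []
    for i in range(0, usable, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        residues.append(stop_symbol if aa == "*" else aa)
    return "".join(residues)

# Nine codons from the fifth row of the listing above.
fragment = "cccaatgcgtgttgagtcttctcgggg"
peptide = translate(fragment)
stop_codons = [
    fragment[i:i + 3]
    for i in range(0, len(fragment), 3)
    if CODON_TABLE[fragment[i:i + 3]] == "*"
]

print(peptide)      # PNAC-VFSG
print(stop_codons)  # ['tga']
```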
So let’s look at the genetic code:
As you can see, we can easily convert the stop codon into a codon for glycine with a single mutation: TGA to GGA.
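To see how few options a single mutation offers, here is a small Python sketch that lists every one-base substitution of TGA and the amino acid each mutant would encode under the standard genetic code. The helper name `single_base_neighbors` is purely illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def single_base_neighbors(codon):
    """Return {mutant_codon: amino_acid} for every one-base substitution."""
    neighbors = {}
    for pos, original in enumerate(codon):
        for base in BASES:
            if base != original:
                mutant = codon[:pos] + base + codon[pos + 1:]
                neighbors[mutant] = CODON_TABLE[mutant]
    return neighbors

neighbors = single_base_neighbors("TGA")
for codon, aa in neighbors.items():
    print(codon, aa)  # '*' marks a stop codon

# Which single substitutions turn the stop into glycine?
glycine_routes = [codon for codon, aa in neighbors.items() if aa == "G"]
print(glycine_routes)  # ['GGA']
```

Of the nine possible single-base changes, only the T-to-G change at the first position produces a glycine codon.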
This type of subtle analysis increases our confidence that this brain protein is derived from the ribosomal S5 protein gene.