Let’s take a few moments to show you that the green alga Volvox carteri contains a sequence that apparently codes for a homolog of the human beta-catenin protein.
If you take the human beta-catenin protein sequence (CAA61107.1) and use it to BLAST the genome of Volvox carteri, you will retrieve “hypothetical protein VOLCADRAFT_41528 [Volvox carteri f. nagariensis]” (XP_002955847.1).
Now, human beta-catenin is 781 amino acids long, while the Volvox protein is 525 amino acids. The BLAST program is able to align the sequences so that the entire Volvox protein is matched against human beta-catenin starting around amino acid position 150. In this alignment, 144/536 (27%) of the positions are identical and 238/536 (44%) contain amino acids with similar properties (the alignment is longer than the Volvox protein because it includes gaps). Given the phylogenetic distance between these two species, that is pretty impressive. Could it be simple coincidence that these positions match up like this? No. The E value associated with this match is 1e-25. The E value is the number of alignments at least this good that we would expect to find by chance alone in a search of this size, and matches with E values below about 1e-04 are conventionally not attributed to chance. This is why biologists infer homology when the E value is that small. And given that 1e-25 is 21 orders of magnitude smaller than 1e-04, we can safely assume these two sequences are homologous.
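The numbers in the paragraph above are easy to check. Below is a minimal Python sketch; the function names are hypothetical, the 1e-04 cutoff is the convention cited in the text rather than a fixed BLAST parameter, and the conversion from an E value to the probability of at least one chance hit uses the standard Karlin-Altschul relation P = 1 - e^(-E).

```python
import math

def percent(count: int, aligned_positions: int) -> int:
    """Share of alignment positions, rounded to a whole percentage."""
    return round(100 * count / aligned_positions)

def orders_of_magnitude_below(e_value: float, threshold: float) -> float:
    """How many powers of ten the E value lies below the cutoff."""
    return math.log10(threshold / e_value)

def chance_hit_probability(e_value: float) -> float:
    """P(at least one chance hit this good) = 1 - exp(-E).

    expm1 keeps precision for tiny E values, where P is essentially E.
    """
    return -math.expm1(-e_value)

print(percent(144, 536))                               # 27 (identities)
print(percent(238, 536))                               # 44 (similar residues)
print(round(orders_of_magnitude_below(1e-25, 1e-4)))   # 21
print(chance_hit_probability(1e-25))                   # approximately 1e-25
```

For E values this small, P and E are practically the same number, which is why the two are often used interchangeably when judging a match.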
But it actually gets better than this.
Proteins are modular structures composed of domains, which are relatively short spans of amino acids typically associated with a function. Beta-catenin is built largely from a series of such spans, known as armadillo (Arm) repeats, located in the central bulk of the protein. Each Arm repeat is approximately 40 amino acids long and consists of three alpha helices, with the second and third helices packing against each other. As one description of the repeat puts it:
“The 3-dimensional fold of an armadillo repeat is known from the crystal structure of beta-catenin, where the 12 repeats form a superhelix of alpha helices with three helices per unit (PUBMED:9298899). The cylindrical structure features a positively charged groove, which presumably interacts with the acidic surfaces of the known interaction partners of beta-catenin.”
The bottom line is that beta-catenin contains 12 Arm repeats in series, giving the protein the overall shape of a superhelix (reminiscent of DNA structure), as can be seen in the bottom figure below:
We’ll come back to this structure in the next posting. For now, another way to visualize it is with NCBI’s Conserved Domain Database (CDD), which is “a collection of sequence alignments and profiles representing protein domains conserved in molecular evolution. It also includes alignments of the domains to known 3-dimensional protein structures in the MMDB database.”
Below is the CDD representation of beta-catenin:
You can see the “12 repeats” represented as four blocks labeled in red as ARM (the first block is more Arm-like, probably because of sequence divergence). I’m guessing that the little triangular peaks under the number line (which represents the amino acid positions) correspond to the individual Arm repeats, since there are 12 of them (3 per block).
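The repeat arithmetic behind this reading is easy to check. Here is a minimal Python sketch, assuming the approximate 40-residue repeat length given above (real repeats vary in length) and three repeats per CDD block as read from the figure; the function name is hypothetical.

```python
# Approximate figures from the text; real Arm repeats vary in length.
ARM_REPEAT_LEN = 40      # residues per Arm repeat (approximate)
HELICES_PER_REPEAT = 3   # alpha helices per repeat
REPEATS_PER_UNIT = 3     # repeats per CDD "ARM" block, as read off the figure

def arm_region_summary(n_repeats: int, protein_len: int) -> dict:
    """Rough size of a tandem Arm-repeat region relative to the whole protein."""
    span = n_repeats * ARM_REPEAT_LEN
    return {
        "approx_span": span,
        "fraction_of_protein": round(span / protein_len, 2),
        "helices": n_repeats * HELICES_PER_REPEAT,
        "units": -(-n_repeats // REPEATS_PER_UNIT),  # ceiling division
    }

print(arm_region_summary(12, 781))
# {'approx_span': 480, 'fraction_of_protein': 0.61, 'helices': 36, 'units': 4}
```

So the 12 repeats would cover roughly 480 of beta-catenin's 781 residues, which fits the description of the repeats occupying the central bulk of the protein. The unit count simply assumes full three-repeat blocks.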
Pay attention to this pattern. Why? Let’s now consider the CDD representation of the homolog from Volvox:
My goodness, they look almost the same! Like beta-catenin, the Volvox protein is a series of Arm repeats. Instead of twelve, there appear to be ten, but they seem to be grouped into four units laid out almost identically to those of human beta-catenin, right down to an overlap between units 3 and 4.
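The by-eye comparison above can be phrased as a crude check on domain layouts. This is a minimal sketch, assuming each CDD unit is reduced to a (start, end) residue interval; the coordinates are toy values for illustration only, not the real CDD hit positions, and `same_layout` is a hypothetical helper.

```python
from itertools import combinations

Interval = tuple[int, int]  # (start, end), 1-based inclusive residue positions

def overlapping_units(units: list[Interval]) -> list[tuple[int, int]]:
    """Return 1-based index pairs of domain units whose spans overlap."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(units, start=1), 2):
        if a[0] <= b[1] and b[0] <= a[1]:  # intervals intersect
            pairs.append((i, j))
    return pairs

def same_layout(a: list[Interval], b: list[Interval]) -> bool:
    """Crude comparison: same number of units and the same overlap pattern."""
    return len(a) == len(b) and overlapping_units(a) == overlapping_units(b)

# Toy coordinates for illustration only, NOT real CDD hit positions.
human_toy = [(1, 100), (101, 200), (201, 320), (300, 420)]
volvox_toy = [(1, 90), (91, 180), (181, 290), (270, 380)]

print(overlapping_units(human_toy))        # [(3, 4)]
print(same_layout(human_toy, volvox_toy))  # True
```

Because `same_layout` looks only at unit count and overlap pattern, it ignores the difference between twelve and ten repeats, mirroring the observation that the Volvox protein differs in repeat count but not in unit layout.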
So we are on solid ground when inferring that the human and green algal proteins are homologous:
1. The entire length of the Volvox protein aligns with human beta-catenin with an E value of 1e-25.
2. Both proteins are a series of Arm repeats with very similar CDD representations, suggesting that they have similar structures.
Having placed the homologous relationship on firm ground, let’s now have some fun.