Signature in the Pseudogenes, Part 1
Today's entry was written by Dennis Venema and Darrel Falk. Please note the views expressed here are those of the authors, not necessarily of The BioLogos Foundation. You can read more about what we believe here.
In our previous post, we likened comparing genomes of related organisms to reading alternative history novels. We noted that before two species diverge, they share the same “backstory” but then go on to accumulate changes after separation.
One interesting feature of looking at genomes is that we can often find the mutated remains of once-functional genes. These are called pseudogenes, or “false genes.” Pseudogenes might be part of a shared backstory for two species, or they might crop up independently after two species go their separate ways. Either way, they are easy to spot at the molecular level because they still closely resemble the functional gene they came from. For example, here are the DNA sequences for the start of one particular gene[1] in several species (for our purposes, its function is not important).
As you examine the sequence of letters above, note that DNA uses a four-letter code. The “letters” are the molecules adenine, guanine, cytosine, and thymine, strung together within the large super-molecule, DNA. Our cells read these encoded instructions and, interpreting the code, build each of the different proteins required for the maintenance of life.
Note that the instructions have changed a little since these five species shared a common “backstory” (ancestor). Despite the changes, the protein is fully functional in the dog, mouse, and chicken. This is not so, however, for the chimp and human. The “dot” (highlighted by the red arrow) means that a single letter of the instructions has been deleted. This change would be like finding this sentence in the first edition of a book:
THE BIG RAT HIT THE RED MAT.
But, in the second edition of the same book, we find this instead:
THE BIG RAT HTT HER EDM AT
The sentence no longer has any meaning, but by comparing the first and second editions of the book we could tell exactly what had happened: the letter “I” had been deleted, and everything after it was thrown off. Because the sentence is read three letters at a time, much as cells read DNA in three-letter “words” called codons, a single deletion shifts every word from that point on. Thus, for chimps and humans, the instructions become gibberish, and the protein molecules produced according to that gene’s instructions are badly mangled and unable to function.
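The book analogy can be made concrete with a few lines of Python. This is a minimal sketch, not a model of how cells actually read DNA: it strips the spaces from the sentence, deletes one letter, and regroups what remains into three-letter “words,” the way a reading frame groups DNA letters into codons. The function names are our own, chosen only for illustration.

```python
from __future__ import annotations


def read_in_threes(letters: str) -> list[str]:
    """Group a string into consecutive three-letter "words", the way a
    reading frame groups DNA letters into codons."""
    return [letters[i:i + 3] for i in range(0, len(letters), 3)]


def delete_letter(letters: str, position: int) -> str:
    """Return a copy of `letters` with the single letter at `position` removed."""
    return letters[:position] + letters[position + 1:]


# The book sentence with spaces and the final period removed, so that only
# the "letters" remain, just as DNA has no spaces between its codons.
first_edition = "THEBIGRATHITTHEREDMAT"

# Index 10 (counting from 0) is the "I" in "HIT".
second_edition = delete_letter(first_edition, 10)

print(" ".join(read_in_threes(first_edition)))
print(" ".join(read_in_threes(second_edition)))
```

Running it prints the first-edition sentence intact and then THE BIG RAT HTT HER EDM AT: every word before the deletion survives, and every word after it is scrambled.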
As you go back and examine the sequence of the human/chimp pseudogene, notice that both species carry exactly the same deletion. This suggests that this single deletion occurred in one individual, a common ancestor with whom both species share a backstory.
Let’s return to our book analogy. Presumably all copies of the second edition had the exact same non-functional sentence about the BIG RAT. If someone were to examine two second-edition copies of the book, each of which was missing that same letter “I,” it would be unthinkable to propose that the exact same mistake had occurred independently in the printing of each book. Similarly, it would be wrong to propose that the new, incoherent sentence had some important meaning that literary scholars will someday discover. We would know, plain and simple, that a mistake had occurred. Any other explanation would be highly contrived.
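The comparison that lets us pinpoint the missing “I” can itself be written as a short function. In the sketch below, which uses hypothetical names and lets the book sentence stand in for real DNA, we walk the original and a variant side by side until they first disagree, then check that the rest of the variant is simply the original shifted by one letter. Applied to two second-edition copies, it reports the same position for both, which is what a single, shared mistake looks like.

```python
from __future__ import annotations


def find_single_deletion(original: str, variant: str) -> int:
    """Return the index in `original` of the one letter missing from `variant`.

    Walks both strings until they first disagree, then confirms that the rest
    of `variant` is exactly `original` shifted left by one. Inside a run of
    repeated letters (such as "TT") the deleted letter is ambiguous; in that
    case this reports the last letter of the run.
    """
    if len(variant) != len(original) - 1:
        raise ValueError("variant must be exactly one letter shorter than original")
    for i, letter in enumerate(variant):
        if letter != original[i]:
            # Everything after the gap must line up with the original.
            if variant[i:] != original[i + 1:]:
                raise ValueError("variant differs by more than a single deletion")
            return i
    # No disagreement found: the deleted letter was the final one.
    return len(original) - 1


first_edition = "THEBIGRATHITTHEREDMAT"
# Two hypothetical second-edition copies, each missing the same "I".
copy_one = "THEBIGRATHTTHEREDMAT"
copy_two = "THEBIGRATHTTHEREDMAT"

site_one = find_single_deletion(first_edition, copy_one)
site_two = find_single_deletion(first_edition, copy_two)
print(site_one, site_two, site_one == site_two)
```

Both copies report position 10, the “I” in “HIT.”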
Today both chimps and humans carry the exact same mutation because they share the same backstory. But the case is even more striking than that. There are roughly 20,000 pseudogenes in the human genome, and each has its own unique backstory. Each can be traced in the same manner we have just traced this one.
The hypothesis of common ancestry makes precise predictions about how pseudogenes will be distributed in related species. Once a gene has been mutated into a pseudogene in a certain species, that pseudogene, with its specific inactivating mutation, will be passed on to all descendants of that species.
The figure below demonstrates this for a specific pseudogene, which we will call pseudogene “y.” Note that in a very specific individual at a very specific time, gene “y” underwent a change in its code: it mutated. That altered code was passed on to subsequent generations and ended up in two daughter species, Species A and Species B.
Now consider a second gene, which we call “x.” It also underwent a mutation, but earlier in the lineage. Let’s call the new mutated form of this gene pseudogene “x.” This is shown in the next figure. Because this mutation occurred in an organism that was a common ancestor of Species A, B, and C, all three of these species carry the abnormal, non-functional version of “x.” The lineage leading to Species D, however, had already broken away, so Species D does not carry the mutated version of “x.”
Finally, consider another gene, which we’ll call “z.” This gene is perfectly functional in Species A, B, and D. However, when you examine its code in Species C, guess what? It is a non-functional pseudogene. What do you think has happened here? This is a recent change, so recent that it occurred in an individual whose descendants went on to become only Species C. Here is a summary figure that illustrates when each of the three mutations occurred and the consequences of each change.
In this example, since gene “x” was mutated to a pseudogene in the common ancestor of Species A, B, and C, we would expect to find this pseudogene, with exactly the same inactivating mutation, in all three species. Similarly, the pseudogene version of gene “y,” with exactly the same code change, should be found only in Species A and B. Finally, there are many cases in which a pseudogene is found in only one species or, at most, in a few closely related sister species. Pseudogene “z” is our example of that.
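The logic of these figures can also be expressed as a small simulation. In this sketch, which assumes the four-species tree described above and uses hypothetical class and function names, each branch records the pseudogenes that arose along it, and every species inherits everything that arose on the branches leading to it. Nothing here is fitted to real data; it only shows that the pattern in the summary figure follows mechanically from inheritance.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Lineage:
    """One branch of a family tree: the pseudogenes that arose along it,
    plus the branches that later split off from it."""
    name: str
    new_pseudogenes: set[str] = field(default_factory=set)
    children: list[Lineage] = field(default_factory=list)


def inherited_pseudogenes(
    node: Lineage, inherited: frozenset[str] = frozenset()
) -> dict[str, set[str]]:
    """Map each living species (a branch with no children) to every
    pseudogene it carries: those inherited from its ancestors plus any
    that arose on its own branch."""
    carried = set(inherited) | node.new_pseudogenes
    if not node.children:
        return {node.name: carried}
    result: dict[str, set[str]] = {}
    for child in node.children:
        result.update(inherited_pseudogenes(child, frozenset(carried)))
    return result


# "x" arises in the ancestor of A, B and C; "y" in the ancestor of A and B;
# "z" only on the branch leading to C.
tree = Lineage("common ancestor", children=[
    Lineage("Species D"),
    Lineage("ancestor of A, B, C", {"x"}, children=[
        Lineage("Species C", {"z"}),
        Lineage("ancestor of A, B", {"y"}, children=[
            Lineage("Species A"),
            Lineage("Species B"),
        ]),
    ]),
])

carriers = inherited_pseudogenes(tree)
for species, genes in carriers.items():
    print(species, sorted(genes))
```

Species D carries none of the three pseudogenes, Species C carries “x” and “z,” and Species A and B each carry “x” and “y,” the nested pattern described above.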
If life really does have a backstory of this sort, then you can see the power of this technique for tracing the lineage. It allows us to trace the history of life, species by species. Interestingly though, there have long been other—non-genetic—ways of tracing life’s history. Biologists have been using these alternative methods for many decades. For example, by examining fossils (paleontology) and tracing changes in body structure (comparative anatomy), the history of life had already been pretty much worked out before DNA sequencing data ever came into the picture.
For the most part, the data emerging from DNA sequencing projects simply confirm what biologists had already learned through these other methods of exploring life’s history. Still, the results are extremely gratifying in their consistency. In science, one looks for corroborating evidence. If the DNA data had suggested totally different lineages, there would have been good reason to doubt the common descent hypothesis. That is not the case, though. The supporting data keep piling up; there is no longer any doubt.
Remember how science works. When multiple independent lines of evidence are each consistent with a central, overarching principle, a consensus is reached. The theory is judged to be correct, and scientists move on to explore its ramifications further.
If the theory of common descent is true, then it also makes predictions about what we would not expect to find at the genetic level. We go on to explore this topic in our next post.