This article explains the significance of pseudogenes. Darrel Falk and Dennis Venema demonstrate how the theory of common descent provides a fine framework in which to interpret the existence of these once-functional genes. They also discuss the relationship of various primates and other mammals based on shared pseudogenes. The evidence in support of evolutionary theory, they argue, is compelling when examining the rich history disclosed in the DNA of every organism.
In 1962, science fiction author Philip K. Dick published The Man in the High Castle, an “alternative history” novel set in a world where Roosevelt was assassinated in 1933 and where the Allies lost the Second World War to the Axis. The novel gripped audiences because of the terrifyingly real “what if?” scenario where major changes in history were brought about by seemingly small events. The familiar backdrop of shared history between the novel and the real world drew readers into the narrative and made the changes that much more frightening.
In some ways, comparing the DNA sequence between related organisms is like reading alternative history novels. The hypothesis of common ancestry between similar organisms makes a very straightforward prediction about their genomes: it simply predicts that they were once the same genome, in the same ancestral species. This hypothesis also predicts that these two genomes, having gone their separate ways in the diverged species, will have accumulated changes once they separated. Like an alternative history, each genome has the same backstory, and then a history independent from the other after the point of separation.
What this implies for species related through common ancestry is that their genomes should be similar. For example, researchers have now sequenced the complete genomes of twelve sister species of Drosophila flies, including the fruit fly, Drosophila melanogaster. As you might expect, these species have similar genomes to one another. Species with the most similar genes are thought to have shared a common ancestor more recently; species with less similar genes are thought to have shared an ancestor less recently. These findings at the gene level also matched nicely with similarity of their physical characteristics.
Having the complete genome sequences of all twelve species allowed researchers to compare synteny between them. “Synteny” is the scientific term for finding the same genes in the same order in two different species. (The higher the synteny, the more genes are in the same order.)
Drosophila species have about 14,000 genes lined up “single file” along their chromosomes. Below is a representation of a tiny portion of a chromosome of Drosophila melanogaster. Each number corresponds to a different gene. Notice that genes 2799, 2807, 2808, and 2828 (and others, indicated only by the ellipsis) make up a syntenic block. Similarly, the genes on the right (along with others not shown) also make up a syntenic block.
Now, here are these genes in a sister species, Drosophila ananassae:
Compare the gene order of the two sister species. Can you figure out what has happened to disrupt the block of genes 2799, 2807, 2808, and 2828—genes which exist side by side in melanogaster?
Here’s a little hint:
Got it? There were two simultaneous breaks at some point in history so that 2799, 2807, 2808, and 2828 are no longer syntenic. Nor is the other block syntenic any more. Notice that in ananassae the same genes are present but they are in an inverted order. Two syntenic blocks have been broken up. We know exactly how it happened.
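The two-break inversion described here can be sketched in a few lines of Python. This is only an illustration: the four numbered genes come from the text, but the labels "R1" to "R4" are hypothetical stand-ins for the right-hand block (whose gene numbers are not listed), and the break positions are chosen for the example rather than taken from the real ananassae chromosome.

```python
def invert_segment(gene_order, start, end):
    """Return a new gene order with gene_order[start:end] reversed.

    Models two chromosome breaks (at positions start and end) followed by
    re-insertion of the segment between them in the opposite orientation.
    """
    return gene_order[:start] + gene_order[start:end][::-1] + gene_order[end:]


def adjacent_pairs(gene_order):
    """Set of unordered neighbouring gene pairs, a simple proxy for synteny."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


# The first four labels come from the text; "R1".."R4" are hypothetical
# stand-ins for the right-hand block, whose gene numbers are not listed.
melanogaster = [2799, 2807, 2808, 2828, "R1", "R2", "R3", "R4"]

# Break between 2807|2808 and between R2|R3, then flip the middle segment.
rearranged = invert_segment(melanogaster, 2, 6)
print(rearranged)

kept = adjacent_pairs(melanogaster) & adjacent_pairs(rearranged)
print(len(kept), "of", len(adjacent_pairs(melanogaster)), "adjacencies kept")
```

In this toy case five of the seven original neighbour pairs survive. The two that are lost sit exactly at the two break points, which is why a single inversion leaves such a recognizable signature.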
Now imagine analyzing this for all twelve species and, in each case, examining all 14,000 (or so) genes. The position of every chromosome break in the time since the twelve species had a common ancestor has been mapped out. Forty million years of history has been laid out, showing the set of disruptions of the single-file order in which the genes are stored. We even know about how often those disruptions occur in a lineage: breaks, like the two described above, take place about once every 200,000 years. This rate has been fairly constant over the approximately 40-million-year history of these twelve lineages. Species that diverged only recently (judged by an independent mechanism) have only a small number of breaks and a large amount of synteny. On the other hand, species that diverged longer ago (again, as judged by an independent mechanism) have a much larger number of breaks and a smaller degree of synteny.
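To get a feel for what that rate implies, here is a back-of-the-envelope calculation. It assumes a strictly constant rate of one break per 200,000 years along a single lineage, which is a simplification of the "fairly constant" rate quoted above. The 5-million-year figure is a hypothetical divergence time chosen only for contrast.

```python
YEARS_PER_BREAK = 200_000  # rate quoted in the text for these fly lineages


def expected_breaks(years_elapsed, years_per_break=YEARS_PER_BREAK):
    """Expected number of breaks along one lineage at a constant rate."""
    return years_elapsed / years_per_break


print(expected_breaks(40_000_000))  # full 40-million-year history
print(expected_breaks(5_000_000))   # hypothetical recent divergence
```

Under these assumptions a lineage accumulates roughly 200 breaks over the full 40 million years but only about 25 over 5 million years, which is the sense in which recently diverged species retain more synteny.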
From the theological point of view, most would have little concern with this data. We have been discussing fly species. This—in the mind of most, after all—is just divergence within the fly “kind.”
The story, however, doesn’t end there. At the same time that this was happening in flies, it was also happening in much larger organisms.
Like primates, for example.
18 million years ago, there were no humans, chimpanzees, gorillas, or gibbons on earth. Their last common ancestral species, however, was here.
Just like for flies, we can trace the changes in the single-file-order of the genes for this lineage as well. Let’s examine human chromosome #1 and compare it to the order of genes in the gibbon with whom we share that common ancestor of almost 20 million years ago.
The figure above shows human chromosome #1. The dark boxes within the chromosome are “geographical markers,” which need not concern us here. This chromosome has about 4,200 of our 21,000 (or so) genes. The gibbon has almost the same gene complement. Note, however, that there have been two inversions (the dotted lines, above) in that time. Also note that there has been some other shuffling. The genes at the left end of human chromosome #1 (about 250 or so) exist as a contiguous block in chromosome 5 in the gibbon. Similarly, if we consider the genes just a little further to the right, the next block of about 200 genes is found as a syntenic block on chromosome 9 in the gibbon…and so on. Clearly there has been some shuffling, but not a lot. Just like for Drosophila, the syntenic blocks are still largely in place.
The complete sequencing of the human and chimpanzee genomes has allowed scientists to do the same comparison with our most closely related living species. It is only about 6 million years since the common ancestor of humans and chimpanzees lived on earth. Since then, as with closely related species of Drosophila, there have been changes in synteny, but not a lot. There have been several large inversions that have been precisely mapped and many small inversions where only a few genes have been flipped. Not unexpectedly, there is even one case of shuffling between chromosomes: some genes that existed as two contiguous blocks in the common ancestor 6 million years ago, have become joined into one block in humans—the now-somewhat-famous chromosome #2.
This chromosome is made up of two blocks of genes joined together that are on totally separate chromosomes in chimpanzees and gorillas (see below). The fact that human chromosome #2 matches two ape chromosomes suggests that it resulted from a fusion between two smaller chromosomes like the ones we see as separate chromosomes in apes. This prediction was confirmed by DNA sequencing: we see all the chromosomal markers we would expect from a fusion event, and this evidence is now fairly well-known among followers of the creation/evolution discussion.
What makes shared synteny for humans and chimpanzees challenging from an anti-common descent viewpoint is that there is no good biological reason to find the same genes in the same order in unrelated organisms, and every good reason to expect very different gene orders. In fruit flies, for example, the more distantly-related species have quite different gene orders and chromosome structures, yet they all are healthy, robust species. In other words, many different gene orders can get the basic biology of being a fly done. Similarly, in mammals, many different gene arrangements can be found, sometimes even within species. In humans, many chromosome rearrangements are known that do not produce disease. Some anti-evolutionary groups claim that if human chromosome #2 were indeed the result of a fusion event, this would have caused disease or fertility problems. This is not the case: tip-to-tip chromosome fusions do not necessarily cause defects or reduce fertility. For example, many different “races” of mice with different chromosomal arrangements are known, including examples with multiple tip-to-tip fusions like human chromosome #2. Many of these “races” of mice remain fully fertile when crossed with “normal” mice. Populations of mice with very different chromosome arrangements have also been shown to arise very rapidly in nature.
In summary, should God have wished to avoid the appearance of common ancestry between humans and chimpanzees, there seem to have been many gene orders and chromosome structures available to Him to use for either species. Indeed, we see more dissimilar orders and structures in many groups of species whose common ancestry is not controversial even for Young-Earth Creationists. Yet what we see in humans and chimpanzees are genomes that immediately give the impression of being slightly modified versions of the same genome. This pattern of genome organization similarity also fits with other independent lines of evidence (such as DNA sequence similarity and comparative anatomy) for arranging species into groups of relatedness (phylogenies). While this pattern makes perfect sense in light of common ancestry and acts as an important independent test of phylogenies, it continues to puzzle those who attempt to explain life apart from evolution.
In the next section, we’ll explore how shared synteny allows researchers to predict where to find another feature of the human and chimpanzee genomes: shared pseudogenes.
In our previous section, we likened comparing genomes of related organisms to reading alternative history novels. We noted that before two species diverge, they share the same “backstory” but then go on to accumulate changes after separation.
One interesting feature of looking at genomes is that often we can find the mutated remains of once-functional genes. These are called pseudogenes, or “false genes.” Pseudogenes might be part of a shared backstory for two species, or they might crop up independently after two species go their separate ways. Either way, they are easy to spot at the molecular level because they retain a lot of similarity to the functional gene. For example, here are the DNA sequences for the start of one particular gene in several species (for our purposes, its function is not important).
As you examine the sequence of letters above, note that DNA contains a four letter code. This string of “letters” is made up of the molecules adenine, guanine, cytosine, and thymine strung together within the large super-molecule, DNA. Our cells read the encoded instructions and, interpreting the code, build each of the different proteins required for the maintenance of life.
Note that the instructions have changed a little since these five species had a common “backstory” (ancestor). Despite the changes, for the dog, mouse and chicken, the protein is fully functional. This is not so, however, for the chimp and human. The “dot” (highlighted by the red arrow) means that one single letter of the instructions has been deleted. This change would be like finding this sentence in the first edition of a book:
THE BIG RAT HIT THE RED MAT.
But, in the second edition of the same book, we find this instead:
THE BIG RAT HTT HER EDM AT
The sentence has no meaning anymore, but, as we compared the first and second versions of the book, we would be able to tell exactly what had happened: the letter “I” had been deleted from the sentence, and everything following would be messed up. A single deletion throws off the whole code from that point on. Thus, for chimps and humans, the instructions become gibberish, and the protein molecules produced according to that gene’s instructions are now badly mangled and unable to function.
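The book analogy can be reproduced directly. The sketch below treats the sentence as a string read three letters at a time, in the same way the analogy does; it is a model of the analogy only, not of real DNA translation, and the function names are invented for this illustration.

```python
def read_in_triplets(text):
    """Strip spaces and punctuation, then regroup the letters in threes."""
    letters = "".join(ch for ch in text if ch.isalpha())
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


def delete_letter(text, index):
    """Remove the single character at position `index`."""
    return text[:index] + text[index + 1:]


first_edition = "THE BIG RAT HIT THE RED MAT."

# Locate the "I" inside the word "HIT" and delete just that one letter.
position = first_edition.index("I", first_edition.index("HIT"))
second_edition = read_in_triplets(delete_letter(first_edition, position))

print(read_in_triplets(first_edition))  # THE BIG RAT HIT THE RED MAT
print(second_edition)                   # THE BIG RAT HTT HER EDM AT
```

Every three-letter word before the deletion is untouched, and every word after it is shifted by one letter, which is exactly the pattern that lets a reader pinpoint where the change happened.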
As you go back and examine the sequence in the human/chimp pseudogene, notice how both species carry the exact same deletion. This suggests that the deletion occurred once, in one individual: a common ancestor with whom both species have a shared backstory.
Let’s return to our book analogy. Presumably all copies of the second edition had the exact same non-functional sentence about the BIG RAT. If someone were to examine two second edition copies of the book, each of which was missing that same letter, “I,” it would be unthinkable to propose that the exact same mistake occurred independently in the printing of each of the two books. Similarly, it would be incorrect to propose that the new incoherent sentence had some important meaning which literary scholars will discover some day. We would know, plain and simple, that a mistake had occurred. Anything other than that would be highly contrived.
Today both chimps and humans carry the exact same mutation because they both have the same backstory. However, it is even more poignant than that. There are 20,000 pseudogenes in the human genome. Each has its own unique backstory. Each can be traced out in the same manner we have just done for this one.
The hypothesis of common ancestry makes precise predictions about how pseudogenes will be distributed in related species. Once a gene has been mutated into a pseudogene in a certain species, that pseudogene with its specific inactivating mutation will be passed on to all descendants of that species.
The figure below demonstrates this for a specific pseudogene, which we will term pseudogene “y.” Note that in a very specific individual at a very specific time, gene “y” underwent a change in its code—it mutated. That altered code was passed on to the subsequent generations and ended up in two daughter species, Species A and Species B.
Now consider a second gene, which we call “x.” It also underwent a mutation, but did so earlier in the lineage. Let’s call the new mutated form of this gene pseudogene “x.” This is shown in the next figure. Since this mutation occurred earlier in the lineage in an organism that was a common ancestor to Species A, B, and C, all three of these species carry the abnormal, non-functional version of “x.” The lineage to species D, however, had already broken away. It does not carry the mutated version of “x.”
Finally, consider another gene, which we’ll call “z.” This gene is perfectly functional in Species A, B, and D. However, when you examine its code in Species C, guess what? It carries a non-functional pseudogene. What do you think has happened here? This is a recent change, so recent that it occurred in an individual whose ancestors only went on to become Species C. Here is a summary figure which illustrates the time at which each of the three mutations occurred and the ramifications of each change.
In this example, since gene “x” is mutated to a pseudogene in the common ancestor of species A, B and C, we would expect to find this pseudogene, with the same exact inactivating mutation, in these three species. Similarly, the pseudogene version of gene “y” with exactly the same code-change should be found only in species A and B. Finally, there are many cases in which a pseudogene is found only within one species, or, at most, a couple of closely related sister species. Pseudogene “z” is our example of that.
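The expected distribution of the three pseudogenes follows mechanically from where each mutation arose on the tree. The sketch below encodes the branching order described in the text (D split off first, then C, then A and B). The internal node names "root", "ABC" and "AB" are hypothetical labels introduced for this example.

```python
# Hypothetical node names; the branching order follows the text:
# D split off first, then C, and A and B separated most recently.
CHILDREN = {
    "root": ["ABC", "D"],
    "ABC": ["AB", "C"],
    "AB": ["A", "B"],
}


def living_descendants(node):
    """Return the sorted living species (leaves) at or below `node`."""
    if node not in CHILDREN:
        return [node]
    leaves = []
    for child in CHILDREN[node]:
        leaves.extend(living_descendants(child))
    return sorted(leaves)


# Where each inactivating mutation arose, as described in the text.
MUTATION_ORIGIN = {"x": "ABC", "y": "AB", "z": "C"}

for gene, node in MUTATION_ORIGIN.items():
    print(f"pseudogene {gene}: expected in {living_descendants(node)}")
```

Running it lists pseudogene x in A, B and C, pseudogene y in A and B, and pseudogene z in C alone, matching the summary above.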
If life really does have a backstory of this sort, then you can see the power of this technique for tracing the lineage. It allows us to trace the history of life, species by species. Interestingly though, there have long been other—non-genetic—ways of tracing life’s history. Biologists have been using these alternative methods for many decades. For example, by examining fossils (paleontology) and tracing changes in body structure (comparative anatomy), the history of life had already been pretty much worked out before DNA sequencing data ever came into the picture.
For the most part, the data which are emerging from DNA sequencing projects simply verify that which biologists have known for years through these other methods of exploring life’s history. Still, the results are extremely gratifying in their consistency. In science, one looks for corroborating evidence. If the DNA data had suggested totally different lineages, then there would have been good reason to doubt the common descent hypothesis. Such is not the case though. The supporting data keep piling up; there is no longer any doubt.
Remember how science works. If there are multiple lines of evidence—each internally consistent with the central overarching principle—a consensus is reached. The theory is judged to be correct and the scientists move on to further explore its ramifications.
If the theory of common descent is true, then it also makes predictions about what we would not expect to find at the genetic level. We go on to explore this topic in our next section.
As we indicated earlier, pseudogenes are remnants of once-functional genes. Since they are segments of a DNA molecule, they are faithfully copied and passed along generation after generation through the millennia of time. For this reason, they serve as excellent markers. They allow us to trace ancestry.
Consider, for example, the lineage shown in the diagram at right. In this example, Species A and B diverged from a single ancestor (red) fairly recently. Since there has been little time for them to evolve, the species are similar to each other.
Species A, B, and C are also derived from a common ancestor (blue). That ancestor lived much longer ago, so the three species have had more time to diverge. Thus they may look quite different from each other. Finally, a very long time ago, there lived an even more ancient ancestor (yellow). This one gave rise to all four of these species. Having had lots of time to evolve, this ancestor’s descendants have become increasingly different from one another.
With regard to pseudogenes, the theory of common descent makes a prediction. Let’s say that in sequencing a genome, one finds a specific pseudogene (we’ll call it ‘y’) in Species A, but it is not found in Species B (see below).
If the theory of common descent is true:
- The event which gave rise to pseudogene ‘y’ occurred recently. It could not have been present in the common ancestor highlighted in red in the diagram to the left; otherwise both Species A and B would have had it.
- Since the Species A/Species B common ancestor didn’t have the pseudogene, earlier ancestors could not have had it either.
- Species C and D, because they are derived from those earlier ancestors, would not have pseudogene ‘y’ either.
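The reasoning in these three points amounts to a simple test: the species that carry a given pseudogene should form a single complete branch of the tree. Here is a minimal sketch of that test, assuming the four-species lineage described above and assuming each inactivating mutation arose only once and was never reversed. The list of clades is written out by hand for this illustration.

```python
# Clades implied by the lineage in the text, written as sets of living
# species. The encoding is illustrative, not taken from any dataset.
CLADES = [
    {"A"}, {"B"}, {"C"}, {"D"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
]


def fits_single_origin(carriers):
    """True if the species carrying a pseudogene form exactly one clade."""
    return set(carriers) in CLADES


print(fits_single_origin({"A"}))        # found only in Species A
print(fits_single_origin({"A", "C"}))   # in A and C but missing from B
print(fits_single_origin({"A", "D"}))   # in A and D but missing from B, C
```

Only the first pattern passes. A pseudogene present in A and C but absent from B, or in A and D but absent from B and C, is the kind of "out-of-place" result that common descent predicts we should not find.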
With the sequencing of many genomes, it is now a straightforward matter to test this hypothesis. This can be done not by examining one or two genes, but by examining hundreds, even thousands. Do all pseudogenes fit into this sort of pattern? We will examine this question by considering one particular subset.
Our sense of smell is made possible through a set of proteins, the olfactory receptors, found on the surface of cells lining the nasal cavity. Airborne compounds bind to these receptors, thereby sending signals to the brain, which then interprets the pattern of binding as a particular odor.
Recently it has become apparent that many mammals have lost some of their olfactory receptor proteins through mutation of the genes which produce them: the mutated genes have been converted into non-functional pseudogenes. It is possible to compare the distribution of numerous olfactory receptor pseudogenes in several primate species.
Let’s first consider 15 pseudogenes present in humans but not in chimpanzees. According to the theory of common descent, these 15 pseudogenes have arisen since humans and chimps last had a common ancestor about six million years ago. If this is so, we would predict that none of the 15 pseudogenes will be present in primates believed to have diverged even earlier. As illustrated at right, this is exactly what we find when examining gorilla and orangutan genomes.
What about other olfactory pseudogenes? Do they follow the same sort of pattern? Are they in the “correct” places? Indeed they are – every one:
Six pseudogenes with identical inactivating mutations are found in all four species. Humans and chimpanzees share twelve identical pseudogenes (6 plus 3 plus 3), but humans and gorillas share only nine (6 plus 3). These nine, as predicted, are a subset of the twelve shared by humans and chimpanzees.
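The nesting of these counts can be checked with ordinary set arithmetic. In the sketch below only the group sizes (6, 3 and 3) come from the text. The identifiers are hypothetical placeholders, and pseudogenes unique to a single species (such as the 15 found only in humans) are left out because they do not affect the shared counts.

```python
# Hypothetical identifiers; only the group sizes (6, 3, 3) come from the text.
in_all_four = {f"all4_{i}" for i in range(6)}
in_human_chimp_gorilla = {f"hcg_{i}" for i in range(3)}
in_human_chimp_only = {f"hc_{i}" for i in range(3)}

# Shared pseudogenes carried by each species (species-specific ones omitted).
human = in_all_four | in_human_chimp_gorilla | in_human_chimp_only
chimp = in_all_four | in_human_chimp_gorilla | in_human_chimp_only
gorilla = in_all_four | in_human_chimp_gorilla
orangutan = set(in_all_four)

human_chimp = human & chimp
human_gorilla = human & gorilla

print(len(human_chimp))              # 12
print(len(human_gorilla))            # 9
print(human_gorilla <= human_chimp)  # True
```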
Pseudogene evidence alone, in the absence of any other line of evidence (gene homology, shared synteny, anatomy, etc.), would assemble these species into the same pattern of relatedness as any of the other lines of evidence. Indeed, for the 47 pseudogenes studied, not one is out of place. We can tell when in the ancestry each arose relative to the others, and no cases exist where the same pseudogene appears in a manner inconsistent with the proposed lineage. Also recall that this is only 47 pseudogenes within a single family of genes: many, many more have been analyzed and they give parallel results.
As compelling as this pattern is, pseudogene data can also be extended to much more distantly-related species. All mammals, for instance, are predicted to be the evolutionary descendants of egg-laying ancestors. Indeed, the fossil record contains species classified as “mammal-like reptiles” as well as “reptile-like mammals” that blur the distinction between these groups. The evolutionary prediction that mammals are descended from egg-laying ancestors was tested recently using the hypothesis of shared synteny to look for the inactivated remains of a gene devoted to egg-yolk production in the human genome. This gene, called the vitellogenin gene, encodes a component of egg yolk in a wide array of egg-laying species. The research group wondered if it would be possible to find the remains of the vitellogenin gene in the human genome. To help in their search, they employed the prediction of shared synteny.
You may recall from our previous section on synteny that over time, blocks of genes in diverging species are increasingly “broken up” into smaller and smaller blocks. Very closely linked genes, however, can stay together for a very, very long time. Using this knowledge, the researchers:
- Located the (functional) vitellogenin gene in the chicken genome,
- Took note of the gene “next door” to the vitellogenin gene in the chicken,
- Looked to see if this neighboring gene was present in the human genome (it was – a functional version of this gene is found in humans),
- Looked in the same relative spot next door to this gene in the human genome, and
- Discovered the mutated remains of the vitellogenin gene in the human genome in precisely this location.
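The logic of this search can be sketched with toy gene orders. Everything in the example is a simplified assumption: the lists stand in for stretches of chromosome, "neighbour_gene", "gene_p" and "gene_q" are hypothetical placeholders rather than real gene names, and the function is an illustration of the idea rather than the researchers' actual method.

```python
def predicted_location(reference, target, gene, landmark):
    """Predict where `gene` should sit in `target`, given its position
    relative to `landmark` in `reference` (both are ordered gene lists)."""
    if landmark not in target:
        return None  # landmark missing: shared synteny gives no prediction
    offset = reference.index(gene) - reference.index(landmark)
    position = target.index(landmark) + offset
    if not 0 <= position < len(target):
        return None
    return target[position]


# Toy gene orders. All names other than "vitellogenin" are hypothetical.
chicken = ["gene_p", "vitellogenin", "neighbour_gene", "gene_q"]
human = ["gene_p", "candidate_region", "neighbour_gene", "gene_q"]

where_to_look = predicted_location(chicken, human, "vitellogenin", "neighbour_gene")
print(where_to_look)  # candidate_region
```

The point of the sketch is that the search space shrinks from the whole genome to one spot next to a known landmark, so finding mutated vitellogenin remains in that spot is a confirmed prediction rather than a lucky match.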
While it might be possible to present a (strained) argument for the presence of olfactory receptor pseudogenes in humans, the mere presence of the mutated remains of a gene required for making egg yolk in the human genome should give even the most ardent anti-evolutionist pause. That this gene was found using the prediction of shared synteny between humans and chickens only adds to the impact.
Common ancestry is an elegant, parsimonious explanation for the pattern of pseudogenes that we observe, yet many Christians reject common ancestry for theological reasons. The challenges for a non-evolutionary explanation of this data, however, are many:
- Why do humans (or any species for that matter) have so many inactivated genes?
- Why does the distribution of these inactivated genes match precisely the pattern of relatedness (phylogenies) predicted by other, independent criteria? Why are there no “out-of-place” pseudogenes?
- Why are pseudogenes found in the precise locations predicted by shared synteny?
- Why are some of these inactivated genes dedicated to functions that make no sense for the species that harbors them (e.g. defective genes for egg-yolk production in placental mammals like humans)?
To be blunt, if this pattern is not to be accepted, why did God put it in place for us to discover? And if this pattern is not to be trusted, how can anything in genetics be certain? As a colleague once commented, this pattern “would deceive all honest investigators” if in fact it is not accurate.
Many believers are troubled by the idea of humans sharing ancestry with other forms of life (and its attendant theological issues). Consider the opposite, however: suppose that common ancestry is in fact incorrect. The trouble is this: the data doesn’t go away. In this case, one still has to explain why the data looks the way it does: why did God choose to create independent species with this pattern? Even among anti-evolutionists there is no satisfying answer to this question. Time and again, what we see from Christian anti-evolutionary organizations is not an attempt to wrestle with the data, but rather to obfuscate it.
These lines of evidence are becoming more and more widely known among believers and non-believers. If Christians continue to insist on denying the implications of this (very solid) science, we greatly risk setting a stumbling block before our brothers and sisters in Christ, or bringing the faith into disrepute with those whom we seek to reach with the Good News.