Genomes as Ancient Texts
…while errors made by human scribes tend to preserve a meaning of some kind...DNA replicating enzymes do not check to see if meaning (i.e. function) is preserved as they copy.

You will recall from the last few posts in this series that speciation begins when a barrier to allele flow arises that separates what was once a freely-interbreeding population into two populations. Once the two populations cannot exchange alleles (or do so only at a reduced rate), differences accumulate in the two populations that are not averaged between them. Additionally, we have seen how the founder effect can give this process a head start, since it can lead to differences right at the point of separation. Once two populations have separated and ceased to interbreed, new alleles that arise in either population will not spread to the other, which leads to differences accruing over time.
The point to emphasize for our purposes is this: when speciation starts, it starts from a common ancestral population. This means that the two populations, at the start, have (nearly) identical genomes. Indeed, the only differences at this early stage will be the few genes that have different allele frequencies due to imperfect sampling when the two populations separated. From this starting point, the two populations will begin to slowly accumulate differences – but these differences will be minuscule. The overall pattern will predominantly be one of identical sequences, with only a small number of differences.
Genomes as ancient texts: an analogy
Perhaps an analogy would be helpful here. Prior to the invention of the printing press, manuscripts in the ancient world were copied by scribes. Though a good scribe could be counted on to provide a highly accurate copy, small copying errors were inevitable. These changes, however, would not be so large as to render the copies unrecognizable – the vast majority of the text would be correct. Once a copy was made (with its small errors included), it would often serve as the starting point for further copies. If so, the errors would be copied in the process, since the next scribe would also try to copy the manuscript as faithfully as possible (though he too might introduce new errors of his own).
We can similarly think of a genome as a “text” that is passed on through copying with the possibility of copying errors. As with all analogies, however, there are some important differences. While human scribes interact with the meaning of the text they are copying, the genome “scribes” – enzymes that copy DNA sequences based on pairing monomers – do not. This means that while errors made by human scribes tend to preserve a meaning of some kind (even if it is an altered meaning), DNA replicating enzymes do not check to see if meaning (i.e. function) is preserved as they copy. The functional check for a DNA sequence comes later, as that particular organism develops (or not) and reproduces (or not). In other words, natural selection is the check for “meaning” for a DNA sequence.
To carry the analogy further, we could consider an organism’s genome to be like a book, with chapters, paragraphs and sentences. For a genome, the “chapters” might be the sequences of whole chromosomes; paragraphs would correspond to genes; and sentences the sub-components of genes. We could also consider printing runs of a book to be like a replication event. For example, consider two independent printings of the same manuscript. They would, of course, be almost identical – but suppose the two printings had specific typos that did not greatly alter the meaning of the text, and were thus missed by the editors: the 1st printing on page 14, and the 2nd printing on page 23:
Now suppose that the original manuscript is lost, and the 3rd printing of the book is typeset from a copy of the 1st printing. This new printing would have the typo on page 14, and any new typos that happened to creep in (say, one on page 8):
Imagine that over the course of this book’s history, there are five known printings arranged as follows:
Now imagine that a previously unknown, 6th printing is found. This printing has the exact same characteristic typos on pages 14 and 3 found in the 1st and 4th printings, as well as a unique typo not previously seen before in any printing, on page 5.
There are, of course, several possible explanations for the provenance of the 6th printing, with one explanation being more likely than the others. In increasing order of likelihood, some options are:
- The “6th printing” is in fact a separately authored book that is not a copy of the original manuscript of which the 5 printings are copies.
- The 6th printing is a direct copy of the original manuscript, but the editor happened to independently make two of the exact same errors found in other manuscripts as well as a new error on page 5.
- The 6th printing is a direct copy of the 1st printing, but the editor happened to independently make one of the exact same errors found in another manuscript as well as a new error on page 5.
- The 6th printing is a direct copy of the 4th printing, but the editor happened to make a new error on page 5 in the copying process.
It should go without saying that option #1 would not be seriously considered by literature scholars, given the nearly identical text shared between the newfound printing and the other known printings. Option #2 would require that two rare events (typos) happen independently, twice over, in two printings. As such, it is less likely than option #3, which requires only one rare event to happen twice over independently. Option #4 is of course the best option, because it does not require that any rare events happen twice in independent copies. In this scenario, the reason the 6th printing has the features it does is because it is an (imperfect) copy of the 4th printing:
In scientific terms, this option is the most likely, or parsimonious one: it offers an explanation for the provenance of the 6th printing with the fewest number of low-probability events.
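For readers who like to see the bookkeeping spelled out, the comparison of options #2 through #4 can be sketched in a few lines of Python. This is only a toy illustration: the page numbers come from the example above, the original manuscript is assumed to be free of typos, and the names (`candidate_sources`, `repeated_rare_events`) are made up for this sketch.

```python
# Typo pages for each candidate source, as described in the analogy.
# Assumption for this sketch: the original manuscript has no typos.
candidate_sources = {
    "original manuscript": set(),
    "1st printing": {14},
    "4th printing": {14, 3},
}
sixth_printing = {14, 3, 5}

# Typos already known from printings other than the 6th.
known_typos = set().union(*candidate_sources.values())


def repeated_rare_events(source_typos, copy_typos):
    """Count typos in the copy that were not inherited from the source
    but match a typo already known from another printing."""
    new_in_copy = copy_typos - source_typos
    return len(new_in_copy & known_typos)


scores = {
    name: repeated_rare_events(typos, sixth_printing)
    for name, typos in candidate_sources.items()
}
best_source = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score} independently repeated typo(s)")
print("Most parsimonious source:", best_source)
```

The source that requires the fewest independently repeated typos is the preferred one. The new typo on page 5 is needed under every option, so it does not help us choose between them and is left out of the score.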
Back to biology
Now consider speciation events in light of our analogy. At the point of separation, the two populations have nearly identical “books” (i.e. genomes). As lineages go their separate ways, “typos” (i.e. mutations) can occur in genes that are then passed down to the descendants of that lineage, just as we have seen for typos accumulating in copied texts:
If indeed speciation events produced Species A – D from a common ancestral population, we would expect their genomes to exhibit certain features when compared to each other. First and foremost, their overall genome sequence and structure should be highly similar to each other – they should be versions of the same book, with chapters and paragraphs of shared text in the same order. Secondly, the differences between them would be expected to fall into a pattern. Species C and D, for example, would be expected to share some features as the result of sharing a common ancestor (Species A) more recently than they do with Species B. In this simple diagram, for example, Species C and D would have an identical mutation in gene 1, and the most parsimonious explanation for it would be that they both inherited it from a common ancestor (Species A). This would be much more likely than both species having the same mutation occur independently at the exact same location in both genomes.
Put more generally, the hypothesis of common ancestry makes specific predictions about the pattern one should observe when examining genomes. In tomorrow’s post, we’ll see how well those predictions hold up when examining actual genomics data for a proposed group of related species.
Previously, we discussed some features of what we would expect when comparing genomes between similar species, if indeed those species descended from a common ancestral population. Returning to our “book” analogy for genomes, we made the point that the first thing to look for would be the overall structure of two genomes proposed to be descended from a common ancestral genome: are the “chapters,” “paragraphs,” etc., in the same order? Do they use the same “sentences”? And so on. In other words, do the genomes of any existing species look like they are slightly modified copies of each other?
The answer to this question from modern comparative genomics is an emphatic yes. What we see when we compare genomes of species that we suspect to be relatives is very much that they do indeed look like copies of each other. In some cases, the match between two genomes can be in excess of 95%, DNA monomer for DNA monomer, for the bulk of the two genomes. Not only do they have the same genes, but they have them in the same order on their chromosomes, with each chromosome in the two species having a match in the other species. Imagine finding a book that was 95% identical to another manuscript, with chapters, paragraphs and sentences all in the same order, with only small differences between them – that is the sort of impression one gets when comparing genomes between some species.
One group that has been analyzed in some detail is the fruit flies of the genus Drosophila (pronounced “Dro-SOF-i-la”). Scientists have now determined the complete genome sequences for twelve Drosophila species and compared them with one another. Some species have nearly identical genomes, exactly as one would predict if they were once the same species with a common genome. While it’s not possible to show large amounts of DNA sequence here, let’s examine a small fragment of one “sentence” (i.e. gene) in three species of fruit flies (Drosophila yakuba, Drosophila simulans, and Drosophila sechellia) – all known to be distinct fly species:
The first impression one gets when looking over the sequences is that they are nearly identical. This is not unusual for these three species – indeed, this pattern applies to every gene they have (and they all have the same genes). The second thing one notices, however, is that there are rare differences – in this small snippet, two of the species (D. simulans and D. sechellia) have an “a” in the fourth position of this gene, where D. yakuba has a “t”.
Think back to the analogy that we used yesterday – that of book printings and typos shared between them. These sequences in these three species are very much like the hypothetical printings in our analogy, and we can make sense of the pattern we observe in much the same way:
(Now, astute readers will also note that the other possibility is that the “a” at the fourth position is the original text, that the “t” is the typo, and that an (a to t) mutation happened once on the lineage leading to D. yakuba. If you’re wondering about this option, well done. The trick to deciding what the original text was is to look at as many copies as possible – and in this case, when we look at a large number of additional Drosophila species, we see a “t” in the fourth position in most other species, and an “a” only in D. simulans and D. sechellia. This means the most parsimonious choice is that the “t” is the original, and the “a” is a mutation.)
While this is only a small example, it illustrates what scientists observe when comparing the genomes of species they suspect to be relatives based on other criteria (such as morphology). What they see is precisely what one would expect if indeed speciation events had occurred to produce the species in question: nearly identical genomes, with small changes shared between some species.
Identity beyond what’s necessary for function at the DNA level
A further observation that supports the hypothesis that these sequences are copies of an ancestral sequence is that the level of identity (matching of sequence) between them is greater than it needs to be, even when the function of the gene is considered. Let’s return to the gene fragment that we were just examining. This sequence, as the start of a gene, codes for a protein with the same function in all three species. (If you need a refresher on how genes are made up of DNA monomers that are eventually translated into a sequence of amino acid monomers, you can refer to two prior posts in this series, here and here.) In these species this protein has the following sequence for the first eight monomers (amino acids):
As you can see, the second amino acid is different between the two sequences, but the other amino acids are identical. What is important for our purposes here is to note that there are many, many ways to write this “sentence” and arrive at the same meaning (sequence of amino acids). This is possible because for most amino acids, there are several DNA monomer combinations (of three nucleotides) that produce the same amino acid when translated. For example, the sequence in D. yakuba could also be written as follows (among many other options):
This sequence is quite different from what we see in the other two species:
In this case, only 14 of the 27 DNA monomers match (an identity of only about 52%). What we observe between these species, however, is that 26 out of the 27 monomers match (over 96% identity). In other words, it would be possible for these two genes to be much less identical at the DNA level, and still have the same amino acid sequences that we observe in the two species. Yet, what we see when we compare the two genes is that they match at the DNA level much more than they need to in order to have the same amino acid sequence. A simple explanation is that the two sequences match because they are copied from the same original sequence.
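The percentages quoted above are simple position-by-position counts. Here is a minimal sketch of that calculation in Python. Note that `toy_sequence` and `toy_variant` are placeholder strings invented for this example (they are not the actual fly gene sequences), and `percent_identity` is just a name chosen here.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


# Placeholder 27-monomer sequence (NOT the real gene), plus a copy
# that differs from it at a single position.
toy_sequence = "atgtcagctaaaggtcctgaagttcta"
toy_variant = "atgacagctaaaggtcctgaagttcta"

print(round(percent_identity(toy_sequence, toy_variant), 1))  # 26 of 27 match
print(round(100 * 14 / 27, 1))  # the 14-of-27 case from the text
```

One difference in 27 positions gives an identity a little over 96%, while 14 matches in 27 gives an identity of about 52%.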
Identity beyond what’s necessary for function at the amino acid level
A second observation that supports the hypothesis that the D. yakuba, D. simulans and D. sechellia gene sequences are in fact copies follows from examining other fly species that are less closely related to these three. All Drosophila species examined to date have this gene, but in more distant relatives the sequence can be somewhat different. For example, this gene in D. mojavensis has the following DNA sequence:
Again, some of the sequence remains identical in all four species (supporting the hypothesis that this sequence is also a copy, but with more changes), but now we see greater differences. Despite these differences, however, the D. mojavensis version of the gene is perfectly functional and does the exact same job as the gene in the other three more closely related species.
So, these observations indicate that there is no biological need for nearly identical genes at the amino acid level, or even at the DNA level, in different species. Numerous amino acid sequences, and even numerous DNA sequences, are equally capable of performing the same function. Yet, what we see time and again (across whole genomes!) are nearly identical genes, with a few (often shared) differences – exactly what speciation events would be expected to produce.
What about “common design”?
One question I am frequently asked when presenting this sort of data is that of “common design” as an alternative explanation. In other words, could these sorts of patterns be explained as separately created species that do not share ancestry, but rather were designed (created) to have the same (or similar) genes because those genes need to have the same (or similar) functions?
We have already seen the basic problems with this line of argument – that genes (and entire genomes) of similar species match much more than they need to – and that the differences we see in closely related species are arranged in exactly the pattern one would predict if speciation events had produced them.
Just flies
Of course, most Christians don’t lose sleep over the possibility that numerous fruit fly species arose over time through multiple speciation events. As we have mentioned previously, even most Young Earth Creationists accept speciation events such as these. What is more contentious, of course, is the question of whether the pattern of shared ancestry extends to our own species. Next, we’ll examine this question by comparing the human genome to the genomes of our proposed nearest living relatives – the great apes.
Review – comparing genomes as texts:
Earlier, we discussed what features we would expect to see in similar species if those species descended from a common ancestral population. Drawing again on our “copied book” analogy, we expect the following:
- “Chapters” and “paragraphs” in the same order: closely related species should have large blocks of DNA sequence in the same order. These blocks may span entire chromosomes. Even if some rearrangements are present, the overall pattern should largely match between the two genomes.
- “Sentences” and “words” that match each other: at the level of individual genes, we should see that they use the same (or nearly the same) DNA sequence, even when there is no biological necessity for them to do so.
- “Typos” that may be shared between texts: if sequence changes (mutations) exist, we would expect them to be shared in some instances (if copied from a common ancestor) and unique in other cases (if they are new mutations that arose after two species separated).
In summary, the expectation is that genomes of species that share a common ancestral population will look like slightly modified copies of each other. As we saw previously, this expectation was easily met when comparing different fruit fly species. With this understanding in hand, we are now ready to explore the possibility that our own species arose through speciation events by making the same kinds of comparisons with other species.
Comparing primate genomes at the “chapter” level
The first line of evidence in favor of humans sharing ancestry with other forms of life is straightforward – there are other species that have a genome that is nearly identical to our own – the genomes found in great apes such as chimpanzees, gorillas and orangutans. Compared to our “book,” the “books” of these species match at the chapter and paragraph level – all three species have DNA sequences that have the same genes in the same basic order as we do. There are subtle differences, of course – blocks of sequence that have been rearranged through breakage and rejoining of chromosomes, as expected – but the overall pattern is clear. This large-scale match between our genome and the genomes of great apes has been known since the 1970s, when researchers began to compare the physical structure of ape and human chromosomes using light microscopes. For humans and chimpanzees, most chromosomes match precisely. In other words, the two genomes appear exactly as one would predict if in fact they are slightly modified copies of an ancestral genome. (For those interested in seeing the pattern for the entire human, chimpanzee, gorilla and orangutan genomes, you can refer to these figures (PDF) from a paper published in 1982.)
Despite the overwhelming identity between the structure of our genome and that of the great apes, there are differences, and if shared ancestry is the correct interpretation, these differences should be ones that can arise through known mechanisms. For example, some human chromosomes have a region of their sequence that fails to line up with the corresponding chromosome in chimpanzees. When these chromosomes are stained using a dye that binds to DNA and are examined under a light microscope, the dye produces banding patterns that allow the chromosome sequences to be compared at the “chapter” level of organization:

(Image redrawn from Yunis and Prakash, 1982).
Along their length, these two chromosomes match for the most part, but the region outlined in red does not. Closer inspection, however, reveals that even within this region, there is a match – but that the sequence is reversed between the two chromosomes. This pattern is one that is readily explained by what is known as a chromosome inversion event – a type of mutation where DNA breaks and rejoins at two places to “flip” a section of chromosome:
These types of mutations are fairly common, and have been observed many times in experimental organisms and humans (where an individual will have such a mutation on one of their chromosomes). As long as the breakpoints of the inversion do not destroy a needed gene or create some other problem, these sorts of mutations are relatively harmless to the individual that carries them. When comparing the human and chimpanzee genomes, there are several chromosomes that show evidence of inversions, and these contribute to the subtle differences between the two genomes.
The largest difference between humans and great apes at the chromosome level of organization is a match between one chromosome in the human genome (human chromosome 2) and two smaller chromosomes in great apes:

(Image redrawn from Yunis and Prakash, 1982).
This difference is what gives humans a different number of chromosomes than chimpanzees – humans have 23 pairs of chromosomes (46 in total) while chimpanzees have 24 pairs (48 in total). This pattern immediately suggests one of two possibilities, if indeed humans and chimpanzees share a common ancestral population. The first option is that one chromosome broke apart into two chromosomes in one lineage but not the other. The second option is that two smaller chromosomes fused together to form one in one of the lineages. Recalling our “typo” analogy, we can represent these two options as unique events that alter an original “text.” In the first option, the original population has 46 chromosomes, and the lineage leading to chimpanzees has a chromosome splitting event.
The second option has the original population with 48 chromosomes, and a fusion event on the lineage leading to humans:
Both types of events are possible, so on the surface it is not possible to decide which is more likely. As we discussed previously when examining typos in copied books, the easiest way to determine what the original text was is to look at as many copies as possible. The other great apes (gorillas and orangutans) also have the two smaller chromosomes (total = 48), suggesting that this is the original structure that was present in the common ancestral population of humans and other great apes. To explain the pattern through chromosome splitting events, this rare event would have happened repeatedly in several lineages in exactly the same location:
Accordingly, the most parsimonious explanation is that human chromosome 2 resulted from a fusion event.
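The event counting behind this conclusion can be sketched as a small parsimony calculation. The chromosome counts below are the ones given above. The tree shape (humans and chimpanzees as closest relatives, then gorillas, then orangutans) is an assumption built into the sketch, and the function name `min_events` is made up for illustration.

```python
import math

# Chromosome counts from the text. The branching order of the tree
# is an assumption made for this sketch.
counts = {"human": 46, "chimpanzee": 48, "gorilla": 48, "orangutan": 48}
tree = ((("human", "chimpanzee"), "gorilla"), "orangutan")
STATES = (46, 48)


def min_events(node, state):
    """Fewest fusion/splitting events needed below `node`,
    given that `node` itself has `state` chromosomes."""
    if isinstance(node, str):  # a living species: its count is observed
        return 0 if counts[node] == state else math.inf
    total = 0
    for child in node:
        # Each branch either keeps the parent's count (no event)
        # or changes it (one event); take the cheaper choice.
        total += min(
            min_events(child, s) + (0 if s == state else 1)
            for s in STATES
        )
    return total


events_if_48 = min_events(tree, 48)  # ancestral population had 48
events_if_46 = min_events(tree, 46)  # ancestral population had 46

print("Events needed if the ancestor had 48:", events_if_48)
print("Events needed if the ancestor had 46:", events_if_46)
```

Under these assumptions, an ancestor with 48 chromosomes needs only a single event (one fusion on the human lineage), while an ancestor with 46 needs three separate events to arrive at the same pattern.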
You might recall watching movies of cells dividing in high school biology class, complete with chromosomes being pulled around. Those chromosomes were being pulled apart towards the poles of the dividing cell using a special DNA sequence called a centromere. This sequence allows the cell’s machinery to grasp the chromosome and move it around. Every chromosome needs a centromere, or the cell won’t be able to pull it during cell division.
This observation makes a prediction: if indeed human chromosome 2 is the result of a fusion between two smaller chromosomes, it would have had two centromeres immediately after the fusion event. One of the centromeres would likely be rendered non-functional through mutation soon after, since two are redundant. Human chromosome 2 has only one active centromere, which lines up with the centromere of the smaller of the two chimpanzee chromosomes. When the genome project sequenced human chromosome 2, they found the mutated remains of a second centromere in precisely the spot one would predict it to be if in fact these chromosomes are modified copies of each other:
As an aside, we also now know that this fusion event predates the origin of our species, since the chromosome 2 fusion is present in the Denisovan hominids, a species more closely related to us than chimpanzees are. This indicates that the fusion event is a “typo” shared between us and other closely related species. Recent work has also documented the events that shaped this region of our genome in exhaustive detail for those who desire more information.
Summing up
Taken together, what we observe when comparing the overall structure of the human genome to other primates is that (a) our genomes do indeed have the features one would predict them to have if they are copies of a shared ancestral genome, and (b) the differences we do observe are easily accounted for by well-known mechanisms. These observations strongly support the hypothesis that our species arose through an evolutionary process.
Next, we’ll delve past the “chapter” level of genome organization to see if the detailed sequence data of individual genes also supports this hypothesis.
Comparing genomes at the “sentence” level
In a previous post, we compared the DNA sequences of a gene found in a number of species of Drosophila. Such comparisons are also possible using DNA sequences from mammals (including humans and other primates), and the pattern they produce is by now familiar:
As we saw for the Drosophila sequences, this gene is nearly identical across a number of species. Specifically, the human sequence and the sequences of three other primates (chimpanzees, gorillas and orangutans) differ by only a handful of DNA monomers (at the most, 4 of the 90 are different). Also, as we have seen before, there is no biological need for these sequences to be this identical – in fact, even for this small region of this gene, there are over 53 million (!) different ways to code for the exact same amino acid sequence. Most of those sequences are much more different from the human sequence than the nearly identical sequences we observe in other primates. To carry the point further, there is also no particular biological need for this gene to have the exact amino acid sequence we see shared among primates. In other organisms (such as dogs and wolves) a slightly different sequence performs the same task equally well.
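To see where such large numbers come from, recall that most amino acids can be written with more than one three-monomer “word.” The number of DNA sequences that spell out the same protein fragment is the product of the number of options at each position. The sketch below uses the standard genetic code. The short peptide `example_peptide` is invented purely for illustration (it is not the primate gene discussed above), and the function name is likewise made up.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many different codons spell each amino acid.
codons_per_amino_acid = Counter(CODON_TABLE.values())


def number_of_encodings(peptide):
    """Number of distinct DNA sequences that translate to `peptide`
    (given as one-letter amino acid codes)."""
    return prod(codons_per_amino_acid[aa] for aa in peptide)


# Hypothetical nine-amino-acid fragment, for illustration only.
example_peptide = "MSAKGPEVL"
print(number_of_encodings(example_peptide))
```

Even this nine-amino-acid toy fragment can be written in tens of thousands of ways, and the count multiplies with every amino acid added.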
Of course it is not possible to show DNA alignments of large swaths of DNA sequence in this format. This small gene segment, however, is representative of genes (and even whole genomes) among primates. A detailed comparison of all gene sequences between humans and chimpanzees, for example, reveals that they are 99.4% identical across 1.85 × 10^7 (18 million) DNA monomers. Note that regions of the genome that code for genes are a tiny minority of genome sequences – humans and chimpanzees have over 3.0 × 10^9 (3 billion) DNA monomers in their genomes. Of these 3 billion monomers, 2.7 billion of them align with each other with only a 1.23% difference between them.
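It can help to turn these percentages into approximate counts of differing monomers. The figures below are taken directly from the paragraph above. The variable names are made up for this sketch, and the results are rough estimates, since the inputs are rounded.

```python
from fractions import Fraction

# Figures quoted in the text (rounded values).
coding_monomers = 18_500_000                 # 1.85 x 10^7
coding_identity = Fraction(994, 1000)        # 99.4% identical
aligned_monomers = 2_700_000_000             # 2.7 billion
aligned_difference = Fraction(123, 10_000)   # 1.23% different

# Exact fractions avoid floating-point rounding in the arithmetic.
coding_differences = coding_monomers * (1 - coding_identity)
aligned_differences = aligned_monomers * aligned_difference

print(f"Differences in gene sequences: about {int(coding_differences):,}")
print(f"Differences in aligned genome: about {int(aligned_differences):,}")
```

In other words, the two sets of gene sequences differ at roughly a hundred thousand positions out of more than 18 million, and the aligned genomes differ at roughly 33 million positions out of 2.7 billion.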
In short, when comparing DNA sequences between humans and other primates, we see exactly the pattern we would predict based on shared ancestry – a pattern consistent with slight modifications to an ancestral genome.
Looking for typos
In a previous post in this series, we discussed how DNA replication is a highly accurate process, but not a perfect one. These two features of DNA replication mean that mutations can occur to genes when they are copied, and that future copies made from a mutated template will faithfully pass that mutation on (at least, until a second mutation occurs at the same location to change things once again). What this means is that gene sequences can persist in genomes for a long time after they are mutated to a non-functional sequence if there is not a selective disadvantage for losing the function in question. (If a mutation does result in a disadvantage, then natural selection will tend to remove it from its population, as we have discussed previously.)
One such example involves a gene that codes for an enzyme (L-gulonolactone oxidase, or “GULO”) that is required for the synthesis of vitamin C in mammals. Most mammals make their own vitamin C from other compounds in their diet, and the GULO gene is necessary for the last step in the process that converts a vitamin C precursor to the final product. As we have seen for other genes, the sequence for this gene is conserved between mammals – it has a nearly identical sequence that is maintained through natural selection. For example, a portion of this gene in cows, dogs and rats has the following sequence (with differences from the cow sequence outlined in black):
In all three of these species, this gene is functional, and all three can make their own vitamin C without obtaining it directly from their diet.
Humans, of course, cannot make their own vitamin C – we get scurvy if we do not obtain vitamin C from our diet. This atypical situation (for a mammal) is shared by other great apes, and for the same reason. Though these species have some of the DNA sequence for the GULO gene, it has numerous mutations in it that render the gene unable to make a functional enzyme product. The same region of the GULO gene shown in the above figure has the following sequences in humans, chimpanzees and orangutans (now with differences from the human sequence outlined in black):
Once again we notice that the primate sequences are nearly identical to one another. One new feature to note here, however, is that these three copies of the GULO gene are non-functional in part because they have a deletion mutation – the removal of one DNA monomer (highlighted in yellow in the primate sequences). This deletion mutation is identical in all three species, providing evidence that it is a “shared typo” copied from a prior text – or, in biological terms, a deletion mutation that happened once in the common ancestor of humans, chimpanzees and orangutans, and was then inherited by all three species. Dogs, cows and rats, however, branched off of the lineage leading to primates before this deletion event occurred:
The loss of GULO function does not seem to have been a selective disadvantage for primates at the time – likely because they had a diet rich in vitamin C. Indeed, even for humans, this loss is not a serious problem unless one finds oneself without a source of vitamin C for a prolonged period of time.
The nose knows
As interesting as the GULO example is (and it is an example I have discussed in more detail in another context), it is but one of many examples of shared, identical mutations found in the human genome and other primate genomes. One study that examined shared primate mutations in detail investigated mutations in genes devoted to the sense of smell. These genes encode olfactory receptors, proteins found on the membranes of cells in the nasal epithelium of mammals. Olfactory receptors do their job by binding to compounds in the air, changing shape in the process, and signaling that change in shape to the nervous system, which we perceive as the sense of smell. The combined action of numerous olfactory receptors acting in concert is what gives any given smell its distinctive features. Mammals dedicate a disproportionate amount of their genome to olfactory receptor genes, most likely because such genes are so useful for finding food, finding mates, and in general perceiving one’s environment. Despite their usefulness, these genes can also be mutated and lost – and indeed, the human genome shows that our species has lost several due to mutation. As with the GULO gene, however, these defective olfactory gene sequences persist in recognizable form. What is more important for our purposes, however, is the pattern these mutated genes form when compared to other primate genomes. As we first introduced with our copied book analogy, we expect to find some typos that are shared between texts, and other typos that are unique to one edition. For defective olfactory genes, we observe precisely these two categories – shared mutations, and unique mutations:
As you can see from the diagram above, humans share the most identical olfactory gene mutations with chimpanzees, fewer with gorillas, and fewer still with orangutans. Of the 12 mutations that are identical between humans and chimpanzees, 9 are also identical with gorillas, and 6 with orangutans. These shared mutations and the pattern we find them in are easily explained through shared ancestry, as indicated in red on the diagram above. The mutations unique to a given species are also easily explained as arising after populations separate (in blue).
It’s also important to note what we do not see when comparing these mutations between primates. We do not observe identical mutations between humans and gorillas, for example, unless we also see the exact same mutation in chimpanzees. This makes perfect sense if the common ancestral population of humans and gorillas is also the common ancestral population of humans, gorillas and chimpanzees. Likewise, if we observe identical mutations shared between humans and orangutans, we can predict with confidence that we will observe these exact mutations in gorillas and chimpanzees – and in fact we do. This pattern of shared mutations is precisely what one would predict if it were in fact produced by shared ancestry – with nothing out of place.
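For readers who like to see the logic spelled out, the "nothing out of place" prediction can be written as a simple nesting check. This is only a toy sketch: the mutation labels (`m1` to `m12`) and the function name `is_nested` are made up for illustration, and only the counts (12, 9 and 6) come from the data discussed above.

```python
# Hypothetical labels for the shared olfactory gene mutations.
# Only the counts (12, 9, 6) are taken from the text; the names are invented.
human_chimp = {f"m{i}" for i in range(1, 13)}     # 12 shared with chimpanzees
human_gorilla = {f"m{i}" for i in range(1, 10)}   # 9 of those also in gorillas
human_orangutan = {f"m{i}" for i in range(1, 7)}  # 6 of those also in orangutans


def is_nested(*sets):
    """Return True if every set is contained in the set listed before it."""
    return all(later <= earlier for earlier, later in zip(sets, sets[1:]))


# The pattern predicted by shared ancestry: each set sits inside the previous one.
nested = is_nested(human_chimp, human_gorilla, human_orangutan)
print(nested)

# A single mutation shared by humans and gorillas but absent from chimpanzees
# would break the nesting. "m_odd" is a hypothetical example of such a case.
broken = is_nested(human_chimp, human_gorilla | {"m_odd"}, human_orangutan)
print(broken)
```

The point of the second check is that the pattern could have failed: one out-of-place shared mutation is enough to break the nesting.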
We have been examining the overall patterns we observe when comparing genomes of distinct species with each other. What we have seen is that the pattern we observe is entirely consistent with species sharing common ancestors – and with their genomes, accordingly, being modified copies of what was once the genome of their shared ancestral population. What we have not yet discussed in detail, however, is that the lines of evidence we have examined – comparisons of genome structure, functional gene sequences, and specific mutations in inactivated genes – all cohere into a mutually supportive pattern. The cohesion of multiple, independent lines of evidence is exactly what a good theory is built on, and comparative genomics has provided very strong support for evolutionary theory.
Starting with shared “typos”
Let’s return to the example of inactivated olfactory receptor genes that we discussed earlier. Based on this small subset of defective genes alone, we constructed the following “family tree” for humans, chimpanzees, gorillas and orangutans:
A proposed family tree for related species – or phylogeny, to introduce the technical term – is a graphical way to both (a) represent a large data set and (b) propose a hypothesis for how that data set came to be. In this data set, we have two categories of features to explain – mutations that are identical in more than one species, and mutations that are unique to a single species. And as we discussed previously, the above phylogeny fits the data very neatly – the patterns of the shared events and the unique events are supported by the same phylogeny. Shared events occur once, in a common genome, and unique events occur after two species have gone their separate ways.
What in fact this phylogeny is proposing is that these four species have differing amounts of shared history and separate history. For example, humans and chimpanzees would have the largest amount of shared history of these four species (outlined in yellow), and comparatively smaller separate histories (outlined in blue and red):
Humans and gorillas, however, have less shared history (and more separate history) over the same time period:
And lastly, orangutans and humans have even less shared history compared to the other primates, and more separate history:
So, from this relatively small data set (a handful of shared mutations in a few genes) we have a detailed hypothesis of which species share the most history in common – a hypothesis we can test using other lines of evidence.
From typos to sentences
Now that we have used a small subset of the “shared typos” found in these four genomes to assemble a proposed phylogeny, we can consider what this phylogeny would predict when comparing the sequences of individual genes in these four species. The key is the shared history portion of a phylogeny for any two species – during this portion, what will later become two species is only one population, with a common genome. As such, they will have the same sequence for any given gene (unless there is variation within the population for that gene, in which case the population will share a common pool of alleles of that gene). The longer two species have a shared history, the more similar we expect their gene sequences to be. The longer they have had a separate history, the more different we expect their genes to be, due to mutation events occurring in the “separate history” portion of the phylogeny.
With the sequencing of the orangutan genome in 2011 and the gorilla genome in 2012, we now can assess this prediction using a very large data set for all four species. Human and chimpanzee sequences are nearly identical on average (98.6% identical); humans and gorillas slightly less so (98.3% identical); and humans and orangutans even less so (96.6% identical). These results fall into the predicted pattern:
In other words, using large swaths of genome sequences to assemble a phylogeny for these species produces the same phylogeny that the shared mutation data does. The pattern required by the shared mutation data (a tiny subset of the DNA sequences of these species) is also the same pattern that best explains the overall genome identity we observe.
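As a rough illustration of how identity values translate into a branching order, here is a minimal sketch. It uses the three genome-wide percentages quoted above and treats "100 minus percent identity" as a crude measure of divergence, which is a simplifying assumption. A real phylogeny is built from all pairwise comparisons and far more sophisticated models. The helper `percent_identity` and the two short sequences are invented for illustration and are not real gene sequences.

```python
# Genome-wide identity to the human sequence, as quoted in the text.
identity_to_human = {"chimpanzee": 98.6, "gorilla": 98.3, "orangutan": 96.6}

# Crude divergence measure (an assumption for illustration): 100 - identity.
divergence = {
    species: round(100 - pct, 1) for species, pct in identity_to_human.items()
}

# Species ordered from least to most diverged from humans.
branching_order = sorted(divergence, key=divergence.get)
print(branching_order)


def percent_identity(seq_a, seq_b):
    """Percent of matching positions in two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


# Toy sequences (not real data): one difference in ten positions.
toy_identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(toy_identity)
```

The ordering that falls out of the percentages (chimpanzee, then gorilla, then orangutan) is the same ordering the shared mutation data gave us.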
From sentences to chapters
With shared mutation data and overall sequence data supporting the same phylogeny for these species, we can go on to further test this hypothesis using genome structure – the spatial organization of genes on chromosomes, or “chapters” to return to our copied book analogy. As with the shared mutation data, the similarities and differences we observe are expected to fall into either shared features (in the predicted pattern) or unique features (that arose once species had separated from other species). Overall, when comparing chromosome structure for all four species, we see that these expectations are met. When comparing chromosome structure, humans are most similar to chimpanzees, less so to gorillas, and even less so to orangutans, as expected. To illustrate this pattern with a specific example, let’s return to the major chromosome structural difference between humans and the other great apes, the fusion event that led to human chromosome 2. As we have discussed previously, the fused chromosome is present in humans but not chimpanzees, gorillas or orangutans – meaning that it occurred after the human lineage separated from the chimpanzee lineage. Closer examination of this region in gorillas and orangutans reveals an additional difference: a portion of this region is inverted in the human and chimpanzee genomes (outlined in red) when compared with the equivalent region in gorillas and orangutans (outlined in blue):
So, humans are the most similar to chimpanzees (the most regions line up) and less so to the other apes, as expected. The differences we see are also easily mapped on to the phylogeny provided by the other lines of evidence. Since the inversion event is common to both humans and chimpanzees (but not present in the other species), it likely occurred in the human/chimpanzee common ancestral population after it parted ways with the lineage leading to gorillas. The fusion event would have occurred later, on the lineage leading to humans (and, as we have seen, it is shared by other species more closely related to humans than the great apes are). As expected, the phylogeny predicted using chromosome structure data alone matches the phylogeny predicted from the other lines of evidence:
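One way to make "easily mapped on to the phylogeny" concrete is to count the minimum number of events a given tree requires to explain a feature. The sketch below uses a standard small-parsimony count (the Fitch method) on trees written as nested tuples. This is my own illustration, not the method used in the studies discussed here, and the alternative tree grouping humans with gorillas is a hypothetical comparison. The feature coding (1 = present, 0 = absent) follows the inversion and fusion events described above.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for a nested-tuple tree."""
    if isinstance(tree, str):  # a leaf: its state is simply what we observe
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # the two branches can agree, so no event is needed here
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def changes(tree, states):
    """Minimum number of events needed to explain one feature on one tree."""
    return fitch(tree, states)[1]


# The tree supported by the other lines of evidence.
expected_tree = ((("human", "chimpanzee"), "gorilla"), "orangutan")
# A hypothetical alternative that groups humans with gorillas instead.
alternative_tree = ((("human", "gorilla"), "chimpanzee"), "orangutan")

# 1 = feature present, 0 = absent, as described in the text.
inversion = {"human": 1, "chimpanzee": 1, "gorilla": 0, "orangutan": 0}
fusion = {"human": 1, "chimpanzee": 0, "gorilla": 0, "orangutan": 0}

print(changes(expected_tree, inversion), changes(alternative_tree, inversion))
print(changes(expected_tree, fusion), changes(alternative_tree, fusion))
```

On the expected tree the inversion needs only one event, while the alternative tree needs two. The fusion, being unique to humans, needs one event on either tree, which is why unique features tell us about separate history but cannot by themselves choose between trees.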
Summing up and looking ahead
As we discussed at the beginning of this series, a good theory (in the scientific sense) is one that is supported by multiple lines of evidence and readily makes accurate predictions. With the advent of modern comparative genomics, evolutionary theory has shown itself to be robust in ways that Darwin could not have imagined. We can say with confidence that we share ancestors with other species, and that this conclusion is not (at all) likely to change, even as new information comes in.
In the next post in this series, we will turn our attention to features that do not fit neatly into predicted phylogenies. Far from being a problem (as frequently claimed by those opposed to evolution), these features are a wealth of information that reveal even more about our past.