Genomes as copied texts: tying it all together
Over the last several posts in this series, we have been examining the overall patterns we observe when comparing the genomes of different species. What we have seen is that these patterns are entirely consistent with species sharing common ancestors – and with their genomes, accordingly, being modified copies of what was once the genome of their shared ancestral population. What we have not yet discussed in detail, however, is that the lines of evidence we have examined – comparisons of genome structure, functional gene sequences, and specific mutations in inactivated genes – all cohere into a mutually supportive pattern. The convergence of multiple, independent lines of evidence is exactly what a good theory is built on, and comparative genomics has provided very strong support for evolutionary theory.
Starting with shared “typos”
Let’s return to the example of inactivated olfactory receptor genes that we discussed in our last post. Based on this small subset of defective genes alone, we constructed the following “family tree” for humans, chimpanzees, gorillas and orangutans:
A proposed family tree for related species - or phylogeny, to introduce the technical term - is a graphical way to both (a) represent a large data set and (b) propose a hypothesis for how that data set came to be. In this data set, we have two categories of features to explain – mutations that are identical in more than one species, and mutations that are unique to a single species. And as we discussed previously, the above phylogeny fits the data very neatly – the patterns of the shared events and the unique events are supported by the same phylogeny. Shared events occur once, in a common genome, and unique events occur after two species have gone their separate ways.
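To make the "shared events occur once" idea concrete, here is a minimal sketch in Python. The tree is the one proposed in the text. The mutation table is invented purely for illustration and is not the real olfactory receptor data. The check is a simple one: a mutation carried by several species can be explained by a single event only if its carriers form exactly one complete branch (a clade) of the tree.

```python
# Proposed phylogeny from the text, written as nested pairs.
TREE = ((("human", "chimpanzee"), "gorilla"), "orangutan")

# HYPOTHETICAL table (illustration only): which species carry each mutation.
MUTATIONS = {
    "mutation_1": {"human", "chimpanzee", "gorilla", "orangutan"},
    "mutation_2": {"human", "chimpanzee", "gorilla"},
    "mutation_3": {"human", "chimpanzee"},
    "mutation_4": {"human"},
    "mutation_5": {"gorilla"},
}


def leaves(tree):
    """Return the set of species found at or below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the species set of every node (branch) in the tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def classify(carriers, tree):
    """Label a mutation by how the tree would explain its distribution."""
    carriers = frozenset(carriers)
    if len(carriers) == 1:
        return "unique"    # one event after this species went its own way
    if carriers in clades(tree):
        return "shared"    # one event in a common ancestral genome
    return "conflict"      # would need more than one event on this tree


if __name__ == "__main__":
    for name, carriers in MUTATIONS.items():
        print(name, "->", classify(carriers, TREE))
```

A distribution such as a mutation found only in humans and gorillas would be labelled "conflict" here, because those two species do not form a complete branch on their own in this tree.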
What this phylogeny in fact proposes is that these four species have differing amounts of shared history and separate history. For example, humans and chimpanzees would have the largest amount of shared history of these four species (outlined in yellow), and comparatively smaller separate histories (outlined in blue and red):
Humans and gorillas, however, have less shared history (and more separate history) over the same time period:
And lastly, orangutans and humans have even less shared history compared to the other primates, and more separate history:
So, from this relatively small data set (a handful of shared mutations in a few genes) we have a detailed hypothesis of which species share the most history in common – a hypothesis we can test using other lines of evidence.
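The idea of "more shared history" can be read straight off the branching order, even without any dates. The sketch below is only an illustration: it uses the tree proposed above, and it counts how many branch points two lineages pass through together (starting from the common ancestor of all four species) before they separate. It says nothing about how long each stretch lasted, since the text gives no branch lengths.

```python
# Proposed phylogeny from the text, written as nested pairs.
TREE = ((("human", "chimpanzee"), "gorilla"), "orangutan")


def leaves(tree):
    """Return the set of species found at or below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def split_depth(tree, a, b, depth=0):
    """Depth of the node where the lineages of a and b separate (root = 0).

    A larger value means the two species stayed together through more
    branch points, i.e. they have more shared history on this tree.
    """
    for child in tree:
        if not isinstance(child, str):
            below = leaves(child)
            if a in below and b in below:
                # Both lineages continue together into this child branch.
                return split_depth(child, a, b, depth + 1)
    return depth


if __name__ == "__main__":
    for other in ("chimpanzee", "gorilla", "orangutan"):
        print("human vs", other, "->", split_depth(TREE, "human", other))
```

On this tree, humans and chimpanzees separate at the deepest node, humans and gorillas one step earlier, and humans and orangutans at the root, which is the ordering the three figures above describe.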
From typos to sentences
Now that we have used a small subset of the “shared typos” found in these four genomes to assemble a proposed phylogeny, we can consider what this phylogeny would predict when comparing the sequences of individual genes in these four species. The key is the shared history portion of a phylogeny for any two species – during this portion, what will later become two species is only one population, with a common genome. As such, they will have the same sequence for any given gene (unless there is variation within the population for that gene, in which case the population will share a common pool of alleles of that gene). The longer two species have a shared history, the more similar we expect their gene sequences to be. The longer they have had a separate history, the more different we expect their genes to be, due to mutation events occurring in the “separate history” portion of the phylogeny.
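For readers who want to see what "more similar" means in practice, here is a minimal sketch of a percent-identity calculation. It assumes the two sequences have already been aligned to the same length, which is a simplification (real genome comparisons must also handle insertions and deletions). The three short sequences and the species names are made up for illustration and are not real gene sequences.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# HYPOTHETICAL toy sequences (illustration only), 20 letters each.
SPECIES_X = "ATGGCCTTAGCATCGGATCA"
SPECIES_Y = "ATGGCCTTAGCATCGGATCG"  # differs from X at 1 position
SPECIES_Z = "ATGACCTTAGCTTCGGATCG"  # differs from X at 3 positions

if __name__ == "__main__":
    print("X vs Y:", percent_identity(SPECIES_X, SPECIES_Y))
    print("X vs Z:", percent_identity(SPECIES_X, SPECIES_Z))
```

In this toy case, species with a longer separate history from X would be expected to look like Z (more differences), and those with a shorter separate history like Y (fewer differences).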
With the sequencing of the orangutan genome in 2011 and the gorilla genome in 2012, we can now assess this prediction using a very large data set for all four species. Human and chimpanzee sequences are nearly identical on average (98.6% identical); humans and gorillas slightly less so (98.3% identical); and humans and orangutans even less so (96.6% identical). These results fall into the predicted pattern:
In other words, using large swaths of genome sequences to assemble a phylogeny for these species produces the same phylogeny that the shared mutation data does. The pattern required by the shared mutation data (a tiny subset of the DNA sequences of these species) is also the same pattern that best explains the overall genome identity we observe.
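The test described here can be written down in a few lines. The sketch below uses only the three human-versus-other identity values quoted above and asks whether ranking the species by identity reproduces the order the shared mutation phylogeny predicts. This is a deliberately limited check: building a full tree from sequence data would need comparisons between every pair of species, not just comparisons to humans.

```python
# Average sequence identity to human, in percent, as quoted in the text.
IDENTITY_TO_HUMAN = {
    "chimpanzee": 98.6,
    "gorilla": 98.3,
    "orangutan": 96.6,
}

# Order predicted by the shared mutation phylogeny:
# the species with the most shared history with humans comes first.
PREDICTED_ORDER = ["chimpanzee", "gorilla", "orangutan"]


def rank_by_identity(identity):
    """Species sorted from most to least similar to human."""
    return sorted(identity, key=identity.get, reverse=True)


def matches_prediction(identity, predicted):
    """True if the identity ranking equals the predicted order."""
    return rank_by_identity(identity) == list(predicted)


if __name__ == "__main__":
    print("ranked by identity:", rank_by_identity(IDENTITY_TO_HUMAN))
    print("matches phylogeny: ", matches_prediction(IDENTITY_TO_HUMAN, PREDICTED_ORDER))
```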
From sentences to chapters
With shared mutation data and overall sequence data supporting the same phylogeny for these species, we can go on to further test this hypothesis using genome structure – the spatial organization of genes on chromosomes, or “chapters” to return to our copied book analogy. As with the shared mutation data, the similarities and differences we observe are expected to fall into either shared features (in the predicted pattern) or unique features (that arose once species had separated from other species). Overall, when comparing chromosome structure for all four species, we see that these expectations are met. When comparing chromosome structure, humans are most similar to chimpanzees, less so to gorillas, and even less so to orangutans, as expected.
To illustrate this pattern with a specific example, let’s return to the major chromosome structural difference between humans and the other great apes, the fusion event that led to human chromosome 2. As we have discussed previously, the fused chromosome is present in humans but not chimpanzees, gorillas or orangutans - meaning that it occurred after the human lineage separated from the chimpanzee lineage. Closer examination of this region in gorillas and orangutans reveals an additional difference: a portion of this region is inverted in the human and chimpanzee genomes (outlined in red) when compared with the equivalent region in gorillas and orangutans (outlined in blue):
So, humans are the most similar to chimpanzees (the most regions line up) and less so to the other apes, as expected. The differences we see are also easily mapped on to the phylogeny provided by the other lines of evidence. Since the inversion event is common to both humans and chimpanzees (but not present in the other species), it likely occurred in the human/chimpanzee common ancestral population after it parted ways with the lineage leading to gorillas. The fusion event would have occurred later, on the lineage leading to humans (and, as we have seen, would be shared by other species more closely related to humans than the great apes are). As expected, the phylogeny predicted using chromosome structure data alone matches the phylogeny predicted from the other lines of evidence:
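The order of events described above can also be played forward as a toy simulation of copying with modification. In the sketch below, a genome is a list of chromosomes and each chromosome is a list of lettered blocks. The block labels, the chromosome layout, and the position of the inversion are all invented for illustration and do not represent the real gene order. Only the sequence of events follows the text: an inversion in the human/chimpanzee ancestor, then a fusion on the human lineage.

```python
def copy_genome(genome):
    """Return an unmodified copy of a genome (a list of chromosomes)."""
    return [list(chromosome) for chromosome in genome]


def invert(blocks, start, end):
    """Return a chromosome with blocks[start:end] reversed (an inversion)."""
    return blocks[:start] + blocks[start:end][::-1] + blocks[end:]


def has_run(genome, run):
    """True if any chromosome contains `run` as consecutive blocks."""
    n = len(run)
    return any(
        chromosome[i:i + n] == run
        for chromosome in genome
        for i in range(len(chromosome) - n + 1)
    )


# HYPOTHETICAL ancestral layout: two separate chromosomes.
ancestor_of_all = [["a", "b", "c", "d"], ["e", "f", "g"]]

# Orangutans and gorillas branch off before either event occurs.
orangutan = copy_genome(ancestor_of_all)
gorilla = copy_genome(ancestor_of_all)

# The inversion happens once, in the human/chimpanzee ancestor.
ancestor_hc = [invert(ancestor_of_all[0], 1, 3), list(ancestor_of_all[1])]
chimpanzee = copy_genome(ancestor_hc)

# The fusion happens later, on the human lineage only.
human = [ancestor_hc[0] + ancestor_hc[1]]

if __name__ == "__main__":
    genomes = {"human": human, "chimpanzee": chimpanzee,
               "gorilla": gorilla, "orangutan": orangutan}
    for name, genome in genomes.items():
        print(name, "inversion:", has_run(genome, ["c", "b"]),
              "chromosomes:", len(genome))
```

Because each event is applied once and then inherited by copying, the inversion ends up in exactly the two species that descend from the ancestor where it occurred, and the fusion in only one.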
Summing up and looking ahead
As we discussed at the beginning of this series, a good theory (in the scientific sense) is one that is supported by multiple lines of evidence and readily makes accurate predictions. With the advent of modern comparative genomics, evolutionary theory has shown itself to be robust in ways that Darwin could not have imagined. We can say with confidence that we share ancestors with other species, and that this conclusion is not (at all) likely to change, even as new information comes in.
In the next post in this series, we will turn our attention to features that do not fit neatly into predicted phylogenies. Far from being a problem (as frequently claimed by those opposed to evolution), these features are a wealth of information that reveals even more about our past.