Speciation and Incomplete Lineage Sorting
One of the challenges for discussing evolution within evangelical Christian circles is that there is widespread confusion about how evolution actually works. In this (intermittent) series, I discuss aspects of evolution that are commonly misunderstood in the Christian community. In the two previous posts, we examined how speciation is something that happens to populations. In this post, we explore why individual gene histories may not match species histories as populations diverge, and look at how these results have been misinterpreted by some members of the ID movement.
Populations and genetic diversity
One consequence of speciation being a population event is that populations have genetic diversity – not all members of the population are genetically identical. For any particular gene, then, a population may have several slightly different forms present within it. These different forms are called alleles. A fairly well-known example in humans is the set of alleles that control blood type: one allele gives rise to the A type, another to the B type, and a third to the O type. Individuals may be blood type A (either two A alleles or A + O), blood type B (either two B alleles or B + O), type AB (one A allele + one B allele), or type O (two O alleles). Any one individual can have only two alleles of this gene (one from mom, the other from dad), but as a population we collectively maintain all three. Other human genes have many more than three alleles (for example, some genes of the immune system have hundreds), even though any given individual can carry at most two. The larger a population is, the more alleles of a given gene it can maintain. Smaller populations are more at risk of losing alleles due to chance (a process called genetic drift).
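The link between population size and the number of alleles a population can hold onto can be illustrated with a small simulation. This is a minimal sketch assuming the textbook Wright-Fisher model (neutral alleles, random mating, constant population size, no new mutations); the function names and parameter values are hypothetical and chosen only for illustration. Each generation, every gene copy is drawn at random from the previous generation, so alleles can disappear purely by chance.

```python
import random


def wright_fisher(gene_copies, generations, rng):
    """Neutral drift (assumed Wright-Fisher model): each generation, every gene
    copy in the offspring generation is drawn at random, with replacement,
    from the parental generation. No mutation or selection is modeled."""
    population = list(gene_copies)
    for _ in range(generations):
        population = rng.choices(population, k=len(population))
    return population


def surviving_alleles(n_individuals, n_alleles=10, generations=200, seed=0):
    """Hypothetical helper: start a diploid population (2 * n_individuals gene
    copies) with n_alleles equally common alleles, let drift act, and count
    how many alleles are still present at the end."""
    rng = random.Random(seed)
    copies = 2 * n_individuals
    start = [i % n_alleles for i in range(copies)]
    return len(set(wright_fisher(start, generations, rng)))


if __name__ == "__main__":
    for size in (5, 50, 500):
        print(size, "individuals:", surviving_alleles(size), "of 10 alleles remain")
```

In runs like this the smallest population almost always ends up with a single allele, while the largest one still retains several of the original ten after the same number of generations.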
Genetic diversity and speciation
The fact that populations maintain genetic diversity is important to remember when considering speciation. Speciation events are commonly represented with branching tree diagrams (“phylogenies”, or “species trees”) such as this one:
Here we see that Species 1 and Species 2 are more closely related to each other than they are to Species 3. What this says is that Species 1 and Species 2 shared a common ancestral population more recently with each other than either did with Species 3. So far, so good. What this doesn’t mean, however, is that comparing gene sequences between these species will always group 1 & 2 together as more similar to each other than to 3. While this will be true most of the time, we expect that some of the time this pattern will not hold. The reason is something called incomplete lineage sorting, and it has to do with the fact that populations going their separate ways carry genetic diversity with them. Let’s try to explain what is going on here.
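A species tree like the one described here can be stored as nested pairs, with each pair standing for a shared ancestral population. The sketch below is a hypothetical illustration (the helper names are invented here, not taken from any phylogenetics library). It checks whether two species share a more recent common ancestor with each other than either does with a third, which is exactly the claim the tree makes for Species 1 and Species 2.

```python
def leaves(tree):
    """All species names (tips) below a node of a nested-pair tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def mrca_clade(tree, a, b):
    """Smallest clade containing both a and b, i.e. all descendants of
    their most recent common ancestral population."""
    if not isinstance(tree, str):
        for child in tree:
            if {a, b} <= leaves(child):
                return mrca_clade(child, a, b)
    return tree


def closer(tree, a, b, c):
    """True if a and b share a more recent common ancestor than either does with c."""
    return c not in leaves(mrca_clade(tree, a, b))


# Hypothetical encoding of the three-species tree: ((1, 2), 3).
species_tree = (("Species 1", "Species 2"), "Species 3")
print(closer(species_tree, "Species 1", "Species 2", "Species 3"))  # True
print(closer(species_tree, "Species 1", "Species 3", "Species 2"))  # False
```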
Imagine that the ancestral population of all three species (the 1,2,3 common ancestor) has four alleles of a certain gene (represented by different colors in the diagram). Alleles originally arise when a mutation, such as an error during DNA copying, creates a difference between two copies of a gene. Once a difference is in place, the two alleles can go on to acquire other differences over time, again through copying errors. As a result, alleles can be compared to each other, just like species: alleles that separated recently will share more similarities, and alleles that have been separate for longer will have acquired more differences. In this example, the blue and green alleles are more similar to each other than either is to red or orange, and likewise the red and orange alleles are more similar to each other than either is to blue or green. The blue and green alleles arose from one common ancestral allele, and the red and orange alleles arose from another. Further back in time, these two ancestral alleles themselves arose from a single starting allele. All four alleles will therefore have a great deal in common (nucleotide sequences inherited from that single ancestral allele), as well as differences (for example, the red and orange alleles will share all changes that occurred between the time their lineage split from the blue/green lineage and the time they themselves separated into two distinct alleles).
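To see why nested splits produce nested similarity, we can grow the four alleles from one starting sequence in a toy model. This is a hypothetical sketch with two simplifying assumptions that real DNA does not obey: every branch of the allele genealogy receives exactly the same number of mutations, and no site mutates on more than one branch. Under those assumptions the counts of differences come out exact, so the pattern is easy to read.

```python
import random

BASES = "ACGT"


def differences(a, b):
    """Count the sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def simulate_alleles(length=1000, per_branch=5, seed=1):
    """Hypothetical toy model of the genealogy ((blue, green), (red, orange)).
    Assumes each of the six branches gets exactly per_branch mutations and
    that no site is hit on more than one branch."""
    rng = random.Random(seed)
    sites = rng.sample(range(length), 6 * per_branch)  # all distinct sites
    branch = [sites[i * per_branch:(i + 1) * per_branch] for i in range(6)]

    def mutate(seq, branch_sites):
        seq = list(seq)
        for s in branch_sites:
            # Substitute a base that differs from the current one.
            seq[s] = rng.choice([b for b in BASES if b != seq[s]])
        return "".join(seq)

    root = "".join(rng.choice(BASES) for _ in range(length))
    blue_green = mutate(root, branch[0])  # ancestral allele of blue and green
    red_orange = mutate(root, branch[1])  # ancestral allele of red and orange
    return {
        "blue": mutate(blue_green, branch[2]),
        "green": mutate(blue_green, branch[3]),
        "red": mutate(red_orange, branch[4]),
        "orange": mutate(red_orange, branch[5]),
    }


alleles = simulate_alleles()
for a, b in [("blue", "green"), ("red", "orange"), ("blue", "red"), ("green", "orange")]:
    print(a, b, differences(alleles[a], alleles[b]))
# blue green 10
# red orange 10
# blue red 20
# green orange 20
```

Recently separated alleles differ only by the changes on their own two short branches, while alleles from different halves of the genealogy also differ by everything that accumulated before the first split.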
Now consider the time when the (1,2,3 common ancestor) population divides to become the (1,2 common ancestor) species and the Species 3 ancestor (the first branch point in the diagram). As this population divides into two species, there is no guarantee that all four alleles will be present in the founding population of each new species. Each founding population is a sample of the original population, and by chance alone any given sample may omit certain alleles:
In the example above, we see that the red allele has been lost from the (1,2 common ancestor) species, and that the Species 3 ancestor has lost the blue and orange alleles. In other words, the founding population of the (1,2 common ancestor) species didn’t include any individuals that carried the red allele, and the founding population of the Species 3 ancestor didn’t include any individuals that carried the blue or orange alleles. Both losses happened simply by chance, because a founding population is only a sample of the original population and need not be representative of it.
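How likely is it that a founding group misses an allele entirely? A minimal sketch, assuming each of the founders' gene copies is drawn independently from a large, randomly mating source population: an allele at frequency p is absent from a group of n diploid founders (2n gene copies) with probability (1 − p)^(2n). The function names are hypothetical, and the simulation is only a check on that formula.

```python
import random


def prob_allele_missing(freq, n_founders):
    """Chance that an allele at frequency `freq` is carried by none of the
    2 * n_founders gene copies in a diploid founding group. Assumes each
    copy is drawn independently from a large source population."""
    return (1 - freq) ** (2 * n_founders)


def simulate_allele_missing(freq, n_founders, trials=20_000, seed=2):
    """Monte Carlo check: draw many founding groups and count how often
    none of their gene copies carries the allele."""
    rng = random.Random(seed)
    missing = sum(
        all(rng.random() >= freq for _ in range(2 * n_founders))
        for _ in range(trials)
    )
    return missing / trials


print(prob_allele_missing(0.05, 10))   # about 0.358
print(simulate_allele_missing(0.05, 10))
print(prob_allele_missing(0.05, 100))  # about 3.5e-05
```

An allele carried by 5% of the source population is lost from a group of 10 founders roughly a third of the time, but almost never from a group of 100, which is the same population-size effect seen with drift.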
Later, as the (1,2 common ancestor) species separates again into Species 1 and Species 2, the same issues arise. The two founding populations may not transmit all of the genetic diversity of the (1,2 common ancestor) population:
In this case, the founding population leading to Species 1 did not include a member with the green allele, and the founding population leading to Species 2 did not include any members with either blue or orange alleles. Also, the green allele has been lost in the lineage leading to Species 3 (it became rare and was eventually not passed on due to chance).
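The bookkeeping of the example so far is simply set subtraction. This sketch only retraces which alleles each population keeps, using the losses described above; it does not model how likely each loss is.

```python
# Alleles in the (1,2,3) common ancestor population.
ancestor_123 = {"blue", "green", "red", "orange"}

# First split: each founding population happens to lack some alleles.
ancestor_12 = ancestor_123 - {"red"}
ancestor_3 = ancestor_123 - {"blue", "orange"}

# Second split of the (1,2) ancestor, plus the later chance loss of
# green in the lineage leading to Species 3.
species_1 = ancestor_12 - {"green"}
species_2 = ancestor_12 - {"blue", "orange"}
species_3 = ancestor_3 - {"green"}

for name, alleles in [("Species 1", species_1), ("Species 2", species_2), ("Species 3", species_3)]:
    print(name, sorted(alleles))
# Species 1 ['blue', 'orange']
# Species 2 ['green']
# Species 3 ['red']
```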
In the present day, examining the alleles of the three modern species will reveal different levels of similarity. The blue allele is now found only in Species 1; it is most similar to the green allele in Species 2, and less similar to the red allele in Species 3. This pattern matches the overall “species tree” pattern for these three species:
The orange allele in Species 1, however, tells a different story: it is most similar to the red allele in Species 3, and less similar to the green and blue alleles. If we knew only about the orange allele in Species 1, we might conclude that Species 1 and Species 3 are the closest relatives. This is because the “gene tree” for these alleles places orange closest with red, even though the true “species tree” reveals an overall pattern of speciation that is different:
The orange allele thus has a gene phylogeny that is said to be “discordant” with the overall species phylogeny.
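The comparison between the blue and orange alleles can be written out as a small program. In this hypothetical sketch (names invented for illustration), the allele genealogy is stored as a child-to-parent map, and each allele in Species 1 is paired with the species whose allele shares its most recent common ancestor. A gene tree counts as concordant here if that partner is Species 2, the sister of Species 1 in the species tree.

```python
# Allele genealogy from the example: ((blue, green), (red, orange)).
PARENT = {
    "blue": "blue/green ancestor",
    "green": "blue/green ancestor",
    "red": "red/orange ancestor",
    "orange": "red/orange ancestor",
    "blue/green ancestor": "root",
    "red/orange ancestor": "root",
}

# Alleles present in each modern species at the end of the example.
SPECIES_ALLELES = {
    "Species 1": {"blue", "orange"},
    "Species 2": {"green"},
    "Species 3": {"red"},
}


def lineage(node):
    """List a node and all of its ancestors, ending at the root allele."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def shared_depth(a, b):
    """Depth of the most recent common ancestor of two alleles
    (0 = root allele; larger = more recent split = more similar)."""
    ancestors_of_b = set(lineage(b))
    mrca = next(node for node in lineage(a) if node in ancestors_of_b)
    return len(lineage(mrca)) - 1


def gene_tree_partner(allele, home_species):
    """Species whose allele shares the most recent common ancestor with `allele`."""
    best_species, best_depth = None, -1
    for species, alleles in SPECIES_ALLELES.items():
        if species == home_species:
            continue
        for other in alleles:
            depth = shared_depth(allele, other)
            if depth > best_depth:
                best_species, best_depth = species, depth
    return best_species


for allele in sorted(SPECIES_ALLELES["Species 1"]):
    partner = gene_tree_partner(allele, "Species 1")
    verdict = "concordant" if partner == "Species 2" else "discordant"
    print(allele, "groups with", partner, "->", verdict)
# blue groups with Species 2 -> concordant
# orange groups with Species 3 -> discordant
```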
How do biologists assemble species trees if gene trees can be discordant?
It might seem from the above discussion that assembling a species phylogeny from gene phylogenies is a hopeless task: after all, if any individual gene tree might be misleading, how can we be certain we have the correct species tree?
The solution is to realize that while any individual gene tree might be discordant, gene trees that match the species tree will be the most common category. In our example above, Species 1 and Species 2 share a common ancestral population for some time after the (1,2 common ancestor) and Species 3 ancestor populations diverge. This means that any event that happens to this population (loss of an allele, for example) will be reflected in all of its descendant species (in our example, Species 1 and Species 2). This shared history favors gene trees that match the species tree. For a discordant tree, the ancestral (1,2) population must maintain two alleles inherited from the (1,2,3) ancestor for its entire existence, and those alleles must then sort so that Species 1 and Species 2 inherit alleles from different lineages. This can happen, but it is less likely.
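This intuition has a well-known closed form under the standard coalescent model of population genetics. The model is an assumption here: neutral alleles, one gene copy sampled per species, and randomly mating ancestral populations of constant size. If the (1,2) ancestral population lasts T generations with N diploid individuals, its branch is t = T/(2N) "coalescent units" long. The gene copies from Species 1 and Species 2 trace back to a single ancestral copy within that population with probability 1 − e^(−t); if they do not, all three lineages enter the older (1,2,3) population, where each of the three possible gene trees is equally likely. The concordant tree therefore has probability 1 − (2/3)e^(−t), and each discordant tree (1/3)e^(−t). The sketch below computes these values and checks them with a quick simulation; the function names are hypothetical.

```python
import math
import random

TOPOLOGIES = ("((1,2),3)", "((1,3),2)", "((2,3),1)")


def gene_tree_probabilities(t):
    """Expected gene-tree frequencies for species tree ((1,2),3), where t is the
    length of the (1,2) ancestral branch in coalescent units (generations / 2N)."""
    discordant = math.exp(-t) / 3
    return {"((1,2),3)": 1 - 2 * discordant,
            "((1,3),2)": discordant,
            "((2,3),1)": discordant}


def simulate_gene_tree(t, rng):
    """One gene tree: lineages 1 and 2 coalesce at rate 1 inside the (1,2)
    ancestor. If they have not coalesced by time t, all three lineages reach
    the older ancestral population, where the first pair to coalesce is a
    random one of the three possible pairs."""
    if rng.expovariate(1.0) < t:
        return TOPOLOGIES[0]
    return rng.choice(TOPOLOGIES)


if __name__ == "__main__":
    rng = random.Random(3)
    t, trials = 0.5, 50_000
    counts = {topo: 0 for topo in TOPOLOGIES}
    for _ in range(trials):
        counts[simulate_gene_tree(t, rng)] += 1
    for topo, p in gene_tree_probabilities(t).items():
        print(topo, "expected", round(p, 3), "simulated", round(counts[topo] / trials, 3))
```

Because e^(−t) is at most 1, each discordant tree can never be more common than the concordant one, and the longer the shared (1,2) population lasts relative to its size, the rarer discordance becomes.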
What this means in practice is that biologists expect a certain pattern of gene trees when comparing related organisms. Using our three species as an example, gene trees that match the species tree should be the most common outcome. The less likely outcomes are gene trees in which an allele from Species 1 is more similar to the allele in Species 3, or, equally often, gene trees in which an allele from Species 2 is more similar to the allele in Species 3. We can be confident we have the correct species tree because more gene trees support that tree than support either alternative.
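The decision rule described here amounts to tallying gene trees across many loci and taking the most common one. The sketch below is a hypothetical illustration: the ten "loci" are made-up input, not data from any study, and the function name is invented.

```python
from collections import Counter


def most_common_gene_tree(gene_trees):
    """Return the most frequent gene-tree topology and the full tally."""
    counts = Counter(gene_trees)
    best, _ = counts.most_common(1)[0]
    return best, counts


# Made-up tally for illustration only: 10 hypothetical loci, not real data.
loci = ["((1,2),3)"] * 6 + ["((1,3),2)"] * 2 + ["((2,3),1)"] * 2
best, counts = most_common_gene_tree(loci)
print(best)  # ((1,2),3)
print(dict(counts))
```

The rule is "most common", not "more than half". When the shared (1,2) ancestral population is short-lived relative to its size, concordant gene trees can fall below 50% of loci while still outnumbering each of the two discordant alternatives.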
A problem for common descent?
The fact that gene phylogenies/trees and species phylogenies/trees don’t always match is not something that surprises scientists, since it is a well-known phenomenon and the mechanisms underlying it are understood: species arise from genetically diverse populations, and that diversity does not always sort completely down to every descendant species. Discordant phylogenies, however, are commonly used among Christians as a means to cast doubt on common ancestry and/or evolutionary biology as a whole. One example from the Intelligent Design movement will serve as an illustration. In a blog post discussing discordant trees found when comparing the human genome to those of other primates, Casey Luskin argues:
Since humans are typically said to be most closely related to chimps, this data conflicts with the standard supposed tree … the basic problem is that one gene (or portion of the genome) gives you one version of the tree, while another gene (or portion of the genome) gives you a very different version of the tree. This leads to discrepancies between molecule-based trees, wherein DNA data fails to provide a consistent picture of common ancestry.
In the end, molecular trees are based upon the sheer assumption that the degree of genetic similarity reflects the degree of evolutionary relatedness … Clearly this assumption fails when different genes paint contradictory pictures of evolutionary relationships.
As we have seen, these differences are the natural, expected consequence of genetic diversity in an ancestral population sorting incompletely into different descendant species. The data set that concerns Casey comes from primate evolution, where the species tree for humans, chimpanzees, gorillas and orangutans is as follows:
In the article linked above, Casey is discussing a recent comparison of the newly completed orangutan genome with the human genome. The availability of the orangutan genome allowed researchers to scan the human genome for locations where humans are more similar to orangutans than to chimps. Such regions turned out to be both rare and short. Indeed, the researchers found a clear pattern: chromosome segments in humans most often match chimpanzees, and do so for stretches of thousands of base pairs at a time, on average. The regions that match orangutans are tiny (on average less than 100 base pairs) and rare. This is exactly what one expects from the species tree: humans and chimps are much more likely to have gene trees in common, since they shared a common ancestral population more recently (around 4-5 million years ago). Humans and orangutans, on the other hand, haven’t shared a common ancestral population in about 10 million years or more, which makes it much less likely that any given human allele will match an orangutan allele more closely. It is certainly possible, however, and scanning the entire genome turns up rare sites with this pattern. Indeed, the authors of the paper above used previously determined speciation times and population size estimates to predict what fraction of the human genome should match orangutans more closely. Based on these parameters from other studies, they predicted that 0.9% of the human genome would have a human : orangutan gene tree. The observed value was 0.8%, a result that provides additional support for the population size estimates and speciation times from those other studies.
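Part of the reason a deeper split makes human : orangutan matches rarer is that a discordant grouping requires gene lineages to pass through ancestral populations without tracing back to a common ancestral copy, and the chance of that falls off exponentially with how long those populations last relative to their size. The sketch below uses the standard Wright-Fisher result for two lineages. The generation counts and the population size are placeholders chosen for illustration, not the parameters used in the orangutan study, and this is not a reproduction of its 0.9% prediction.

```python
import math


def prob_no_coalescence(generations, diploid_size):
    """Chance that two gene lineages in a randomly mating diploid population
    of constant size (2 * diploid_size gene copies) have NOT traced back to a
    common ancestral copy after the given number of generations
    (assumed Wright-Fisher model)."""
    return (1 - 1 / (2 * diploid_size)) ** generations


def prob_no_coalescence_approx(generations, diploid_size):
    """The usual exponential approximation, exp(-generations / (2N))."""
    return math.exp(-generations / (2 * diploid_size))


if __name__ == "__main__":
    N = 50_000  # placeholder effective population size
    for gens in (10_000, 40_000, 160_000):  # placeholder durations in generations
        print(gens, "generations:", round(prob_no_coalescence(gens, N), 3))
```

Each fourfold increase in the duration of a shared ancestral population (at fixed size) sharply lowers the chance that separate lineages survive through it, which is why the human genome matches chimpanzees far more often than orangutans.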
Why is this data interesting?
Aside from its misinterpretation by the ID movement, this sort of data actually provides us with information about the population sizes of the ancestral species that went on to give rise to orangutans, gorillas, chimpanzees and humans, as well as the times of the various speciation events. I have discussed similar data for the (gorilla/chimpanzee/human) and (chimpanzee/human) common ancestor populations elsewhere; this new data merely confirms previous estimates of the population sizes of those ancestral groups, and extends the estimates back to the (orangutan/gorilla/chimpanzee/human) common ancestor population with greater precision. As before, these results continue to strongly support the hypothesis that the population of the human lineage has never dropped as low as two individuals at any point in our evolutionary history. Indeed, these new results confirm that the human : chimp common ancestor population was large (about 50,000 members). As Darrel Falk and I have discussed here on BioLogos in the past, all methods used to date (numerous approaches, each resting on independent assumptions) would have to be wildly wrong (by several orders of magnitude) if indeed our species arose from just two individuals.