In this series, we explore the genetic evidence that indicates humans became a separate species as a substantial population, rather than descending uniquely from an ancestral pair.
In the last post in this series, we introduced the concept of incomplete lineage sorting using the distribution of words with alternative spellings in Canadian English, American English, and British English. As we saw, the variation in spelling words with “-our” and “-or” was present in the common ancestral population of all three groups. Despite sharing a more recent common ancestral population with Americans in general, Canadians, like their British cousins, use the “-our” spellings rather than the American “-or” versions:
This type of pattern, as we noted, is also found in the DNA of related organisms that separated from one another over a relatively short timescale (in this context, within a few million years). Since the lineages leading to humans and the other great apes went their separate ways within the last few million years, we expect this pattern to apply to a portion of the human genome. While our DNA should match that of our closest relative, the chimpanzee, most often, we also expect that some of our DNA sequences will match the gorilla genome more closely:
The reason we expect to match chimpanzees most often is straightforward: we shared a common ancestral population with them for longer than we did with gorillas, whose lineage branched off earlier. We do, however, expect to match gorillas a portion of the time, just as Canadians and Brits match on occasion despite Canadians sharing a longer common history with Americans. And as we saw for alternative spellings, a discordant DNA pattern (one that bucks the overall trend) tells us that the DNA variants in question were present both in the common ancestral population of all three species (here, humans, chimpanzees, and gorillas) and in the common ancestral population of the two most closely related species (here, humans and chimpanzees).
One difference between words and DNA sequences is that while an individual might be inconsistent with their spelling (such as a Canadian using American forms when writing for a predominantly American audience), an organism cannot be inconsistent about the DNA variants it carries. Humans, like all other mammals, have two copies of their DNA sequences, one inherited from their mothers and the other from their fathers. A population of N individuals therefore carries 2N copies of each sequence, and the amount of variation a population can maintain depends on how many copies it carries. As such, if we can measure how much DNA variation a population had, we can estimate its size. More importantly in this case, if we can measure the proportion of DNA sequences that show the discordant pattern across three species, we can use that information to infer the size of the ancestral population shared by the two most closely related species.
Coalescent theory 101
Note: unfortunately, there is no easy way to explain how this is done without delving a bit into the theory and math behind it. If this gets too complicated, feel free to skip ahead to the “Doing the math” section below.
One way to think about DNA variation in populations is to follow it forward in time. Viewed this way, DNA variants arise through mutation and diverge from one another within a population:
Another, equally valid way to think about it is to imagine time running backward. Viewed this way, variant DNA sequences trace back to a shared ancestral sequence, or coalesce, as we go back in time:
Two sequences coalesce when we reach a generation in which they share the same ancestral sequence, that is, when they are both descended from the same original sequence. In any given generation, the probability that two sequences descend from the same parental sequence depends on how many sequences are present in the population, in other words, on the size of the population: the more sequences there are, the less likely it is that two of them trace back to the same one.
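To make the backward-in-time view concrete, here is a minimal Python sketch of an idealized population. It assumes a simple Wright-Fisher model (our assumption; the text does not specify a model): N diploid individuals carry 2N copies of a sequence, and each generation every copy picks its parent copy at random. Two copies coalesce in the first generation in which they pick the same parent, which happens with probability 1/(2N) per generation, so the average wait is about 2N generations. The function names and the small population sizes are illustrative only.

```python
import random


def coalescence_time(two_n: int, rng: random.Random) -> int:
    """Trace two gene copies back in time in an idealized population of
    two_n copies (N diploid individuals carry 2N copies). Each generation,
    each lineage picks its parent copy uniformly at random; the lineages
    coalesce in the first generation in which they pick the same parent."""
    generations = 0
    while True:
        generations += 1
        parent_a = rng.randrange(two_n)
        parent_b = rng.randrange(two_n)
        if parent_a == parent_b:
            return generations


def mean_coalescence_time(n_individuals: int, trials: int, seed: int = 1) -> float:
    """Average number of generations back to the common ancestral copy."""
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    total = sum(coalescence_time(two_n, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (10, 50, 100):
        mean = mean_coalescence_time(n, trials=5000)
        print(f"N = {n:>3}: mean {mean:.1f} generations (expected about {2 * n})")
```

Doubling the population roughly doubles the average time back to a shared ancestral copy, which is why the amount of unsorted variation carries information about population size.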
Let’s return to the example we looked at before:
In this case, reading from right to left (i.e. travelling back in time), we would say that the human and gorilla variants (in yellow) we see in the present day coalesce with each other before they coalesce with the blue variant we see in chimpanzees (the yellow variant having been lost from the chimpanzee lineage). If you trace the yellow variants, they connect with each other sooner than either does with the blue variant. Because the yellow and blue variants do not coalesce with one another until we reach the (human, chimpanzee, gorilla) common ancestral population, several patterns are possible once that population undergoes speciation into three descendant species. One possible pattern is the one in the diagram: the human and gorilla sequences coalesce first, followed by coalescence with chimpanzee (what we would call an (HG)C pattern). Another possibility, not shown, would be the gorilla and chimpanzee sequences coalescing first, followed by the human sequence (a (GC)H pattern). The final possibility, also not shown, is that the human and chimpanzee sequences coalesce first, followed by gorilla: the (HC)G pattern that matches the species tree and that we see for most of the genome. Two of the three possible patterns, then, are discordant, (HG)C and (GC)H, and one is not: (HC)G. Since each of the three possible pairs is equally likely to be the first to coalesce in the shared ancestral population, 2/3 of the possibilities produce a discordant pattern, and 1/3 produce the “correct” pattern that matches the overall species pattern.
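The 2/3 figure can be checked by listing the outcomes. This short Python sketch assumes, as the paragraph does, that once the human (H), chimpanzee (C), and gorilla (G) lineages all reach the shared ancestral population, each of the three possible pairs is equally likely to coalesce first. The names are ours, chosen for illustration, and pairs are labeled in the order H, C, G, so the (GC)H pattern appears as (CG)H.

```python
from fractions import Fraction
from itertools import combinations

LINEAGES = ("H", "C", "G")  # human, chimpanzee, gorilla
SPECIES_PAIR = {"H", "C"}   # the pair that shares the more recent ancestral population


def first_coalescence_outcomes():
    """List every pair that could coalesce first in the (H, C, G) ancestral
    population, flagging whether the resulting pattern is discordant."""
    outcomes = []
    for pair in combinations(LINEAGES, 2):
        (outgroup,) = set(LINEAGES) - set(pair)
        label = f"({pair[0]}{pair[1]}){outgroup}"
        outcomes.append((label, set(pair) != SPECIES_PAIR))
    return outcomes


outcomes = first_coalescence_outcomes()
# Each outcome is equally likely, so the discordant fraction is a simple count.
discordant = Fraction(sum(is_disc for _, is_disc in outcomes), len(outcomes))
for label, is_disc in outcomes:
    print(label, "discordant" if is_disc else "matches the species tree")
print("fraction discordant:", discordant)
```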
Notice that one requirement for a discordant pattern is that the two DNA variants in the most closely related species (humans and chimpanzees) must not coalesce in the common ancestral population of those species. The probability that any two sequences will not coalesce in a population over time t is as follows:
$$P_{\text{no coalescence}} = \left(1 - \frac{1}{2N}\right)^{t/g} \approx e^{-t/(2Ng)}$$
In this equation, t is the time in years that the common ancestral population in question persists, N is the population size, and g is the generation time in years, so t/g is the number of generations. (N is multiplied by 2 because every individual carries two copies of each DNA sequence, so a population of N individuals carries 2N copies.)
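For readers who want to experiment with the numbers, the non-coalescence probability can be computed in Python in two forms: the discrete-generation expression (1 − 1/(2N))^(t/g), in which two sequences avoid coalescing with probability 1 − 1/(2N) in each of t/g generations, and the approximation e^(−t/(2Ng)), which is very close when N is large. This is a minimal sketch; the function names are hypothetical.

```python
import math


def p_no_coalescence_exact(t_years: float, n: float, g_years: float) -> float:
    """Probability that two sequences fail to coalesce over t/g generations
    when, in each generation, they coalesce with probability 1/(2N)."""
    generations = t_years / g_years
    return (1.0 - 1.0 / (2.0 * n)) ** generations


def p_no_coalescence(t_years: float, n: float, g_years: float) -> float:
    """Large-N approximation: exp(-t / (2 N g))."""
    return math.exp(-t_years / (2.0 * n * g_years))


# Values from the "Doing the math" section: t = 2 million years, g = 20 years,
# and the earlier estimate of N = 50,000.
exact = p_no_coalescence_exact(2_000_000, 50_000, 20)
approx = p_no_coalescence(2_000_000, 50_000, 20)
print(f"exact {exact:.4f}, approximation {approx:.4f}")  # both print 0.3679
```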
From this formula, we can derive the probability of seeing a discordant pattern for any given DNA sequence in three related species. We require that the variants not coalesce, and we have shown that if they do not coalesce, 2 of the 3 outcomes are discordant. So, the probability of observing a discordant pattern is as follows:
$$P_{\text{discordant}} = \frac{2}{3}\, e^{-t/(2Ng)}$$
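The formula can also be checked by simulation. The Python sketch below is a simplified model under our own assumptions: constant population size, a per-generation coalescence probability of 1/(2N), and an equally likely first pair once all three lineages share a population. It follows the human and chimpanzee lineages back through their shared ancestral population generation by generation and, if they have not coalesced, picks which pair coalesces first in the (human, chimpanzee, gorilla) population. To keep it fast, it uses a scaled-down population in which t/g equals 2N, the same ratio as 2 million years at 20 years per generation with N = 50,000. It compares against the discrete-generation form, which at this small size differs slightly from the exponential approximation. Names and parameters are illustrative.

```python
import random


def p_discordant(generations: int, two_n: int) -> float:
    """(2/3) * (1 - 1/(2N)) ** (t/g): the H and C lineages must fail to
    coalesce, then 2 of the 3 equally likely first pairs are discordant."""
    return (2.0 / 3.0) * (1.0 - 1.0 / two_n) ** generations


def simulate_discordance(generations: int, two_n: int, trials: int, seed: int = 7) -> float:
    """Fraction of simulated sequences that end up in a discordant pattern."""
    rng = random.Random(seed)
    p_coalesce = 1.0 / two_n
    discordant = 0
    for _trial in range(trials):
        for _gen in range(generations):
            if rng.random() < p_coalesce:
                break  # H and C coalesced in their shared population: (HC)G
        else:
            # No coalescence: in the (H, C, G) population, pick the first pair
            # to coalesce. Index 0 is the (H, C) pair; 1 and 2 are discordant.
            if rng.randrange(3) != 0:
                discordant += 1
    return discordant / trials


# Scaled-down run: 2N = 100 copies and t/g = 100 generations.
simulated = simulate_discordance(generations=100, two_n=100, trials=20_000)
predicted = p_discordant(generations=100, two_n=100)
print(f"simulated {simulated:.3f}, predicted {predicted:.3f}")
```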
Doing the math
With this equation in hand, we are now ready to apply it to what we actually observe in the human, chimpanzee, and gorilla genomes. Based on paleontological and genetic data, the time span (t) during which humans and chimpanzees shared a common ancestral population after the gorilla lineage branched off is estimated at 2 million years. Assuming a generation time (g) of 20 years for all species throughout this process (a value consistent with that observed in present-day great apes) leaves only two unknowns: the size of the (human, chimpanzee) common ancestral population, and the proportion of DNA sequences we would predict to show a discordant pattern. Before the gorilla genome was fully sequenced, which made it possible to measure the latter, genetic evidence had suggested that this population numbered around 50,000 individuals. Plugging this value into the equation predicts a discordant pattern about 25% of the time when examining human, chimpanzee, and gorilla sequences. The actual result is about 30%, suggesting a population of about 62,000 individuals. This result adds further support to the prior evidence that the common ancestral population of humans and chimpanzees was large, far larger than the bottleneck of roughly 10,000 individuals on the human lineage after its separation from the chimpanzee lineage.
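The arithmetic in this paragraph can be reproduced in a few lines of Python. This sketch uses the exponential form of the discordance equation, (2/3)e^(−t/(2Ng)), with the values given above (t = 2 million years, g = 20 years), and inverts it, N = t / (2g ln(2/(3p))), to recover the population size implied by an observed discordance fraction p. The function names are our own.

```python
import math

T_YEARS = 2_000_000  # duration of the (human, chimpanzee) ancestral population
G_YEARS = 20         # generation time assumed for all species


def predicted_discordance(n: float) -> float:
    """Fraction of sequences expected in a discordant pattern for ancestral size N."""
    return (2.0 / 3.0) * math.exp(-T_YEARS / (2.0 * n * G_YEARS))


def population_from_discordance(fraction: float) -> float:
    """Solve fraction = (2/3) * exp(-t / (2 N g)) for N."""
    return T_YEARS / (2.0 * G_YEARS * math.log(2.0 / (3.0 * fraction)))


print(f"N = 50,000 -> {predicted_discordance(50_000):.1%} discordant")  # 24.5%
print(f"30% discordant -> N = {population_from_discordance(0.30):,.0f}")  # about 62,600
```

With N = 50,000, the 100,000 generations divided by 2N = 100,000 copies gives an exponent of exactly −1, and (2/3)e^(−1) ≈ 0.245, which rounds to the 25% quoted above. Inverting for an observed 30% gives roughly 62,600 individuals, consistent with the rounded figure in the text.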
The publication of the complete orangutan genome in 2011 allowed the same calculation to be made for the (human, chimpanzee, gorilla) ancestral population, since the orangutan lineage branched off the primate tree before the (human, chimpanzee, gorilla) speciation events. Once again, predictions were made based on prior population size estimates, and once again the predictions closely matched the observed values: the common ancestral population of humans, chimpanzees, and gorillas was of similar size to the (human, chimpanzee) common ancestral population.
Implications for human origins
While this quantitative data is not as easy to appreciate as other sorts of evidence we have examined, it is nonetheless compelling: humans, chimpanzees, and gorillas have, within their genomes, precisely the pattern of incomplete lineage sorting predicted by (a) relatedness, as evidenced by all other lines of genomic evidence (such as shared mutations in individual genes), and (b) large ancestral population sizes throughout the speciation process. The smallest population our lineage was reduced to in the last 15 million years or more, then, was the bottleneck of about 10,000 individuals after our lineage parted ways with the chimpanzee lineage.
In the next post in this series, we’ll expand our discussion to include recent objections by some Christians to these lines of evidence, objections that argue it is still credible to hold that a recent single pair are the genetic ancestors of all humans.