Throughout this article, we have discussed how evolution is a population-level phenomenon, where average characteristics shift incrementally over time. As we have seen, new DNA variants that arise from mutation events are the physical basis for this process. These mutations occur in one individual and then may spread to become more common in a population – perhaps even completely replacing all other variants of that sequence within a population. Just as a language may slowly change to a new spelling of a word, so too a population may shift from one DNA variant to another over time.
In this process, however, populations may maintain two (or more) variants for a long period of time. Again, language change over time is a useful analogy here. If we consider modern English as an example, there exist some variant spellings that are currently seen as “correct” within their local setting. Where an American would use forms such as “harbor” or “neighbor”, speakers from the United Kingdom, Canada, and elsewhere would use slightly different spellings: “harbour” and “neighbour” (and even as I write this, my American-made word processor balks at those versions). For modern English as a whole, then, we maintain some acceptable variation in spelling for certain words – we have not yet universally “decided” on one correct spelling for these words, and perhaps we never will.
If we were to compare British English, American English, and Canadian English as a whole, Canadian and American English are more similar to each other, because they (in general) share a more recent common population of speakers than either does with British English (due mostly to more extensive mixing between Americans and Canadians since the Americas were colonized). This can easily be demonstrated: Canadians and Americans consistently use words and phrases that are more alike than either is to British English. As any American or Canadian home mechanic will tell you, reading a British automotive repair manual can be an exercise in confusion. Whereas Canadians and Americans use the same terminology, our cousins across the Atlantic use different terms for many car parts:
Other examples are easily found: no one in Canada says “aluminium” as British speakers do, opting rather for the American form “aluminum”. One of my favorite (favourite?) examples comes from N.T. Wright: if you hear someone say “I’m mad about my flat” it makes all the difference in the world if they are North American or British. For those of us in North America, it means we’re upset about a tire puncture – for Brits, it means one is enthusiastic about one’s living arrangements. American and Canadian English consistently group together, since they are closer relatives to each other than either is to British English. In biological terms, Canadians and Americans share a more recent common ancestral population.
Yet, as any Canadian knows, despite our near-complete affinity with American English, there are a few spellings that buck the overall trend. The –our vs. –or forms are a good example:
In this case, we see closer affinity between Canadian and British English – even though, on the whole, Canadian and American English are closer relatives. How is this possible? The answer is straightforward – although most of Canadian and American English shares a more recent common source than either does with British English, some Canadian words share a more recent common source with British words.
In the case of the “-or” vs. “-our” spellings, these were acceptably variable in the United Kingdom prior to the colonization of North America: while both forms were used, it seems that the “-or” versions were less common. When English speakers colonized North America, they brought both forms along for the ride. Examples of “-our” spellings can be found in early American texts. For example, honour makes an appearance in an early draft of the Declaration of Independence, though the final form has honor. In general, in the United States, the “-our” forms were lost, and in Canada, the “-or” forms were lost. We can represent these events on a phylogeny leading to the present-day forms, as if these languages were species, and the words within them were DNA variation:
As we can see, this process resulted in Canadian and British English matching more closely to the exclusion of American English for these specific word spellings, even though, as a whole, American and Canadian English are closer relatives. The reason for the pattern is simple – not every variant spelling from the original ancestral population (pre-colonization Britain) sorted down to every descendant population.
This effect is called incomplete lineage sorting, and it works in exactly the same way in biological populations as it does in linguistic ones. Instead of variant spellings, DNA variants present within an ancestral population may not sort completely down to every descendant species due to losses along the way:
Here, as before, we see that the variants present in Species 1 and 3 match more closely than either does with Species 2, even though Species 1 and 2 share a more recent common ancestral population. In order to find such DNA variants that buck the overall relatedness trend, we need to be confident that we have determined the true pattern of relatedness for the species we are examining. In practice, this is not difficult – we simply look at the overall pattern of relatedness between species at the DNA level. Species 1 and 2 in the diagram above will share DNA similarities to the exclusion of Species 3 most of the time in a nested hierarchy (as we saw in a previous post). The overall pattern will be consistent, but a few variants will produce a pattern that does not match the species pattern – a pattern said to be discordant with the pattern of overall species relatedness.
As we have seen with languages, these discordant variants are useful for investigating the features of the populations that gave rise to them. Look again at the species diagram above. The fact that we see incomplete lineage sorting for DNA variants “A” and “a” tells us (1) that both variants were present in the common ancestral population of all three species, and (2) that both variants were present in the common ancestral population of Species 1 and 2. Knowing that these populations had these variants present within them allows us to estimate their level of DNA variation generally, which in turn allows us to estimate their population size.
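For readers who like to see the logic run, here is a minimal sketch of how variants “A” and “a” might sort down the tree of Species 1, 2, and 3. It is a toy model under stated assumptions, not the method used in real genome comparisons. It assumes simple random resampling of gene copies each generation (a Wright-Fisher model with no selection), and every function and parameter name here (`drift`, `sort_once`, `copies`, `internal_gens`) is made up for illustration. A locus counts as discordant when Species 1 and 2 end up with different variants, which can only happen if both variants survived in their shared ancestral population.

```python
import random

def drift(freq, copies, generations, rng):
    """Random drift in the frequency of variant 'A'.

    Each generation, `copies` gene copies are redrawn at random from the
    previous generation. generations=None means "run until one variant is lost".
    """
    gen = 0
    while 0.0 < freq < 1.0 and (generations is None or gen < generations):
        freq = sum(rng.random() < freq for _ in range(copies)) / copies
        gen += 1
    return freq

def sort_once(p0, copies, internal_gens, rng):
    """Follow one DNA site down the tree ((Species 1, Species 2), Species 3).

    p0 is the starting frequency of 'A' in the ancestor of all three species.
    Returns the variant each species ends up with, in the order (1, 2, 3).
    """
    sp3 = drift(p0, copies, None, rng)
    shared = drift(p0, copies, internal_gens, rng)  # ancestor of Species 1 and 2
    sp1 = drift(shared, copies, None, rng)
    sp2 = drift(shared, copies, None, rng)
    return tuple("A" if f == 1.0 else "a" for f in (sp1, sp2, sp3))

def discordant_fraction(p0, copies, internal_gens, trials, seed=0):
    """Fraction of simulated sites where Species 1 and 2 hold different variants."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sp1, sp2, _sp3 = sort_once(p0, copies, internal_gens, rng)
        hits += sp1 != sp2
    return hits / trials

def expected_fraction(p0, copies, internal_gens):
    """Expected value under this toy model: the chance of drawing two different
    variants shrinks by a factor (1 - 1/copies) every generation."""
    return 2 * p0 * (1 - p0) * (1 - 1 / copies) ** internal_gens

if __name__ == "__main__":
    # Illustrative settings only: small populations so the loop runs quickly.
    for copies in (20, 40):
        for internal_gens in (0, 20, 80):
            sim = discordant_fraction(0.5, copies, internal_gens, trials=500)
            exp = expected_fraction(0.5, copies, internal_gens)
            print(copies, internal_gens, round(sim, 3), round(exp, 3))
```

In this sketch, a larger shared ancestral population (more `copies`) or a shorter time between the two splits (fewer `internal_gens`) leaves more discordant sites. That is the intuition behind reading population size from discordance: if we have an independent estimate of the time between splits, the proportion of discordant variants constrains how large the ancestral population must have been.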
Next, we will apply this technique to the genomes of humans, chimpanzees, and gorillas – and discuss how it sheds light on the population size of our lineage as we separated from other great apes on the long road to becoming human.