Previously, we drew an extended analogy between how species form and how languages change over time. Through this analogy we came to see that a species, like a group of people speaking a language, is a continuous population whose characteristics shift incrementally over time. “Species” is thus a label of convenience for a population undergoing continuous, incremental change, just as a “language” is not static but in constant, if gradual, flux. With this understanding in mind, we are ready to begin examining our genome. The question, then, is not “when did our species begin?”, since that is like asking when “English” began. What genetics can address, however, is how many individuals were in our ancestral population as it separated from other species and became a distinct lineage. Several genetic approaches allow for such estimates, and we will examine a few of them in this series. Importantly, all of these methods return very similar estimates: our ancestral population has not dropped below about 10,000 individuals over the last several million years. Since the earliest-known anatomically modern humans appear in the fossil record around 200,000 years ago, this minimum population size spans the period during which we biologically “became human” as a population.
Genetic variation and population size
Each of the population-size methods we will examine bases its calculations on the amount of genetic variation in present-day human populations. For any given section of DNA in our genome, any one person can have at most two versions of it: one received from their mother, and the other from their father. A large population, however, can have many more versions than just two. The amount of genetic variation in a population, then, is connected to the number of individuals in that population. At its most basic, every estimate of ancestral human population size uses present-day genetic variation to estimate how many ancestors are needed to transmit the observed level of variation to the present day. A large amount of present-day genetic variation requires more ancestors than a small amount does.
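A minimal Python sketch of the counting logic above: because each person carries at most two versions of a given segment, k distinct versions require at least ⌈k/2⌉ people alive at the same time. The sample strings and the function name `min_individuals` are hypothetical, invented for illustration. This is only a single-generation floor; the population-size methods in this series also have to model how variation is passed down over many generations.

```python
import math

def min_individuals(distinct_versions: int) -> int:
    """Hypothetical helper: the fewest people who could carry this many
    distinct versions of one DNA segment, given at most two copies each
    (one maternal, one paternal)."""
    if distinct_versions < 0:
        raise ValueError("distinct_versions must be non-negative")
    return math.ceil(distinct_versions / 2)

# Hypothetical sample of one segment, one entry per chromosome copy.
sample = ["AGT", "AGT", "TGT", "ACT", "AGC", "AGT"]
k = len(set(sample))  # number of distinct versions present
print(f"{k} distinct versions need at least {min_individuals(k)} people")
```

Here four distinct versions could, at minimum, be carried by two people; an odd count rounds up because a person cannot carry half a version.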
The advent of genome sequencing, as you might expect, has shed a great deal of light on how much genetic variation is present in modern human populations. One significant source of human genetic variation comes in the form of what are known as single nucleotide polymorphisms, or “SNPs” (pronounced “snips”). “Polymorphism” simply means “having many forms”. SNPs are single DNA letters that are variable among humans, and we have around 300,000 common SNPs in our genome of 3 billion DNA letters. In other words, our genomes are identical to one another at the vast majority of positions, but a small number of DNA letter positions on our chromosomes are variable. Consider a short section of DNA sequence for six different individuals, with three variable positions:
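To make this concrete, here is a minimal Python sketch using six hypothetical sequences. They are invented for illustration, not real human data, and chosen only to match the details discussed in this section: persons 5 and 6 are identical, and person 4 differs from them only at SNP 1, with an “a” where they have a “t”. The sketch finds the variable positions (0-based, so SNP 1 is position 2) and reads off each person's SNP combination.

```python
from collections import Counter

# Hypothetical aligned sequences, one per person (not real data).
sequences = {
    "person 1": "accttggcataga",
    "person 2": "accttgacataga",
    "person 3": "actttggcatcga",
    "person 4": "acattggcattga",
    "person 5": "actttggcattga",
    "person 6": "actttggcattga",
}

def variable_positions(seqs):
    """Return 0-based positions where the sequences do not all share a letter.
    Assumes the sequences are aligned and equal in length (zip stops at the shortest)."""
    columns = zip(*seqs)  # walk the alignment one column at a time
    return [i for i, column in enumerate(columns) if len(set(column)) > 1]

def snp_combinations(seqs, positions):
    """Read off each sequence's letters at the given SNP positions."""
    return ["".join(seq[i] for i in positions) for seq in seqs]

snps = variable_positions(sequences.values())
combos = snp_combinations(sequences.values(), snps)
counts = Counter(combos)  # how many people share each combination

print("SNP positions:", snps)
for person, combo in zip(sequences, combos):
    print(person, combo)
print("distinct combinations:", len(counts))
```

With this invented data, three positions vary and five distinct combinations appear, with persons 5 and 6 sharing one.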
For any one SNP position, there is a maximum of four possible versions (since there are four DNA letters). Once we consider several SNPs linked together on the same chromosome, however, the number of possible combinations becomes very large. For just the three SNPs shown above, there are 64 possible combinations (4 × 4 × 4, or 4³). Twenty SNPs would have 4²⁰ possible combinations, about 1.1 trillion, far more than the number of people on the planet. For the six individuals above, we can see that five different combinations are present. The most likely explanation for these five variants is that they were inherited from five different ancestors, and that persons 5 and 6 inherited their identical combination from the same ancestor. There are other, less likely possibilities, however: some of the combinations might result from new mutations, or from mixing and matching between the different SNPs. For example, person 4 differs from persons 5 and 6 by only one letter: person 4 has an “a” at SNP 1 where persons 5 and 6 have a “t”. One possibility we need to account for is that person 4 is descended from the same ancestor as persons 5 and 6, but that a new mutation from t → a occurred at the SNP 1 position. Another possibility is that recombination, through a process called “crossing over”, placed an “a” into this position in person 4.
So, when using SNP variation to count the number of likely ancestors, we need to factor in mutation and recombination rates, both of which we can measure directly in humans. In practice, mutation has only a small effect on SNP-based estimates of ancestral population size, since the human mutation rate is very, very low. The rate has been measured directly by sequencing the entire genomes of parents and their offspring: on average there are only about 100 to 150 new mutations each time our genome of three billion DNA letters is copied and passed on to the next generation.
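The arithmetic in this paragraph can be checked with a short Python sketch. It assumes, as the text does, that any of the four letters could occupy each SNP position, and it uses a rough world population of 8 billion purely for comparison (an approximate figure, not from the original). It also converts the quoted 100 to 150 new mutations per three-billion-letter genome into a per-letter rate.

```python
# Assumption (as in the text): any of the four DNA letters could occupy each
# SNP position, so n linked SNPs allow 4**n possible combinations.
def possible_combinations(n_snps: int) -> int:
    return 4 ** n_snps

# Rough world population, used only for comparison (approximate figure).
WORLD_POPULATION = 8_000_000_000

for n in (3, 20):
    combos = possible_combinations(n)
    print(f"{n} SNPs: {combos:,} combinations; "
          f"more than world population: {combos > WORLD_POPULATION}")

# Per-letter mutation rate implied by the text's figures: about 100 to 150
# new mutations in a genome of roughly 3 billion letters, per generation.
GENOME_LETTERS = 3_000_000_000
for new_mutations in (100, 150):
    rate = new_mutations / GENOME_LETTERS
    print(f"{new_mutations} new mutations -> {rate:.1e} per letter "
          f"(about 1 in {round(1 / rate):,})")
```

Three SNPs give 64 combinations and twenty give about 1.1 trillion, while the mutation figures work out to roughly one new mutation per 20 to 30 million letters per generation, which is why new mutations rarely disturb a count of SNP combinations.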
The effects of recombination can also be minimized by choosing SNPs that are closely linked on the same chromosome. Such SNPs recombine only rarely, since there is so little space between them for crossing over to occur. So while scientists do factor in mutation and recombination rates, neither is a major issue for SNP-based methods.
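Why close linkage helps can be sketched numerically in Python. This sketch rests on an assumption not stated in the text: a commonly quoted human genome-wide average of roughly 1 centimorgan per million DNA letters, meaning about a 1-in-100-million chance per letter of separation per generation that a crossover falls between two points. Real rates vary considerably along each chromosome. The 1,000-generation span and the helper names are hypothetical, chosen only for illustration.

```python
# ASSUMPTION: roughly 1 centimorgan per million base pairs (a commonly quoted
# human average), i.e. ~1e-8 chance per base pair per generation that a
# crossover falls between two points. Real rates vary along the chromosome.
CROSSOVER_PROB_PER_BP = 1e-8
GENERATIONS = 1_000  # hypothetical span, for illustration only

def crossover_prob(distance_bp: int) -> float:
    """Approximate chance of a crossover between two SNPs in one generation.
    Linear approximation, valid for closely linked SNPs; capped at 0.5."""
    return min(distance_bp * CROSSOVER_PROB_PER_BP, 0.5)

def unbroken_prob(distance_bp: int, generations: int) -> float:
    """Chance that no crossover separates the two SNPs along a single line
    of descent over the given number of generations."""
    return (1 - crossover_prob(distance_bp)) ** generations

for distance in (1_000, 10_000, 1_000_000):
    print(f"{distance:>9,} bp apart: per-generation crossover chance "
          f"{crossover_prob(distance):.0e}; still linked after "
          f"{GENERATIONS:,} generations: {unbroken_prob(distance, GENERATIONS):.3f}")
```

Under these assumptions, SNPs 1,000 letters apart stay together along a line of descent about 99% of the time over 1,000 generations, and those 10,000 apart about 90% of the time, whereas SNPs a million letters apart almost never do.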
In practice, estimating population size from SNP variation is simply a matter of sequencing a large number of people from around the globe, cataloging which SNP variants each person carries, and estimating how many ancestors would be needed to produce the SNP variation we see in the present day. As you might expect, different people groups have characteristic sets of SNP variants. This makes sense, since members of a given group are more closely related to one another than to members of other groups. Tallying up the number of ancestors using this method consistently returns a total minimum population size of about 10,000 individuals: approximately 8,000 ancestors are needed to explain SNP diversity in sub-Saharan Africa, and about 2,000 for everyone else. SNP diversity in humans is far too great to have come from a single ancestral couple at any time in the last 200,000 years: we descend from a population. These values also agree well with older, cruder methods of estimating population size from other types of genetic variation, which gives us increased confidence that they are reasonable.
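The tally in this paragraph, and its contrast with a single couple, can be laid out in a few lines of Python. The regional figures are the ones quoted above; the comparison simply uses the fact that each person carries two copies of any segment, so a couple carries at most four. This is a sketch of the bookkeeping only, not of how the underlying estimates were computed.

```python
# Minimum ancestral population sizes quoted in the text, by region.
regional_minimums = {
    "sub-Saharan Africa": 8_000,
    "everyone else": 2_000,
}

total = sum(regional_minimums.values())
# Each person carries two copies of every DNA segment (maternal and paternal).
chromosome_copies = 2 * total
COUPLE_COPIES = 2 * 2  # a single couple carries at most four copies of any segment

print(f"total minimum population: {total:,}")
print(f"copies of each segment carried: {chromosome_copies:,} "
      f"vs at most {COUPLE_COPIES} for one couple")
```

The gap between 20,000 and 4 copies illustrates why the observed diversity points to a population rather than a single pair.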