In this series, we explore the genetic evidence that indicates humans became a separate species as a substantial population, rather than descending uniquely from an ancestral pair.
In the last post, we began to explore the arguments of Dr. Vern Poythress in his recent book Did Adam Exist? – specifically, his argument claiming that the true level of identity between the human and chimpanzee genome is on the order of 70%, rather than 96-99%. The first source that Poythress cites in support of his claim is a 2002 study comparing human and chimpanzee sequences:
“The 96 percent figure deals only with DNA regions for which an alignment or partially matching sequence can be found. It turns out that not all the regions of human DNA align with chimp DNA. A technical article in 2002 reported that 28 percent of the total DNA had to be excluded because of alignment problems, and that ‘for 7% of the chimpanzee sequences, no region with similarity could be detected in the human genome.’”
There are several problems here that, unfortunately, would not be apparent to Poythress’s intended audience, though they are immediately apparent to a geneticist. The first is that this paper, published in 2002, cannot be a study comparing the entire human genome with the entire chimpanzee genome. In 2002, only a preliminary draft of the human genome was available (an improved version would be released in 2004), and chimpanzee genomics was in its infancy. Consulting the paper itself reveals that the sample was a tiny fraction of the chimpanzee genome. The first line of the paper’s abstract indicates that the analysis was restricted to a small amount of DNA:
A total of 8,859 DNA sequences encompassing approximately 1.9 million base pairs of the chimpanzee genome were sequenced and compared to corresponding human DNA sequences.
Given that the human and chimpanzee genomes are each about 3 billion base pairs long, this paper describes the results of comparing approximately 0.06% of the two genomes. This 0.06% was what remained of a larger starting sample of about 3 million base pairs, as the authors describe:
Twenty-eight percent of the total amount of sequence was excluded from the analysis, since the entire sequence, or parts of it, displayed more than one match in the human genome that was not due to known families of repeated sequences. For 7% of the chimpanzee sequences, no region with similarity could be detected in the human genome.
This is the passage Poythress is discussing when he states that “28 percent of the total DNA had to be excluded because of alignment problems”. First, note that the “total DNA” here is the roughly 3 million base pairs of the sample under study, not the roughly 3 billion base pairs of the entire chimpanzee genome. Second, note that the sequence was excluded from the analysis not because it fails to match the human genome, but because it matches it in more than one location. Genomes contain a lot of repetitive DNA, and this is part of the challenge of comparing genomes.
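The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative, and treating the 28% and the 7% as fractions of the same starting total is a simplifying assumption made here to recover the "about 3 million base pairs" figure.

```python
# Figures taken from the text above; the genome size is the round number
# used in this post, not a precise assembly length.
GENOME_BP = 3_000_000_000   # approximate length of each genome
ANALYZED_BP = 1_900_000     # base pairs actually compared in the 2002 study
EXCLUDED_MULTI = 0.28       # fraction excluded for matching more than once
NO_MATCH = 0.07             # fraction with no detectable match


def percent_of_genome(bp, genome_bp=GENOME_BP):
    """Express a number of base pairs as a percentage of the genome."""
    return 100 * bp / genome_bp


# Assumption: both excluded fractions apply to the same starting total,
# so the analyzed portion is what remains after removing them.
starting_bp = ANALYZED_BP / (1 - EXCLUDED_MULTI - NO_MATCH)

print(f"Analyzed portion: {percent_of_genome(ANALYZED_BP):.2f}% of the genome")
print(f"Starting sample:  {starting_bp / 1e6:.1f} million base pairs")
print(f"Starting sample:  {percent_of_genome(starting_bp):.2f}% of the genome")
```

Under these assumptions the analyzed portion comes to about 0.06% of the genome, and the starting sample to about 2.9 million base pairs, or roughly 0.1% of the genome.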
Perhaps a short aside about how genomes are sequenced would be helpful here. An analogy I have used before is to imagine a genome as a long text. The way scientists “read” a text (genome) is by chopping it up into fragments, reading the fragments, and then reconstructing the text by finding where the fragments overlap. This process works well until one encounters repeated text. For example, consider the opening lines of A Tale of Two Cities by Charles Dickens:
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair…
As I have written before on this topic, we can reassemble a text with repetitive sequence if we use fragments that are long enough:
If we were to break multiple copies of the original paragraph into random short fragments of a few words each, we could in principle reassemble the entire piece from overlapping segments in the fragments. Where we would run into problems, however, would be with short fragments that are repeated. For example, if we had a fragment that read “it was the” we could not be sure where to place it, since it could match any one of nine locations. The only way to resolve this is to find larger fragments, such as “it was the season of” – which now matches one of only two locations. Better still would be “it was the season of Darkness” which aligns uniquely to only one location.
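The fragment-placement problem can be sketched in a few lines of Python. This is only an illustration of the analogy, not a genome assembler: the passage is the familiar opening of the novel, cut off at "despair", and the helper name `match_positions` is made up for this example. Matching is case-sensitive, so the capitalized "It was the" that opens the passage is not counted.

```python
# Opening of A Tale of Two Cities, truncated after "despair".
PASSAGE = (
    "It was the best of times, it was the worst of times, "
    "it was the age of wisdom, it was the age of foolishness, "
    "it was the epoch of belief, it was the epoch of incredulity, "
    "it was the season of Light, it was the season of Darkness, "
    "it was the spring of hope, it was the winter of despair"
)


def match_positions(fragment, text=PASSAGE):
    """Return every start index at which fragment occurs in text."""
    positions = []
    start = text.find(fragment)
    while start != -1:
        positions.append(start)
        start = text.find(fragment, start + 1)
    return positions


for fragment in (
    "it was the",
    "it was the season of",
    "it was the season of Darkness",
):
    count = len(match_positions(fragment))
    print(f"{fragment!r} matches {count} location(s)")
```

The three fragments match nine, two, and one location respectively: the longer the fragment, the fewer places it can go.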
What Poythress seems to misunderstand is that 28% of the sequence in the 2002 study was excluded because it was the genomic equivalent of an “it was the” sentence fragment. These chimpanzee sequence fragments match the human genome – and may even match it perfectly – but they match in more than one place. The 2002 study, as an early study, used very short DNA fragments for its analysis. That 28% of those fragments matched more than one location in the human genome is not at all surprising, and in no way indicates that 28% of the chimpanzee sample is completely unlike human DNA. And even if it did (which it does not), the starting sample in question is only about one thousandth of the size of the human and chimpanzee genomes – a mere 0.1% of the total genome size.
What, then, of the 7% that the authors could not match to the human genome? Here Poythress might have found an argument, except that, once again, this is not 7% of the entire chimpanzee genome, but 7% of one thousandth of the chimpanzee genome. Moreover, since this analysis was performed in 2002 – thirteen years ago – it used a draft of the human genome that is far inferior to what we have today. It is therefore highly likely that many of these unmatched sequences do in fact have a match in the human genome, but that no match could be found in the 2002 draft.
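A toy example may help show how a fragment can land in each of the three categories, and how an incomplete draft can produce a "no match" that a finished genome would resolve. Everything here is hypothetical: the sequences are invented, the function name `classify` is made up, and exact string matching stands in for real alignment software, which tolerates mismatches and is not the method used in the 2002 study.

```python
def classify(fragment, reference):
    """Label a fragment by how many times it occurs in the reference."""
    hits = 0
    start = reference.find(fragment)
    while start != -1:
        hits += 1
        start = reference.find(fragment, start + 1)
    if hits == 0:
        return "no match"
    if hits == 1:
        return "unique"
    return "multiple"


# Invented building blocks for a miniature "genome".
UNIQUE_A = "TTGACCGTA"
REPEAT = "GGATCC"        # appears twice, like repetitive DNA
UNIQUE_B = "AAGCTTAC"
TAIL = "TTCAGT"

# The finished reference contains every block.
FINISHED = UNIQUE_A + REPEAT + UNIQUE_B + REPEAT + TAIL
# The draft has a gap (runs of N) where UNIQUE_B belongs.
DRAFT = UNIQUE_A + REPEAT + "N" * len(UNIQUE_B) + REPEAT + TAIL

for fragment in ("TTGACCGT", "GGATCC", "AAGCTTAC"):
    print(f"{fragment}: draft={classify(fragment, DRAFT)!r}, "
          f"finished={classify(fragment, FINISHED)!r}")
```

In this sketch the repeated block is flagged as "multiple" in both references, while the fragment that falls in the draft's gap has no match in the draft but a unique match in the finished sequence.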
Of course, the way to approach this question is to look at larger data sets, and use the most recent data available. When one does so, one finds that the human and chimpanzee genomes are indeed about 95% identical, genome wide – data that Poythress does not discuss, or even mention.
(As an aside, attempts to minimize the identity between the human and chimpanzee genomes are common among Christians who deny evolution. I have written extensively on this topic in the past with respect to the Discovery Institute and Reasons to Believe, for example – and interested readers will find a much more thorough discussion in those sources. Interestingly, my friend and colleague Todd Wood – a young-earth creationist (YEC) – has also expended significant effort to combat these misunderstandings among those holding anti-evolutionary views. He wrote a seminal paper in the creationist literature on the topic in 2006, and has also strongly critiqued Reasons to Believe on these issues. I have found Todd’s scholarship entirely trustworthy and a fascinating read, given his YEC views.)
Thus, Poythress’s argument that human and chimpanzee DNA is only about 70% identical has not yet found scientific support. In the next post in this series, we’ll examine his second line of argument – that a large percentage of human DNA is a better match to other great apes than to chimpanzees. Here too we will see that Poythress fails to understand the relevant science, and that the evidence does not support his conclusions.