For the last year or so, I’ve been working systematically through the science of evolution in my ongoing Evolution Basics series – without comment on how antievolutionary groups attempt to cast doubt on it. While I’ve certainly addressed antievolutionary arguments directly in the past, this last year has been a welcome break from that pattern.
Recently, however, a few friends have pointed out a piece from the Discovery Institute taking direct aim at one of my papers – and attempting to call human common ancestry into question. Would I be responding, they wondered?
To rebut, or not to rebut? That is the question.
Of course, I don’t respond to every objection to my work out there on the internet – doing so would prevent me from focusing on what I want to write (such as that Basics series) and cause me to get even less sleep than I currently do.
Sometimes, however, I do feel the need to respond. I began to weigh the pros and cons by asking myself some questions:
Are the arguments ones that turn on common misconceptions that would be worth explaining for the benefit of a broader audience?
Claiming that humans and chimps are less than 95% identical at the genome level is a frequent (and rather curious) argument from those who reject human common descent. Additionally, the arguments around pseudogenes showed the usual features. Would those genuinely interested in learning about evolution benefit from a careful explanation of why these common objections don’t hold water? Here the answer seemed to me to be “yes.”
Is this an opportunity to discuss some interesting science that I otherwise might not delve into in the near future?
Other aspects of Luskin’s piece deal with subjects that are seldom discussed in the anti-common descent ID literature: specifically, (a) the observation that humans and chimpanzees not only have similar genes, but have them in essentially the same order along their chromosomes (i.e. they have conserved synteny), and (b) the observation that human and chimpanzee genes use the same DNA code where multiple options exist, meaning that they are much more identical than they need to be for functional equivalence (the argument from redundancy). These two lines of evidence for common ancestry (synteny and redundancy) are ones that I discussed at length in my 2010 paper, and to date Luskin’s reply is the only significant attempt (such as it is) to rebut them that I have seen. In order to understand why Luskin’s objections are unfounded, we’ll have to get into some fairly detailed cell biology that’s quite interesting in its own right – and thus another reason to take the time to work it through.
So, all that to say that Basics will go on a brief hiatus, we’ll spend some time working through Luskin’s arguments, and hopefully learn some interesting science along the way. First, we’ll tackle Luskin’s objection to the fact that humans and chimpanzees have nearly identical genomes by discussing exactly how whole-genome comparisons are done.
Will the real human – chimp genome identity value please stand up?
Luskin’s first point is to claim that I overstate the level of identity between the human and chimpanzee genomes. Typically this value is given as either 95% identical or 98.6% identical, depending on how the calculations are performed. Luskin claims that such estimates are overestimates, and cites a few ID articles that attempt to lower the value to under 95% – some significantly so, even to the 70% range – by claiming that large portions of the human and chimpanzee genomes “do not align with one another”.
(As an aside, the precise value is actually of no real importance. Chimpanzees happen to be our closest living relatives, but had other hominin lineages persisted to the present day we would be discussing them instead, with genomes even more identical to ours than the chimpanzee genome is. Conversely, if the lineage leading to chimpanzees and bonobos had gone extinct, we would be discussing gorillas as our closest relatives, and so on. In other words, the precise value is not really the issue – but perhaps a value of “95% identical” – or worse, “99% identical” – is uncomfortable, and perceived as altogether too suggestive of common ancestry. After all, why would “separate designs” – i.e. “independently created organisms” – need to have such similar genomes?)
So, just how identical is our genome to the chimpanzee genome? It seems like an easy question, but in reality the answer is complicated. Understanding the details requires us to discuss some important features of genomes, and evaluate how the differences are calculated.
When comparing genomes, the most straightforward comparisons are between regions with only small changes. Since the principles at play here are exactly the same when comparing texts, we can illustrate them using the opening sentences of Dickens’ A Tale of Two Cities, which will make for more interesting reading than strings of A’s, C’s, G’s and T’s. In this first example, we have Dickens’ original 227 characters on the left, and a single letter “mutation” on the right:
The two sequences have 226 out of 227 characters in common, and are thus 99.56% identical to each other. For the human and chimpanzee genomes, 2.4 billion of our approximately 3 billion DNA “letters” line up in this way, with only a 1.23% difference between them, made up of these “single letter” differences. This measure is what leads to the commonly-cited 98.6% identity value for the human and chimpanzee genomes.
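The arithmetic behind that kind of percentage can be sketched in a few lines of Python. This is only an illustration, assuming the two texts are the same length and already lined up position by position; the function name percent_identity and the short sample clause are my own stand-ins, not part of any genome-comparison tool.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length strings match."""
    if len(a) != len(b) or not a:
        raise ValueError("this simple measure assumes non-empty, equal-length, pre-aligned strings")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical stand-in for the full paragraph: one clause with a one-letter "mutation".
original = "it was the best of times"
mutated = "it was the best of timez"
print(round(percent_identity(original, mutated), 2))  # 95.83 (23 of 24 characters match)

# The figure quoted in the text: 226 matching characters out of 227.
print(round(100.0 * 226 / 227, 2))  # 99.56
```

Note that a single change looks far larger in a 24-character clause than in a 227-character paragraph, which is simply a reminder that the percentage depends on how much text is being compared.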
But, you say, what about the other 0.6 – 0.8 billion DNA base pairs? That’s about 20% of our genome or more! If it doesn’t line up with the chimpanzee genome, doesn’t that indicate that Luskin is correct in suggesting that our genomes cannot be more than 80% identical?
Well, no. The assumption that it can’t align does not hold up to scrutiny. Part of the answer lies in understanding how mutations other than single-letter substitutions can change genomes. For example, duplications and deletions of many letters at once can occur, leading to this sort of situation (again, illustrated with text):
Now we have an entire phrase duplicated. The original text has 227 characters, as before, but now the “mutated” text has 297 characters – meaning the two versions are now only 76.4% identical. Note how this measurement “counts” every character in the duplicated section as an independent mutation, even though they are all the result of a single mutation event (the duplication). Put another way, both examples shown here have the same number of mutation events – a single letter change, or a single duplication event – but the results alter different amounts of text, leading to differences in raw identity values.
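For readers who want to check the numbers, here is a rough Python sketch. It assumes, as the 76.4% figure does, that raw identity is the number of shared characters divided by the length of the longer version; the helper name raw_identity_after_insertion and the toy sentence are my own inventions for illustration.

```python
def raw_identity_after_insertion(original_len: int, inserted_len: int) -> float:
    """Percent identity when every inserted character is counted as a difference."""
    return 100.0 * original_len / (original_len + inserted_len)


# Figures from the text: 227 original characters, 297 after the duplication.
print(round(raw_identity_after_insertion(227, 297 - 227), 1))  # 76.4

# Toy example: a single duplication event copies one phrase in a short sentence.
sentence = "it was the age of wisdom, it was the age of foolishness"
phrase = "it was the age of wisdom, "
duplicated = phrase + sentence  # one event, many characters added
print(len(sentence), len(duplicated))
print(round(raw_identity_after_insertion(len(sentence), len(phrase)), 1))
```

One event changed 70 characters in the first case, so the drop in raw identity reflects the size of the duplicated block rather than the number of mutations.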
Since these sorts of differences arise from either insertions of duplicated material, or a deletion from one genome but not the other, they are collectively called “indel” mutations, or simply “indels.” When indels are included in the comparison of the human and chimpanzee genomes, the identity value drops to about 95%, leading to the second value commonly given for the level of identity observed between the two genomes. It’s important to note that including indels disproportionately exaggerates the importance of differences caused by mutation events that remove or duplicate large amounts of DNA at once. Biologically speaking, it’s better to measure differences between genomes based on the number of independent mutation events – i.e. to treat both single-letter changes and indels as single mutation events. In practice, biologists prefer to use just the single-letter changes, since this percent difference is proportional to the number of mutation events.
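The gap between counting characters and counting events can be shown with a small tally. This is a minimal sketch under my own assumptions: the Event record and the summarize helper are hypothetical, and the two events are just the text examples from above (a one-letter change and a 70-character duplication), not real genomic data.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str    # "substitution" or "indel"
    length: int  # number of characters the event alters


def summarize(events):
    """Return (number of mutation events, number of characters altered)."""
    return len(events), sum(e.length for e in events)


# One single-letter change and one 70-character duplication.
events = [Event("substitution", 1), Event("indel", 70)]
n_events, n_chars = summarize(events)
print(n_events, n_chars)  # 2 71
```

Measured by events, the substitution and the duplication carry equal weight; measured by characters, the duplication accounts for 70 of the 71 differences.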
Indel mutations also produce another headache when comparing genomes, and this has to do with how genomes are sequenced and assembled. The way we sequence genomes is by cutting multiple copies of them up into random short fragments, sequencing them all, and then using a computer to find areas of overlap between the fragments to “build” a reconstituted genome. If you have duplicated sequences in the mix, it can be challenging to find where in the genome they fit in. Let’s illustrate this with the same text as before, since even the original paragraph has some repetitive “sequence”:
If we were to break multiple copies of the original paragraph into random short fragments of a few words each, we could in principle reassemble the entire piece from overlapping segments in the fragments. Where we would run into problems, however, would be with short fragments that are repeated. For example, if we had a fragment that read “it was the” we could not be sure where to place it, since it could match any one of nine locations. The only way to resolve this is to find larger fragments, such as “it was the season of” – which now matches one of only two locations. Better still would be “it was the season of Darkness” which aligns uniquely to only one location.
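The placement problem is easy to try out in Python. The sketch below is an illustration only: find_placements is a hypothetical helper, matching is case-insensitive, and the excerpt is my own shortened stand-in for the paragraph, so the count for the shortest fragment will not necessarily match the number quoted above.

```python
def find_placements(text: str, fragment: str) -> list[int]:
    """Return every start position where fragment occurs in text (case-insensitive)."""
    text_l, frag_l = text.lower(), fragment.lower()
    positions = []
    start = text_l.find(frag_l)
    while start != -1:
        positions.append(start)
        start = text_l.find(frag_l, start + 1)
    return positions


# Shortened excerpt used as the "genome" (an assumption, not the full paragraph).
text = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness, "
        "it was the epoch of belief, it was the epoch of incredulity, "
        "it was the season of Light, it was the season of Darkness")

for fragment in ("it was the", "it was the season of", "it was the season of Darkness"):
    hits = find_placements(text, fragment)
    status = "unique" if len(hits) == 1 else "ambiguous"
    print(f"{fragment!r}: {len(hits)} placement(s), {status}")
```

The longer the fragment, the fewer places it can go, which is the same reason longer sequencing reads help place repetitive DNA.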
So, let’s return to those 0.6 – 0.8 billion DNA base pairs that “don’t align” between the human and chimpanzee genomes. In fact, they are mostly sequences that we cannot align uniquely to a single location in either genome because they are repetitive sequences. As my young-earth creationist colleague Todd Wood puts it, “the "unaligned" chimp DNA is not too different; it's too similar.” In that link, Todd also takes the time to actually investigate the chimpanzee DNA we have yet to align uniquely to the human genome. First off, there are now only about 0.16 billion DNA base pairs in this category – because we’ve done more sequencing since the original 2005 study, and that extra sequencing has allowed us to locate where a lot of that “unaligned” DNA actually aligns. Moreover, within that 0.16 billion, Todd shows that most of it certainly aligns with high identity (over 97%), but we haven’t yet figured out where it uniquely aligns – it’s repetitive DNA that aligns with at least two locations, and we haven’t yet sequenced a fragment long enough to place it unambiguously. And as Todd points out, if we cannot align sequences uniquely, we don’t “count” them in the overall alignment, even though the sequences are highly identical to each other. Todd estimates the actual amount of chimpanzee sequence that genuinely has no counterpart in human sequences, and finds it to be less than 1% of the length of our genome. While I didn’t have time to repeat Todd’s entire analysis, I did repeat a portion of it and found my results to be consistent with his. This is something that I’d encourage Luskin (or anyone else who is interested) to do, since all of the relevant data is freely available online.
With this understanding of how genomes are sequenced and compared, we can now see how Luskin’s attempts to minimize the identity between the human and chimpanzee genomes are not realistic. They hinge on the inappropriate assumption that the sequence we cannot uniquely align is completely different – when in fact the great majority of it is highly identical, but repetitive, sequence.
In the next segment, we’ll tackle some of Luskin’s remaining arguments against human common ancestry.