Evolution Basics: Genomes as Ancient Texts, Part 1
Note: This series of posts is intended as a basic introduction to the science of evolution for non-specialists. You can see the introduction to this series here. In this post we compare genomes to manuscripts, and predict genome patterns that would be expected if speciation events produced related species groups over time.
You will recall from the last few posts in this series that speciation begins when a barrier to allele flow arises that separates what was once a freely-interbreeding population into two populations. Once the two populations cannot exchange alleles (or do so only at a reduced rate), differences accumulate in the two populations that are not averaged between them. Additionally, we have seen how the founder effect can give this process a head start, since it can lead to differences right at the point of separation. Once two populations have separated and ceased to interbreed, new alleles that arise in either population will not spread to the other, which leads to differences accruing over time.
The point to emphasize for our purposes is this: when speciation starts, it starts from a common ancestral population. This means that the two populations, at the start, have (nearly) identical genomes. Indeed, the only differences at this early stage will be the few genes that have different allele frequencies due to imperfect sampling when the two populations separated. From this starting point, the two populations will begin to slowly accumulate differences – but these differences will be minuscule. The overall pattern will predominantly be one of identical sequences, with only a small number of differences.
Genomes as ancient texts: an analogy
Perhaps an analogy would be helpful here. Prior to the invention of the printing press, manuscripts in the ancient world were copied by scribes. Though a good scribe could be counted on to provide a highly accurate copy, small copying errors were inevitable. These changes, however, would not be so large as to render the copies unrecognizable – the vast majority of the text would be correct. Once a copy was made (with its small errors included), it would often serve as the starting point for further copies. If so, the errors would be copied in the process, since the next scribe would also try to copy the manuscript as faithfully as possible (though he too might introduce new errors of his own).
We can similarly think of a genome as a “text” that is passed on through copying with the possibility of copying errors. As with all analogies, however, there are some important differences. While human scribes interact with the meaning of the text they are copying, the genome “scribes” – enzymes that copy DNA sequences based on pairing monomers – do not. This means that while errors made by human scribes tend to preserve a meaning of some kind (even if it is an altered meaning), DNA replicating enzymes do not check to see if meaning (i.e. function) is preserved as they copy. (The functional check for a DNA sequence comes later, as that particular organism develops and reproduces, or fails to. In other words, natural selection is the check for “meaning” for a DNA sequence.)
To carry the analogy further, we could consider an organism’s genome to be like a book, with chapters, paragraphs and sentences. For a genome, the “chapters” might be the sequences of whole chromosomes; paragraphs would correspond to genes; and sentences the sub-components of genes. We could also consider printing runs of a book to be like a replication event. For example, consider two independent printings of the same manuscript. They would, of course, be almost identical – but suppose the two printings had specific typos that did not greatly alter the meaning of the text, and were thus missed by the editors: the 1st printing on page 14, and the 2nd printing on page 23:
Now suppose that the original manuscript is lost, and the 3rd printing of the book is typeset from a copy of the 1st printing. This new printing would have the typo on page 14, and any new typos that happened to creep in (say, one on page 8):
Imagine that over the course of this book’s history, there are five known printings arranged as follows:
Now imagine that a previously unknown, 6th printing is found. This printing has the exact same characteristic typos on pages 14 and 3 found in the 1st and 4th printings, as well as a unique typo, on page 5, not seen before in any printing.
There are, of course, several possible explanations for the provenance of the 6th printing, with one explanation being more likely than the others. In increasing order of likelihood, some options are:
1. The “6th printing” is in fact a separately authored book that is not a copy of the original manuscript of which the 5 printings are copies.
2. The 6th printing is a direct copy of the original manuscript, but the editor happened to independently make two of the exact same errors found in other printings as well as a new error on page 5.
3. The 6th printing is a direct copy of the 1st printing, but the editor happened to independently make one of the exact same errors found in another printing as well as a new error on page 5.
4. The 6th printing is a direct copy of the 4th printing, but the editor happened to make a new error on page 5 in the copying process.
It should go without saying that option #1 would not be seriously considered by literature scholars, given the nearly identical text shared between the newfound printing and the other known printings. Option #2 would require that two rare events (typos) happen independently, twice over, in two printings. As such, it is less likely than option #3, which requires only one rare event to happen twice over independently. Option #4 is of course the best option, because it does not require that any rare events happen twice in independent copies. In this scenario, the reason the 6th printing has the features it does is that it is an (imperfect) copy of the 4th printing:
In scientific terms, this option is the most likely, or most parsimonious, one: it explains the provenance of the 6th printing with the fewest low-probability events.
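For readers who like to see the reasoning spelled out, the comparison of options #2 through #4 can be sketched in a few lines of Python. This is only an illustration: the typo pages come from the text above (the 4th printing's pages 14 and 3 are inferred from the description of the 6th printing), while the names `known`, `sixth` and `repeated_events` are made up for this sketch. For each candidate source, it counts how many typos the copyist would have had to re-make independently.

```python
# Typo pages per candidate source, as described in the text. The 4th
# printing's set is an inference from the 6th printing's description.
known = {
    "original": set(),
    "1st": {14},
    "4th": {14, 3},
}
sixth = {14, 3, 5}  # typos observed in the newly found printing


def repeated_events(source, new_copy, known):
    """Count typos in new_copy that are absent from the source yet already
    exist in some known printing, i.e. rare events happening twice."""
    introduced = new_copy - known[source]
    seen_in_known = set().union(*known.values())
    return len(introduced & seen_in_known)


scores = {name: repeated_events(name, sixth, known) for name in known}
best = min(scores, key=scores.get)  # fewest repeated rare events

print(scores)
print("Most parsimonious source:", best)
```

The brand-new typo on page 5 is not counted against any candidate, since every explanation needs it exactly once. What separates the options is only how many already-known typos would have to recur by coincidence.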
Back to biology
Now consider speciation events in light of our analogy. At the point of separation, the two populations have nearly identical “books” (i.e. genomes). As lineages go their separate ways, “typos” (i.e. mutations) can occur in genes that are then passed down to the descendants of that lineage, just as we have seen for typos accumulating in copied texts:
If indeed speciation events produced Species A – D from a common ancestral population, we would expect their genomes to exhibit certain features when compared to each other. First and foremost, their overall genome sequence and structure should be highly similar to each other – they should be versions of the same book, with chapters and paragraphs of shared text in the same order. Secondly, the differences between them would be expected to fall into a pattern. Species C and D, for example, would be expected to share some features because they share a common ancestor (Species A) more recently than either shares one with Species B. In this simple diagram, for example, Species C and D would have an identical mutation in gene 1, and the most parsimonious explanation for it would be that they both inherited it from a common ancestor (Species A). This would be much more likely than both species having the same mutation occur independently at the exact same location in both genomes.
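The same bookkeeping can be applied to the species example. The short Python sketch below is a toy illustration, not a real phylogenetic method: the only detail taken from the text is that Species C and D share an identical mutation in gene 1, and every other mutation label (and the `mutations` table itself) is hypothetical. It tallies the mutations each pair of species has in common and picks out the pair that shares the most.

```python
from itertools import combinations

# Hypothetical mutation sets. Only the shared "gene1" mutation in C and D
# comes from the text; the remaining labels are invented for illustration.
mutations = {
    "B": {"gene2-typo"},
    "C": {"gene1-typo", "gene3-typo"},
    "D": {"gene1-typo", "gene4-typo"},
}

# Mutations held in common by each pair of species.
shared = {
    (a, b): mutations[a] & mutations[b]
    for a, b in combinations(sorted(mutations), 2)
}

# The pair sharing the most mutations is the best candidate for the most
# recent common ancestor, since shared inheritance needs no coincidences.
closest_pair = max(shared, key=lambda pair: len(shared[pair]))

for pair, common in shared.items():
    print(pair, sorted(common))
print("Most recently diverged pair:", closest_pair)
```

Real genome comparisons involve far more sites and proper statistical methods, but the underlying logic is the one described above: shared "typos" are most simply explained by shared ancestry.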
Put more generally, the hypothesis of common ancestry makes specific predictions about the pattern one should observe when examining genomes. In tomorrow’s post, we’ll see how well those predictions hold up when examining actual genomics data for a proposed group of related species.