Evolution Basics: Species Trees, Gene Trees and Incomplete Lineage Sorting


July 19, 2013 Tags: Genetics

Today's entry was written by Dennis Venema. You can read more about what we believe here.

Note: This series of posts is intended as a basic introduction to the science of evolution for non-specialists. You can see the introduction to this series here. In this post we discuss how the distribution of some alleles in related species is expected not to match the overall species family tree.

In the last few posts in this series, we’ve examined the overall pattern we see when comparing related genomes to one another, and how multiple data sets neatly fit into the same family tree, or phylogeny. In this post, we’ll move on to a deeper understanding of phylogenies, and how it is actually expected that some features of genomes will be at odds with their family trees.

But first, a brief aside: this is a challenging topic, and one that might be confusing at first. Still, if you’ve come this far in this series, you already have the tools you need to understand what’s going on here, and with a little additional effort, you’ll have an even deeper understanding of related genomes than you did before. If, on the other hand, this particular topic remains a bit of a muddle, don’t worry – the rest of the series will not depend on understanding this finer point. Also, be sure to ask questions in the comments if things are unclear.

Species trees

Let’s return to what is by now a familiar example of a phylogeny: that of humans, chimpanzees, and gorillas:

Phylogenies of species are also known as “species trees” (“tree” being another name for a phylogeny). A species tree shows us the overall pattern – which species share a common ancestral population more recently, and which share a common ancestral population more distantly in the past. In other words, as we noted in the last post in this series, a phylogeny is a measure of shared history and separate history for any two species. The longer two species have a common history, the more similar they are expected to be, on average. Humans and chimps, for example, continue to share a common history for several million years after the lineage leading to gorillas separates from the (human / chimpanzee) common ancestral population. This shared history is what, on average, makes the chimpanzee and human genomes more similar to each other than either is to the gorilla genome. Individual genes (and their alleles), however, may have a history that differs from that of the species as the species separate from one another. To examine this, we need phylogenies for individual genes – so-called “gene trees.”

Gene trees

If you think back to previous posts (here and here) on how variation (alleles) arises through mutation, it should be fairly intuitive that the same principles that can be used to group species into a phylogeny can also be used to group alleles of a single gene into a phylogeny. For example, consider the DNA sequence of three alleles of the same gene, which we can represent as the “yellow”, “red” and “blue” alleles (the colored boxes). Sequence differences that make these alleles distinct are highlighted in red text:

Using the same principles that we used for species as a whole, we can explain the origin of these three alleles by two mutation events (starting with a given that the yellow allele is the ancestral state):

So, within a population, we can reconstruct the allele history of an individual gene using the same methods we have previously applied to species as a whole.
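
As a rough illustration of this reasoning, the Python sketch below groups three alleles by the derived mutations they share. The sequences are invented for illustration and are not the ones from the original figure; the only thing taken from the text is the assumption that the yellow allele is the ancestral state.

```python
# Hypothetical 8-base sequences; these are illustrative assumptions only.
alleles = {
    "yellow": "ACGTACGT",  # assumed ancestral state
    "blue":   "ACGAACGT",  # one change relative to yellow
    "red":    "ACGAACCT",  # the blue change plus one more
}

def derived_sites(seq, ancestral_seq):
    """Return the positions at which seq differs from the ancestral sequence."""
    return {i for i, (a, b) in enumerate(zip(seq, ancestral_seq)) if a != b}

derived = {name: derived_sites(seq, alleles["yellow"]) for name, seq in alleles.items()}

# Alleles that share a derived mutation are grouped together on the gene tree.
shared_red_blue = derived["red"] & derived["blue"]
# Each distinct derived site corresponds to one mutation event.
mutation_events = len(derived["red"] | derived["blue"])

print(derived)
print("shared by red and blue:", shared_red_blue)
print("mutation events needed:", mutation_events)
```

With these made-up sequences, red and blue share one derived change that yellow lacks, so they group together, and two mutation events account for all three alleles.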

Speciation with genetic variation along for the ride (or not)

So, mutation is constantly producing new alleles (variation) within populations, and processes such as natural selection and genetic drift work to either increase or decrease the frequency of alleles in populations over time. Also, we have spent considerable time discussing (here, here, here and here) how speciation events occur, starting with populations that separate from one another, and accrue differences over time that may lead to the formation of distinct species. All that remains is to bring these ideas together: to consider what might happen to variation (alleles) within a population as it goes through a speciation event. To do that, let’s track our hypothetical alleles through the speciation events that led to humans, chimpanzees and gorillas.

This species tree has the following populations: the population ancestral to all three species, designated “(H,G,C)” for “(Human, Gorilla, Chimpanzee)”; the population ancestral to both humans and chimps, designated (H,C); and the lineages (populations) that lead to each present-day species after its last speciation event on the phylogeny, designated (H), (G) and (C):

It’s important to keep in mind that a single line on the phylogeny is in fact a population, and populations can have genetic variation. Let’s place our three alleles into the (H,G,C) population:

Now we are set to explore possibilities for how these alleles will be inherited (or not) through the speciation events that will occur. One possibility is straightforward – all three alleles will be inherited by all three species. This possibility is called “complete lineage sorting” since it represents a perfect segregation of all alleles into all lineages. This requires that all three alleles be present in the subpopulations that divide into separate lineages, and that no alleles be lost over time in any lineage. While this is certainly possible, it is by no means certain. As we have seen, when populations separate it is unlikely that all alleles in the original population will be represented in both subpopulations after the divide. Also, it is possible that selection or genetic drift may cause alleles to be lost over time in one lineage but not another. Anything other than perfect segregation of all alleles into all lineages is called “incomplete lineage sorting” – and for a large genome, it is a given that at least some genes will exhibit this effect.
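
To get a feel for why it is unlikely that every allele makes it into a new subpopulation, here is a minimal sketch that computes the chance that a founding sample contains all of the alleles. It assumes the founders carry a fixed number of gene copies drawn independently at random from a large ancestral population, and the allele frequencies and sample sizes are hypothetical. It covers only the initial split, not later losses through selection or drift.

```python
from itertools import combinations

def prob_all_alleles_sampled(freqs, n_copies):
    """Probability that n_copies random draws include every allele at least once.

    Uses inclusion-exclusion over the sets of alleles that could be missing.
    Assumes independent draws from a large population (sampling with replacement).
    """
    total = 0.0
    k = len(freqs)
    for r in range(k + 1):
        for missing in combinations(range(k), r):
            # Probability that a single draw avoids every allele in `missing`.
            p_rest = 1.0 - sum(freqs[i] for i in missing)
            total += (-1) ** r * max(p_rest, 0.0) ** n_copies
    return total

freqs = [0.6, 0.3, 0.1]  # hypothetical yellow, blue and red frequencies
for n in (5, 20, 100):
    print(n, round(prob_all_alleles_sampled(freqs, n), 3))
```

Under these assumptions, small founding groups will often miss the rarest allele, while large ones almost never do.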

Incomplete lineage sorting – a worked example

The first challenge to complete lineage sorting that these three alleles will face is the speciation event that separates the (H,C) and (G) lineages. For the purposes of this example, let’s suppose that the red allele is excluded from the population that forms the (H,C) lineage, but that all three alleles persist in the (G) lineage. You will recall that this is an example of the founder effect – a nonrepresentative sampling that can exclude alleles from a new subpopulation by chance:

Now let’s examine one possible scenario following on from the (H,C) / (G) speciation event. In the (G) lineage, the yellow and blue alleles are lost over time. At the (H) / (C) speciation event, both the blue and yellow alleles segregate into both lineages, but in the (C) lineage, the yellow allele is later lost. Similarly, the blue allele is later lost in the (H) lineage:

For this particular gene, then, we have the following final pattern:

And at last we see the issue: the gene tree for these alleles is at odds with the species tree. Recall that in the gene tree, the red and blue alleles are more closely related to each other than they are to the yellow allele:

In the species tree, however, the two closest relatives (chimpanzees and humans) do not have the two most closely related alleles – they have more distantly related alleles.
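
The worked example can also be traced step by step in a short Python sketch. The variable names are made up for illustration; the sequence of events follows the scenario described above.

```python
# Alleles present in the population ancestral to all three species.
ancestral_HGC = {"yellow", "red", "blue"}

# (H,C) / (G) split: red is excluded from (H,C); (G) keeps all three.
pop_HC = ancestral_HGC - {"red"}
pop_G = set(ancestral_HGC)

# Later losses within each lineage.
pop_G -= {"yellow", "blue"}   # (G) retains only red
pop_H = pop_HC - {"blue"}     # (H) retains only yellow
pop_C = pop_HC - {"yellow"}   # (C) retains only blue

final = {"H": pop_H, "G": pop_G, "C": pop_C}

# Closest relatives on each tree, as described in the text.
species_sisters = {"H", "C"}
gene_sisters = {"red", "blue"}

# Do the two most closely related species carry the two most closely related alleles?
alleles_in_sister_species = set().union(*(final[s] for s in species_sisters))
discordant = alleles_in_sister_species != gene_sisters

print(final)
print("discordant:", discordant)
```

This simple comparison works here only because each species ends up with exactly one allele; it is a bookkeeping aid for this example, not a general test for discordance.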

Now that we have worked this example, hopefully the reason behind the discrepancy is clear – there is no guarantee that alleles will sort in a lineage to match up with the overall species pattern. If a gene has variation in a population undergoing speciation events, it is expected that some of the time it will assort with a pattern that does not match the species pattern – in some cases, it will have a gene tree that is “discordant” with the species tree. For a population with thousands of genes with multiple alleles present, it is a given that some alleles will assort into a discordant pattern. Far from being a problem for evolution, discordant trees are predicted by evolution. It would be a problem if we did not observe them – but in fact we do, and as we shall see next time, we observe them in precisely the pattern that matches what we would expect based on species trees.
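
One way to put a rough number on "some of the time": a standard result from coalescent theory, which is not derived in this post, gives the probability that a gene tree for three species disagrees with the species tree as (2/3) × exp(−T). Here T is the length of the internal branch (in our example, the (H,C) lineage) measured in units of 2N generations, where N is the effective population size. The sketch below simply evaluates that formula; the generation counts and population sizes are hypothetical placeholders, not estimates for real apes.

```python
import math

def prob_discordant(generations, effective_size):
    """Chance that a gene tree disagrees with a three-species tree.

    generations: length of the internal branch, in generations
    effective_size: effective (diploid) population size on that branch
    """
    t = generations / (2 * effective_size)  # branch length in coalescent units
    return (2 / 3) * math.exp(-t)

# Hypothetical values, chosen only to show the trend.
for gens, size in [(1_000, 10_000), (50_000, 10_000), (50_000, 100_000)]:
    print(gens, size, round(prob_discordant(gens, size), 3))
```

The trend matches the intuition from the worked example: a short (H,C) branch or a large population leaves more time for several alleles to persist through both speciation events, so discordant gene trees become more common.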

In the next post in this series, we’ll discuss how discordant gene trees can be used to determine another feature of interest to scientists – population sizes for the lineages on a phylogeny.


Dennis Venema is professor of biology at Trinity Western University in Langley, British Columbia. He holds a B.Sc. (with Honors) from the University of British Columbia (1996), and received his Ph.D. from the University of British Columbia in 2003. His research is focused on the genetics of pattern formation and signaling using the common fruit fly Drosophila melanogaster as a model organism. Dennis is a gifted thinker and writer on matters of science and faith, but also an award-winning biology teacher—he won the 2008 College Biology Teaching Award from the National Association of Biology Teachers. He and his family enjoy numerous outdoor activities that the Canadian Pacific coast region has to offer. Dennis writes regularly for the BioLogos Forum about the biological evidence for evolution.

This article is now closed for new comments. The archived comments are shown below.

Sean Purcell - #81934

July 19th 2013

Dear Venema,

Firstly, I wish to say I find your posts on the evidence for common ancestry to be incredibly informative and persuasive. I especially enjoy the vitellogenin pseudogene example, since not only are the three pseudogenes in our genome, but the Vit 1 pseudogene is situated exactly between two functional genes, just as in the genome of the chicken - a brilliant confirmed prediction.

Secondly, in your post “Is There “Junk” in Your Genome? Part 4” you wrote this:

“Yes, the implications of unitary pseudogenes such as these are easy for even non-specialists to grasp: whales have the defective remnants of genes adapted to terrestrial vision and air-based smelling because they descend from terrestrial ancestors.”

I was hoping you could direct me to the papers where you got this information. I would find it baffling if the antievolution organizations could come up with a reason why whales would have needed these genes in the past.

PNG - #81948

July 20th 2013

Never underestimate the capacity for creative handwaving of the ideologically committed.

I’m not sure what refs Dennis was using, but here are some. 

98 Olfactory receptors in aquatic and terrestrial vertebrates  http://link.springer.com/article/10.1007/s003590050287

08 The vestigial olfactory receptor subgenome of odontocete whales.  http://sysbio.oxfordjournals.org/content/57/4/574.full

07 The olfactory receptor gene repertoires in secondary-adapted marine vertebrates: evidence for reduction of the functional proportions in cetaceans  http://rsbl.royalsocietypublishing.org/content/3/4/428.abstract?ijkey=762dfb26eb64e4f06a5567308dd88a6f685e3b67&keytype2=tf_ipsecsha

03 Genetic evidence for the ancestral loss of short-wavelength-sensitive cone pigments in mysticete and odontocete cetaceans. http://www.ncbi.nlm.nih.gov/pubmed/12713740

13 Rod monochromacy and the coevolution of cetacean retinal opsins. http://www.ncbi.nlm.nih.gov/pubmed/23637615

All but the first article are freely available.

Sean Carroll discusses loss of the cetacean SWS opsin in his book The Making of the Fittest, along with many other pseudogenes.

Sean Purcell - #81954

July 21st 2013

Thank you very much for this list of papers.

Dennis Venema - #81958

July 21st 2013

Thanks PNG!  References 03 and 08 were the ones I was using, if memory serves.

Sean Purcell - #82100

July 25th 2013


Today I was looking up Francis Collins’ position on junk DNA. When I googled it, I found that almost every single link on the first couple of pages is from websites run by antievolution organizations asserting that Collins has abandoned or is in the process of abandoning the belief that genomes contain junk DNA. They quote him from his latest book, but I am suspicious. Is this the case?

Dennis Venema - #82181

July 29th 2013

Hi Sean,

I’m not sure if I found the same links you saw, but those that I did find are discussing Collins’ newer book on the subject of non-coding DNA (things like regulatory DNA, and so on). It’s not at all controversial that regions of non-coding DNA have function - but it’s not accurate to say (as many antievolutionary sites imply or even claim outright) that this means that there is no such thing as “junk” DNA. If you define junk DNA as sequences that do not contribute to the organism’s fitness, there is very good evidence that most of our genome qualifies.

Nick Gotts - #81977

July 21st 2013

Excellent explanation of a tricky concept! I found it very hard to grasp when I first came across it.
