Understanding Evolution: Is There “Junk” in Your Genome? Part 1

December 30, 2011 Tags: Genetics
Today's entry was written by Dennis Venema.

One of the challenges for discussing evolution within evangelical Christian circles is that there is widespread confusion about how evolution actually works. In this (intermittent) series, I discuss aspects of evolution that are commonly misunderstood in the Christian community. In this first of several posts on “junk DNA”, we explore how genomics can be employed to test for non-functional sequences by comparing sequences between related organisms. As you finish reading the essay, see if you can figure out the meaning of the figure above. We'll pose a question at the end.

Do genomes have non-functional sequences?

There are various ways to test the hypothesis that certain regions of DNA are non-functional, and in this series we will explore some of them. One way to estimate the fraction of non-functional DNA in a particular genome is to determine which portions of the genome can be freely altered by mutation without consequence to the organism. DNA sequences that cannot be mutated freely without a loss in function are said to be under purifying selection: as mutated forms of this sequence arise in a population, the loss of function associated with the mutated sequence reduces the likelihood that the organism will pass this mutation on to future generations. This type of mutation, in a functional sequence, has deleterious consequences. Another way to put it is that functional sequences are subject to natural selection, which acts as a filter to “purify” the genome at a particular location, but that non-functional sequences are free from the constraints of selection, and “anything goes” with respect to mutation.

Tell us again, Grandpa!

One way to think about this is to consider a humorous story that is told within an extended family (I think every family has these types of stories – I know my kids love to hear certain ones told and retold again). Certain incidental details of the story can be altered from telling to telling, and perhaps Uncle Joe tells it a certain way but Uncle Jeff tells it another with respect to those types of details. There are, however, certain features of the story that are absolutely non-negotiable, or the story doesn’t “work” (and telling these parts incorrectly will generate protests and corrections from the kids who know how the story goes and insist that you are not telling it correctly). These types of stories, like genomes, have some bits that can freely change and others that can’t. The bits that can’t change are under constraint and, in biological terms, subject to selection. The same factors apply in more concise form to jokes: some bits can change (and do, as the joke is told and retold) – but some bits cannot (for example, the punch line).

The best way to test for purifying selection is to compare the genomes of related organisms that have been separate species for some time. (To continue our analogy, you could determine what parts of the story are really important by comparing how each of the uncles tells it and listing out the parts that are the same in all the various versions). The genomes in the two species are modified versions of the same genome present in the common ancestor species: they started as virtually identical but have since experienced mutations in different locations over time. Mutations in functional sequences will have been subject to purifying selection to remove loss-of-function mutations, whereas mutations will have freely accumulated in non-functional sequences. The two genomes are thus a collection of similarities and differences, as we have discussed before:

In some ways, comparing the DNA sequence between related organisms is like reading alternative history novels. The hypothesis of common ancestry between similar organisms makes a very straightforward prediction about their genomes: it simply predicts that they were once the same genome, in the same ancestral species. This hypothesis also predicts that these two genomes, having gone their separate ways in the diverged species, will have accumulated changes once they separated. Like an alternative history, each genome has the same backstory, and then a history independent from the other after the point of separation.

These similarities and differences, however, will not be randomly distributed. Sequences subject to purifying selection will have fewer differences than sequences that can freely mutate. Accordingly, when compared side-by-side, the two genomes should have regions where differences are common, and where differences are rare. For example, consider a genome segment in two related species where there is one gene present. This gene has some regions that cannot be changed without significant consequences (the DNA letters that code for the amino acid sequence of the gene product, for example) and some regions that can be mutated without consequence (such as some sequences inside introns, the non-coding segments that separate gene coding segments and are spliced out of the final gene product):

What biologists observe when comparing sequences like this between two related organisms is that coding sequences, which obviously are required for the gene’s function, have far fewer differences between them than do sequences found in introns or in between genes. The idea is not that mutations preferentially happen in introns and intergenic regions. Rather, mutations can occur anywhere in the genome, but those that alter functional sequences are more likely to be selected out of populations.
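To make the comparison concrete, here is a minimal sketch in Python that counts differences region by region between two aligned sequences. The sequences and the exon/intron coordinates are invented for illustration (they are not real genomic data), and the function names are hypothetical. A real analysis would start from a proper alignment of sequenced genomes.

```python
def count_differences(seq_a, seq_b, start, end):
    """Count mismatched positions in the half-open interval [start, end)."""
    return sum(1 for a, b in zip(seq_a[start:end], seq_b[start:end]) if a != b)


def divergence_by_region(seq_a, seq_b, regions):
    """Map each region name to (number of differences, region length)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return {
        name: (count_differences(seq_a, seq_b, start, end), end - start)
        for name, (start, end) in regions.items()
    }


# Hypothetical, hand-made aligned sequences: exon 1 + intron + exon 2.
SPECIES_1 = "ATGGCCAAGT" + "GTAAGTCCTTAGGACTTCAG" + "GGTGAATTCTAA"
SPECIES_2 = "ATGGCCAAGT" + "GTATGACCATCGGTCTACAG" + "GGTGAGTTCTAA"

# Assumed region boundaries for this toy example only.
REGIONS = {"exon 1": (0, 10), "intron": (10, 30), "exon 2": (30, 42)}

result = divergence_by_region(SPECIES_1, SPECIES_2, REGIONS)
for name, (diffs, length) in result.items():
    print(f"{name}: {diffs} differences in {length} bases")
```

In this toy case the two exons differ at one position in total while the intron differs at six, which is the pattern described above: differences are rare where sequence is constrained and common where it is not.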

The expanding data set

This type of analysis gets easier to do the longer two species have been separated, and the more species one has to compare to each other. Very recently separated species will have a very high degree of genetic similarity simply because neither species has had appreciable mutations to a common ancestral genome. As such it is difficult to pick out the sequences that have been subject to selection, since functional and non-functional sequences are both still highly similar (virtually identical). It is only as species have been separated for a long time that a pattern begins to emerge: sequences that are functional remain “constrained” by purifying selection to remain more similar, and non-functional sequences accumulate mutations in the separate lineages that make them less and less alike.
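The effect of separation time can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: every number in it (sequence length, mutation rate, number of generations) is made up and far from realistic, and purifying selection is modelled crudely by discarding every mutation that lands on a constrained site.

```python
import random

BASES = "ACGT"


def mutate(seq, constrained, rate, rng):
    """Return a copy of seq after one generation of random point mutations.

    Mutations that hit a constrained site are discarded, a crude stand-in
    for purifying selection removing loss-of-function variants.
    """
    out = list(seq)
    for i, base in enumerate(out):
        if rng.random() < rate and i not in constrained:
            out[i] = rng.choice([b for b in BASES if b != base])
    return out


def identity(a, b, sites):
    """Fraction of the given sites at which the two sequences match."""
    sites = list(sites)
    return sum(a[i] == b[i] for i in sites) / len(sites)


def diverge(generations, length=400, n_constrained=100, rate=0.002, seed=1):
    """Split one ancestral sequence into two lineages and let them mutate.

    Returns (identity at constrained sites, identity at unconstrained sites).
    """
    rng = random.Random(seed)
    ancestor = [rng.choice(BASES) for _ in range(length)]
    constrained = set(range(n_constrained))
    lineage_a, lineage_b = ancestor[:], ancestor[:]
    for _ in range(generations):
        lineage_a = mutate(lineage_a, constrained, rate, rng)
        lineage_b = mutate(lineage_b, constrained, rate, rng)
    free = range(n_constrained, length)
    return (identity(lineage_a, lineage_b, constrained),
            identity(lineage_a, lineage_b, free))


for gens in (5, 500):
    kept, free = diverge(gens)
    print(f"{gens:>3} generations: constrained {kept:.2f}, unconstrained {free:.2f}")
```

After only a few generations both classes of site are still nearly identical, so constraint cannot be detected. After many generations the unconstrained sites have drifted apart while the constrained sites remain the same, and the pattern becomes visible.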

Now that biologists have access to a wide range of mammalian genomes, this type of analysis has been done on the human genome with ever-increasing precision. Early studies comparing the human genome to other genomes, such as the mouse genome (compared in 2002) and dog genome (2005), suggested that only a small fraction of the human genome was subject to purifying selection (about 5%). Recent work published a few months ago has taken this approach to a whole new level: a genome-wide comparison of 29 mammalian species (!). These results are exciting from a biological perspective because this work helps scientists tease out what bits of the human genome are under selection, and what bits aren’t (which isn’t always obvious, because we don’t always know what sequences are functional or non-functional). This type of approach is non-biased: it requires no prior hypotheses of what types of sequences to look for, but rather simply looks for what has been selected to remain more similar over time. The results, based on the (very nearly) whole-genome sequences of 29 placental mammals, are in keeping with previous estimates: about 5-6% of the human genome is under purifying selection, and the rest appears to be rather free to accumulate changes. As a species, our genome seems to be about 95% incidental details and 5% punch line.

So, what sorts of things lurk out there in the “other” 95%? In the next post in this series, we’ll head out into the wilds of the human genome and have a look.

Editor's Note: Now that you've read the essay, see if you can surmise the meaning of the figure at the top. This is a tiny stretch of DNA, 21 bases (units of code) long. Why do you think position #4 shows only an A and position #5 shows only a G, whereas other positions are not restricted in this manner? Pretend that you could represent the genome as a whole in this manner. Of the 3 billion bases in our genome, how many of them would be configured like position #4 or #5? What about the rest? Is the specific base (unit of code) functionally important for that set? Upon what do you base your conclusions? Finally, do the presuppositions of the Intelligent Design Movement and Reasons to Believe pivot on how to interpret this data? How would such proponents interpret the data differently than mainstream biologists? Feel free to address these questions in the comment section or, if you prefer, just reflect on them.

For further reading:

Lindblad-Toh, K., et al. (2011). A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478 (27), 476-482.

Dennis Venema is professor of biology at Trinity Western University in Langley, British Columbia. He holds a B.Sc. (with Honors) from the University of British Columbia (1996), and received his Ph.D. from the University of British Columbia in 2003. His research is focused on the genetics of pattern formation and signaling using the common fruit fly Drosophila melanogaster as a model organism. Dennis is a gifted thinker and writer on matters of science and faith, but also an award-winning biology teacher—he won the 2008 College Biology Teaching Award from the National Association of Biology Teachers. He and his family enjoy numerous outdoor activities that the Canadian Pacific coast region has to offer. Dennis writes regularly for the BioLogos Forum about the biological evidence for evolution.

This article is now closed for new comments. The archived comments are shown below.

Jon Garvey - #66869

December 31st 2011


I was interested in this paper (http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002379) demonstrating the presence of 60 odd de novo genes in humans wrt the other primates. The paper seems to suggest that all of these were formed from sequences in non-coding DNA (rather than, say, old disabled genes etc).

Analogically, this would seem to make the case for those parts of the genome, at least, being the equivalent of loose building materials lying to hand. On a teleological model, one could argue that one would expect to find such a supply of materials on a building site. On an ateleological model, their presence was fortuitously exploited by random mutation/natural selection.

In either case, prior to their final mutation, they were merely non-functional sequences. Would they therefore have warranted the description of “junk”?

Darrel Falk - #66872

December 31st 2011


It’s interesting to compare the changes that led to “functionality.” (See Data Set 4 in the paper.) In most cases, the human gene and the chimpanzee “non-gene” differ at only one location. Note also that it’s not clear whether the products of these new genes are doing anything of significance. They do, however, provide the raw material upon which natural selection can work. This, of course, is evolution in action: Non-functional DNA gaining function, which gets honed as the generations go by.

Jon Garvey - #66876

December 31st 2011

“Non-functional DNA gaining function, which gets honed as the generations go by.”

Or junk getting recycled?

Ashe - #66882

December 31st 2011

Perhaps I’m misunderstanding what you mean by location (or perhaps I’m misunderstanding the data), but I’m seeing that the human gene and the chimpanzee “non-gene” differ at more than one location.

Darrel Falk - #66885

January 1st 2012

You are right Ashe…my mistake.  There is only one highlighted, but typically there are about two or three other differences between the chimpanzee and human.

HornSpiel - #66871

December 31st 2011

Apparently the size of each letter is calculated on the basis of how non-random a base appears to be at that position (probably using some standard deviation function). The largest letters occur where only one base is found at that position. The shortest stacks would thus occur in a position where all four letters are represented in nearly the same proportion (25% each).
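For reference, figures of this kind (sequence logos) conventionally scale each stack by the information content of the position, measured in bits, rather than by a standard deviation: for DNA, 2 bits minus the Shannon entropy of the base frequencies at that position. Below is a minimal Python sketch of that calculation. The example columns are invented, the function name is hypothetical, and the small-sample correction used by real logo software is omitted. Whether the figure in this post was drawn exactly this way is an assumption.

```python
import math
from collections import Counter


def information_content(column):
    """Information content, in bits, of one alignment column of DNA bases.

    Computed as log2(4) minus the Shannon entropy of the observed base
    frequencies; no small-sample correction is applied.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


# Hypothetical columns: the same position read across eight aligned sequences.
fully_conserved = "AAAAAAAA"    # only one base ever seen, like position #4
mostly_conserved = "AAAAAAGG"   # one base dominates
unconstrained = "ACGTACGT"      # all four bases equally common

for label, column in [("fully conserved", fully_conserved),
                      ("mostly conserved", mostly_conserved),
                      ("unconstrained", unconstrained)]:
    print(f"{label}: {information_content(column):.2f} bits")
```

A position with a single base reaches the 2-bit maximum, and a position where all four bases are equally frequent scores zero, which matches the tallest and shortest stacks described above.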

From an evolutionary perspective

This display suggests there are degrees of functionality, but I suspect the reality is more complex. What is most interesting is not the single large letters (e.g. 4 and 5), which are obviously functional, but the midsize stacks: 1, 21, 6, and 14. I would guess that at least some are functional, but correlated with other midsize stacks so that certain combinations are more optimal or functional than others.

It is actually then the midsize stacks that would allow for variations between individuals in the species and allow for adaptations to changing environmental conditions because those positions are both functional and

Another interesting combination is the large stacks with a small amount of variation, for example 3 and 7. These might be non-functional, but reflect a functionality that was only recently inactivated (in evolutionary terms).

Even the shortest stacks though could be functional if they are part of functional combinations that happen to result in the bases at that position  being represented in approximately equal proportions.

From an ID perspective

You would have to come up with a different explanation for large stacks that are non-functional. The explanation of recently turned off functional bases that are now no longer subject to purifying selection is not consistent with ID predictions. ID might suggest that the designer is maintaining that position at an optimal level of variability/purity.

Another ID prediction that would differ is when certain functional combinations appear to be more optimal than others. ID would have to  predict in that case that the design was for optimal variability.

The front-loading hypothesis would predict that certain variability was designed in and preserved from the beginning until the right time came for it to be expressed. This could happen if the variability was preserved in some alternate functional configuration and then repurposed at just the right moment.

It seems to me that these differing predictions could be tested in some way to determine which perspective is more reflective of the dynamics of genome change.

Darrel Falk - #66873

December 31st 2011

This is a very nice analysis, HS. The DNA sequence represented in the figure is not quite typical of the genome as a whole. Usually we have a situation in which the base at a particular spot (or stretch of bases) is important and it is conserved through evolutionary time, or else it is not important, in which case it is free to change into any one of the other three bases. As Dennis says, 95% of the genome (about 2,850,000,000 bases) fits into the latter category.

Still, although rare in the genome as a whole, there are thousands of little stretches like the one depicted in the figure.  I wonder what stretches like that are doing.  Why is it for example that base # 4 is almost always an A in evolutionary time, but some of the other positions nearby have greater (but not complete) flexibility?

I like your analysis of the ID implications.

Jon Garvey - #66877

December 31st 2011

“It seems to me that these differing predictions could be tested in some way to determine which perspective is more reflective of the dynamics of genome change.”

Sounds like a useful piece of work for someone, Hornspiel (though probably best for ID and frontloaders to make their own predictions). I imagine that the adaptive mutation guys like Shapiro would also have some particular expectations on these questions.

Biochemistry would seem unlikely to have much bearing on it - can stochastic mutation and NS carry the weight, both in terms of the varying frequencies and the times available?

melanogaster - #66880

December 31st 2011

Dear Dr. Garvey,

I don’t understand your personalization of these matters. Why would they make their own predictions? Shouldn’t they formulate clear hypotheses that make unequivocal predictions instead?

It seems to me that you are doing your darndest to help them avoid doing any real science. Is this correct?

Jon Garvey - #66883

January 1st 2012


Nope, I don’t think so. Hornspiel’s suggestion that different foundational approaches would make different predictions seems a good one, but I was simply suggesting that it would be better for the predictions to come from them, rather than him.

For example, I’m not sure ID folk would agree that optimal design would be a necessary criterion. I’m not sure that frontloaders would insist that preserved sequences would necessarily be functional throughout their history.

All I’m suggesting is that the process must be “We hypothesize; we predict” rather than “you, of course, would hypothesize; you would predict.” Otherwise the danger is of disproving straw men. Fair enough?

Ashe - #66886

January 1st 2012

Yeah, like Kirk Cameron’s “crockoduck”. 

Here’s some thoughts on “junk” from a front-loader:

Jon Garvey - #66887

January 1st 2012

Interesting piece, Ashe, as are the links. And here’s a quote about introns from an adaptive mutation proponent, James Shapiro:

If we are looking for an evolutionary reason to explain the subdivision of protein-coding sequences into exons and introns, one good candidate would be the increased facility this kind of split organisation (and the accompanying RNA processing apparatus) provides for the rapid generation of genomic innovations. In other words, as evolution proceeds, so does evolvability.

He goes on to point out the significance of the fact that “the most abundant DNA in the largest genomes is precisely the raw material for the creative rearrangements that generate and then amplify novel sequence elements.” In that sentence he seems to be predicting exactly the process apparently occurring in the paper on the 60 novel human genes I link to above.

All we need now is some ID input.

But I still ask if junk that gets used is still junk? After all, many men have a garage full of stuff they call “spares” and their wives call “junk”. It all depends if you think the man’s an idiot or a genius.

beaglelady - #66889

January 1st 2012

“But I still ask if junk that gets used is still junk? After all, many men have a garage full of stuff they call ‘spares’ and their wives call ‘junk’. It all depends if you think the man’s an idiot or a genius.”

At some point we would call a junk collector whose collection gets out of hand a hoarder, a person with a psychological problem. 

On the other hand, there is the inspiring story called “The Boy who Harnessed the Wind”:

It’s the story of William Kamkwamba, a Malawian boy who grows up in extreme poverty, to the point that his family almost starves to death. Forced to drop out of school at a young age, and barely understanding English, William manages to get his hands on an old English textbook about electricity. He ingeniously cobbles together a working windmill built entirely from junk. It works well enough to provide electricity for his home (and nearly burns the place down).
He’s now studying in the USA. It’s one of the most inspiring books I’ve ever read.
Due to his poverty and lack of resources, it’s understandable why William built his windmill the way he did. (btw, the homes in his village now have solar panels.)


Jon Garvey - #66893

January 2nd 2012

Sounds a great story, Beaglelady. If one were to use it analogically, I guess it would represent the cell lumbered with useless introns but managing to make use of them anyway.

I actually had a patient once from that unfortunate minority of obsessive-compulsive hoarders: you had to push the door open past boxes of junk to a scene of utter squalor. It was the whole bit - tunnels through ancient hoarded newspapers, rotting food in corners, avalanches of old clothing on the stairs. In his case, “junk” led to a completely dysfunctional life: it was a symptom of disease.

If the same man had kept on turning out useful inventions made of scrap, you’d regard him as eccentric but organised - the house full not of junk, but a collection of raw materials.

beaglelady - #66897

January 2nd 2012

The point is that William Kamkwamba used discarded junk to build his windmill because he had no other choice. Being impoverished, he couldn’t afford the materials for a more efficient design.

For pictures of his windmill:

melanogaster - #66915

January 3rd 2012

Dr. Garvey, you’re either misunderstanding me or you’re not being fair at all, much less enough.

I am advocating neither of those positions. I am advocating “We hypothesize, our hypothesis predicts” because if a hypothesis doesn’t make clear predictions, it’s worse than useless. 

Would you disagree with my actual position?

Jon Garvey - #66916

January 3rd 2012

Mr Drosophila

I don’t have a position on your actual position. I had assumed that Hornspiel’s comment was shorthand for “hypothesis leads to prediction” and was replying to him in that charitable spirit.

I don’t find fault with his phraseology at all. Do you?

Jon Garvey - #66895

January 2nd 2012

In reply to Hornspiel I will try and suggest a possible ID viewpoint from my reading of their stuff. The existence of functions for some introns might imply, for ID, the existence of functions in the rest, much as my analogy to Beaglelady might suggest that products emerging from the husband’s shed show it is full of stored materials, not junk.

But that would not be conclusive, as in ID theory one cannot identify (as opposed to suspect) design without identifying specified information, which can by definition only be decided on the basis of identifying actual function.

So in the paper I cited in my first post here, though the chimp non-genes are just a few bases distant from the de novo human genes, the former (by definition) contain zero specified information, and the latter 100%, even if the putative designer had been gradually building the sequence of a functional gene.

This would seem a methodological weakness, but it is one shared with conventional evolutionary science, which by dismissing teleology a priori would also be unable to recognise “potential” function because evolution has no potentials, only actualities. There is no mechanism for forward planning in Neo-darwinism.

Teleological theories like Shapiro’s would, I assume, be no more able to identify non-functional “almost genes”, but might be looking to see if the patterns of mutation in such introns were qualitatively or quantitatively different from that of other non-coding regions. If they were, it could be evidence that these regions of the genome, though not currently functional as genes or controls, are designated as development areas for such function. ID people would also, I guess, find such findings interesting.

Would Neo-darwinian workers be able to find alternative non-teleological explanations for any such “special” mutation pattern in introns? I suspect they would, since even if cell-based or designer-based teleology were at work, it would necessarily have some physico-chemical antecedents which could be invoked as “causes”. The question would be whether the underlying assumption (no forward planning permitted) became implausible from the weight of evidence.

HornSpiel - #66898

January 2nd 2012


I appreciate your contribution. I of course agree that the predictions of ID must come from ID proponents. Your comments highlight one of the realities of ID—that there is no unified ID theory. There may be ID hypotheses out there embraced by various individuals/communities but each will make different predictions based on their viewpoint.

I trust we can all agree that a theory is more than a hypothesis. It is a framework for making predictions that have been tested by real world results. That’s what evolutionary theory is. It may have its weaknesses, but they do not appear to me to be existential by any measure.

On the other hand, ID is not a theory, or even a unified hypothesis. That is one reason it is so hard to refute. I would certainly appreciate some qualified ID community, such as the Discovery Institute, planting a stake in the ground and starting to make some testable predictions.

That said, I am not holding my breath.

James R - #66904

January 2nd 2012


I have some references pertinent to your comment to Jon above (66898).  They may or may not be helpful to you, but I thought you would want to know about them.

You wrote:

“I trust we can all agree that a theory is more than a hypothesis. It is a framework for making predictions that have been tested by real world results. That’s what evolutionary theory is.”

It is claimed by many (not just ID proponents, YECs and OECs) that the neo-Darwinian understanding of evolution has been tested by “real world results” and found seriously wanting in many respects. Some of the biggest past and present names in the field of evolutionary biology, including Gould and Margulis among the deceased, and Shapiro and some of the Altenberg group today, have made such claims. If you want a recent formulation from a non-ID viewpoint, you can look at Shapiro’s new book, Evolution (Shapiro is a big-gun molecular biologist from U. of Chicago); for a critique from an ID viewpoint, you can check out the list of purportedly failed Darwinian predictions at:


I have not yet read the material on the Darwin’s predictions site, so I am not recommending it, but merely providing the URL for information, since the author of the site agrees with your understanding of theories and hypotheses and testing.

Regarding positive ID predictions, you will find a list of them in Stephen Meyer’s Signature in the Cell, 481-497.

Hope this helps.

Ashe - #66899

January 2nd 2012

I doubt that RTB/ID’ers would want anything to do with potentialities or propensities (I remember one of them ranting against it in a post about the evolution of eyes). 

Jon Garvey - #66905

January 2nd 2012

Seems like nobody likes them except me, then, which is a shame - it’s propensities that make the world go round.

Enosh - #67129

January 15th 2012

As a ‘shoot-from-the-hip’ reflection from an ID (YEC) perspective, I wonder why we must come to the conclusion of non-functionality in these sequences that look free to mutate. If ‘non-functionality’ is meant in a different sense from the ordinary use of the term, then we should be made aware of it because the notion of ‘non-functional’ DNA or ‘junk DNA’ to the layman’s ears sounds like we could simply do away with 95% of our genomes without any harm to us. Surely even evolutionists today would acknowledge that this is not the case.

Anyway, there are types of design other than just coded information. What if some of those less constrained sequences have a function not heavily tied to sequence order? For instance, they may be structural framework, in which case the order of the bricks in a wall is generally not as important (especially when you’ve only got four different types of bricks) as the sequence of letters in a sentence. There may be designed mutational hotspots (such as we see with somatic hypermutation function in antibodies for varying antigen recognition in adaptive immunity), perhaps with a designed ability for variation. There may also be designed redundancy in the genome which might allow for some of the genome to be disabled by mutation and yet remain practically as functional. What is lost in such instances is not functionality but a backup path. The physical structure of the DNA in the cellular context may also affect where and what types of mutations accumulate, in which case those creatures that are more phenotypically similar may be more likely to mutate in similar ways than those not so similar. As such, mutation preference would not be a strictly random occurrence. Anyway, these are a few possibilities that could be explored in an ID framework.
