ENCODE and “Junk DNA,” Part 2: Function: What’s in a Word?
On Monday, I introduced “function” as a particularly useful concept in biology, but also cautioned that—like all good concepts—it has “fuzzy” edges. Indeed, it has lots of similarly “fuzzy” peers in the language of biology: for example, asking a molecular biologist “What is a gene?” or asking an ecologist “What is a species?” is not advisable unless you have an hour or more to devote to the conversation. A discussion of biological “function” could generate a similar conversation.
For most biologists, something biological has function if it contributes to the characteristics of an organism in such a way as to favor its reproduction (usually by favoring its survival). Conversely, when a biologist claims that some feature of an organism is non-functional, they are claiming that this feature does not contribute to or favor survival or reproduction. To return to our historically interesting example from yesterday, the wild-type allele of the enzyme responsible for making purple pigment in pea flowers (the “P” allele) is functional since it has an observable effect on the characteristics of the organism that favors its reproduction (attracting pollinators, perhaps). When a mutation arose in this gene to prematurely terminate the synthesis of the protein enzyme, the recessive “p” allele resulted. Biologists would not hesitate to label this allele as a loss-of-function allele, because the function they have in mind is that of making purple pigment. The fact that this allele still produces an mRNA and even a partial protein product would not faze them in the least, since the known biological function of the gene has been disrupted. On the other hand, the ENCODE definition of “function” as “any detectable biological activity” presents things differently: by that standard we would not be able to discern any difference between these two alleles, despite the evidence that one is functional (in the sense above) and the other is not.
What this means is that the ENCODE definition of “function” is specific to a context: detecting (any) biochemical activity for a segment of DNA in the genome. As I mentioned in my first post, looking for biochemical activity is a useful and interesting undertaking, and the ENCODE project is impressive in its scope. What it does not do, however, is define “function” in the usual biological sense we have just discussed: that of meaningful contribution to survival and reproduction. In fact, biologists would expect that many DNA sequences that are non-functional in the traditional sense would be detected as “functional” using the ENCODE definition. One such example is that of transposon-derived sequences, which make up nearly 50% of the human genome.
Transposons and ENCODE
We previously examined transposons in our series on “Junk DNA.” In brief, these are parasitic DNA sequences that serve to replicate themselves and spread within genomes. They have sequences that act to recruit host enzymes for making mRNA and a protein enzyme that acts to copy and/or move the transposon to a new chromosome location. These entities are veritable beehives of biochemical activity, but biologists consider them non-functional (with respect to their hosts) even if they are highly functional (with respect to the transposon). In many cases, however, transposon sequences in mammals are defective: they have picked up mutations such that they no longer make the enzyme they need for movement, or perhaps the mutation ruined one of the DNA sites the enzyme binds to. As before, these sequences are non-functional with respect to their mammalian host (they make no contribution to the host organism at all), and they are non-functional even to themselves (since the transposon cannot replicate any longer). Even such doubly non-functional sequences, however, will retain detectable biochemical activity. Host DNA-binding proteins will still bind to these sequences, mRNA may be produced, and even the transposon enzyme might be partially made as a non-functional protein. These biochemical activities may persist for thousands of generations before additional mutations silence them, so these sequences would still be identified as “functional” according to the ENCODE criteria. Since almost half of the human genome is made up of such repetitive sequences, it’s not surprising that ENCODE found so much “function.” Yes, these sequences have detectable biochemical activity, but that is exactly what we would expect, given what we know about transposons. Nor does such activity demonstrate that these sequences are functional in the stricter sense. Indeed, lines of evidence from comparative genomics strongly suggest they are not.
Consider the onion
One such line of evidence is that closely related species can vary widely in the amount of DNA they contain, yet have the same number of genes. For example, some species in the genus Allium (onion, garlic and related plants) can have over five times as much DNA as other species within the same group. The difference is largely in repetitive DNA sequences, such as transposons and transposon fragments. Such observations are challenging to square with the hypothesis that the species with the larger amounts require all of it for function in the strict sense, since the species in the group are all almost exactly the same structurally. If Onion Species B has five times as much DNA as Onion Species A, it does not mean that all of it is necessary to build the body form of Species B. No, the developmental process for building Species B involves laying down the very same structures that we find in Species A, with only slight modifications. So even if all of the “extra” DNA in Species B is doing something biochemically, it doesn’t mean that it is all necessary to build or maintain the body form. Furthermore, we might notice that the onion has over five times as much DNA as humans. Do we really think that it takes five times more functionally necessary DNA to build an onion than it does to make a human being? No. Much of the extra DNA, put simply, may be “functioning” in some way (i.e. biochemically active), but it is highly unlikely that it is functionally necessary. This observation led evolutionary and genome biologist T. Ryan Gregory to propose the “onion test” as a mental check against proposed universal functions for non-coding DNA (using “function” in the strict sense):
“The onion test is a simple reality check for anyone who thinks they have come up with a universal function for non-coding DNA. Whatever your proposed function, ask yourself this question: Can I explain why an onion needs about five times more non-coding DNA for this function than a human?”
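The arithmetic behind the fivefold comparison can be made explicit with a small sketch. It assumes, as the argument above does, that two structurally similar species need roughly the same amount of DNA to build their shared body form. The function name and the relative genome sizes (1 unit and 5 units) are illustrative placeholders, not measurements.

```python
def minimum_dispensable_fraction(smaller_genome: float, larger_genome: float) -> float:
    """Fraction of the larger genome that exceeds what the smaller genome gets by with.

    Assumption (from the argument in the text): both species build essentially
    the same body form, so the smaller genome is an upper bound on what is needed.
    """
    if smaller_genome <= 0 or larger_genome < smaller_genome:
        raise ValueError("expected 0 < smaller_genome <= larger_genome")
    return (larger_genome - smaller_genome) / larger_genome


# Relative sizes only: Species A = 1 unit, Species B = 5 units (the fivefold case).
fraction = minimum_dispensable_fraction(1.0, 5.0)
print(f"At least {fraction:.0%} of Species B's DNA is beyond what Species A needs")
```

Under that assumption, a fivefold difference means at least four fifths of the larger genome is not required for building the body form, whatever biochemical activity it may show.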
The “vitellogenin test”
Whereas the onion test of meaningful function is a broad look across the genomes of a group of related organisms, a complementary strategy is to examine specific cases of DNA that have been widely accepted as being non-functional (i.e. not necessary for the building and maintenance of the body). Indeed, if the argument against the very idea of “non-functional DNA” is to be convincing to most biologists, it needs to address cases where the accumulated evidence for the standard definition of non-functionality is strong. So, with a tip of the hat to Gregory’s “onion test,” I’d also like to propose a test to be used for the claim that “junk DNA” has been shown to be non-existent. Simply put, the test asks: does the claim address the features we observe in the human Vitellogenin 1 pseudogene?
Since this is a pseudogene that may already be familiar to readers from my previous discussions of “junk DNA,” it will serve as a useful example to explore further. For those who have not yet encountered this example, however, I will summarize its relevant features before going on to re-evaluate it in light of ENCODE.
In egg-laying animals, including some mammals like the platypus, the Vitellogenin 1 (Vit 1) gene produces a protein that is used in the formation of egg yolk. Yolk serves as a source of nutrients for the developing embryo once it is cut off from the maternal supply when the eggshell is formed. Placental mammals, like humans, retain a link to their mothers throughout their embryonic development through the placenta, and therefore do not need egg yolk in the same way that egg-laying organisms do.
Several years ago, a group of researchers went looking for remains of Vit 1 gene sequences in humans and other mammals. According to evolutionary theory, all mammals are the descendants of egg-laying ancestors, meaning that, if traced back far enough, placental mammals and modern egg-laying organisms such as birds once were the same ancestral species, with a common genome. Working with this knowledge, the researchers located the Vit 1 gene sequence in chickens, and took note of the sequences on either side of it (for convenience we’ll call them “Gene A” and “Gene B”). They then located these sequences in the human genome, where they also sit side-by-side. Examining the sequence between Gene A and Gene B in the human genome revealed that the mutated remains of the Vit 1 gene were still present in the human genome, in the exact spot that an expectation of common ancestry (in this case, conservation of genome structure, or shared synteny) would predict.
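As a rough illustration of the search strategy just described, the sketch below treats each genome as an ordered list of labels and looks up whatever sits between the two flanking genes. The gene orders are toy placeholders built from the description above (not real annotation data), and `region_between` is a hypothetical helper name.

```python
from typing import List


def region_between(gene_order: List[str], left_flank: str, right_flank: str) -> List[str]:
    """Return the elements lying between two flanking genes, in chromosome order."""
    start = gene_order.index(left_flank)   # raises ValueError if the flank is absent
    end = gene_order.index(right_flank)
    if start > end:                        # tolerate a reversed orientation
        start, end = end, start
    return gene_order[start + 1:end]


# Toy gene orders (assumption): only the three elements mentioned in the text.
chicken = ["Gene A", "Vit 1", "Gene B"]
human = ["Gene A", "mutated Vit 1 remnants", "Gene B"]

print(region_between(chicken, "Gene A", "Gene B"))
print(region_between(human, "Gene A", "Gene B"))
```

The point of the design is that the search is driven by the flanking genes alone: the prediction of shared synteny says where to look before anyone knows what, if anything, will be found there.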
Also, when comparing the Vit 1 pseudogene between various placental mammals, we observe that several of the inactivating mutations (deletions) are common to all, indicating that they occurred in the last common ancestor of these species, and were subsequently inherited.
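The idea of a deletion shared by every placental mammal can be sketched with a toy alignment. The sequences below are invented purely for illustration (they are not the real Vit 1 sequences), the species labels are placeholders, and `shared_gap_columns` is a hypothetical helper. It reports alignment columns where the reference has a base and every other sequence has a gap.

```python
from typing import Dict, List


def shared_gap_columns(alignment: Dict[str, str], reference: str) -> List[int]:
    """Columns where the reference has a base and all other sequences have a gap ('-')."""
    ref_seq = alignment[reference]
    others = [seq for name, seq in alignment.items() if name != reference]
    if any(len(seq) != len(ref_seq) for seq in others):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i for i, base in enumerate(ref_seq)
        if base != "-" and others and all(seq[i] == "-" for seq in others)
    ]


# Made-up aligned sequences (assumption): one deletion shared by all three mammals,
# plus a few independent single-base differences.
toy_alignment = {
    "chicken":     "ATGGCCTTAGGA",
    "human":       "ATG---TTAGCA",
    "placental_2": "ATG---TTCGGA",
    "placental_3": "ATG---TAAGGA",
}

print(shared_gap_columns(toy_alignment, "chicken"))  # [3, 4, 5]
```

A deletion with identical boundaries in every species is more simply explained by one event in a common ancestor than by the same event recurring independently in each lineage, which is the inference drawn above.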
To sum up, what we observe in the mammalian Vit 1 pseudogenes is as follows:
- The function of the Vit 1 gene in egg-laying organisms is well known and well understood.
- Placental mammals, including humans, do not require a functional Vit 1 protein product, yet have a Vit 1 sequence that cannot, due to many mutations, perform its known function as a protein involved in yolk formation. In other words, in placental mammals, the Vit 1 gene has suffered a loss of function that renders it a pseudogene.
- A Vit 1 pseudogene in placental mammals can be located using predictions based on shared synteny with egg-laying organisms such as chicken.
- Placental mammals, including humans, share a number of identical mutations within their Vit 1 pseudogene, indicating that these mutations happened once in a common ancestor, and were inherited from that common ancestor.
Taken together, these lines of evidence strongly support the conclusion that the Vit 1 gene we observe in placental mammals is non-functional in the strict sense: it does not contribute to reproduction or survival. The possibility that the Vit 1 sequence in placental mammals might retain some residual biochemical activity (it once was a functional gene, after all) would not change these lines of evidence or the conclusions drawn from them. Moreover, the (however slight) possibility that certain parts of any given pseudogene might have gained an important new function (a process called exaptation that we have discussed previously) does not affect the conclusions drawn from the whole study of Vit 1 as to its origins as a previously functional but now non-functional gene.
Taking the test
Though I have presented much of this evidence about the Vit 1 pseudogene here on BioLogos in the past, I am not yet aware of any other science/faith organization that has addressed this evidence. Web searches for terms such as “junk DNA” or “pseudogene” at various such sites produce a significant number of articles addressing the topic, and all sites examined had at least one page addressing the human GULO / GLO pseudogene as a specific example. Similar searches, including searches for the more generic term “yolk,” failed to reveal any discussion of the Vit 1 pseudogene on any of the websites examined. I would invite these groups, all of whom have recently posted on the ENCODE project to suggest that “junk DNA” is no longer a tenable idea, to “take the test” and offer an explanation for the features we observe in the human Vitellogenin 1 pseudogene.
Vit 1 pseudogene sequences were assembled using the NCBI BLAST site (http://blast.ncbi.nlm.nih.gov/Blast) and from figure S2 of Brawand et al., 2008 (see below).
Brawand, D., Wali, W., and Kaessmann, H. (2008). Loss of Egg Yolk Genes in Mammals and the Origin of Lactation and Placentation. PLoS Biology 6, 0507-0517.