Fuzzy, but useful
One of the challenges for my students learning biology is summed up in one of my favorite sayings (that I’m sure some students are tired of hearing from me): “All the good concepts are fuzzy.” Take a basic concept like “living” versus “non-living,” for example. Obviously this is a fundamental concept for a biologist, since “biology” means the study of living things. Even here, though, we find that a precise definition of what is “alive” is a hard thing to nail down. While things like humans, dogs and cats obviously qualify (though some days with early lectures I might have my doubts for humans), there are other entities out there that blur the boundary between life and non-life. Viruses, for example, have many of the features of living things, but lack some others. Transposons are less life-like even than viruses, and there are even transposon-like entities that parasitize viruses. Life and non-life are useful concepts, but the precise boundary between them is fuzzy.
More technology = greater fuzz
Often, an increase in technological ability exacerbates the “fuzziness” issue. One example in genetics (that we will later see to be highly relevant to understanding the results of ENCODE) is the concept of “dominant” versus “recessive” for different versions of a given gene. If you recall anything at all about genetics from high school, you might remember learning about Gregor Mendel crossing pea plants that differed in certain characteristics (purple versus white flowers, for example). Mendel deduced that the “particles” that controlled a certain trait (what we would later call “genes”) came in pairs, and that the presence of one type of particle (e.g. the one for purple flowers) could mask the presence of another (in this case the one for white flowers). He deduced that one gene version (what we now call an allele) was dominant over the other one, which in turn was recessive. For Mendel, one determined a dominant / recessive relationship by examining the appearance of a plant with both alleles: whichever allele determined the appearance was the dominant one.
Advances in technology would later do two things to Mendel’s model. First, they would provide deeper insights into what was actually going on at the biochemical level. Second, those deeper insights would cause the concept of “dominant” or “recessive” to become fuzzier. I’ll illustrate what I mean with a (hypothetical, but representative) example.
When Mendel did his work he was limited to what he could observe with the naked eye. Now we have the ability to examine the effects of alleles at much deeper levels than Mendel could. Let’s say, for the sake of the discussion, that the gene Mendel was working with made an enzyme that produced purple pigment. The “purple” allele of the gene (let’s represent it with the symbol “P”) made a fully functioning enzyme: its DNA is copied into mRNA, and that mRNA is used to code for the protein enzyme that does the work of making pigment. The “white” allele (let’s call it “p”), on the other hand, turns out to have a mutation in the protein coding portion of the gene. This single mutation has two effects. First, it stops translation early, resulting in a protein that is too short and cannot work as an enzyme. Second, it reduces the stability of the mRNA: the mRNA produced by the white allele degrades more readily, resulting in a lower steady-state amount of the mRNA in the cell.
With this background in mind, suppose a scientist performs a series of different tests on a plant that has one purple allele and one white allele (i.e. is “Pp”):
If the scientist looks at the flower color of the Pp plant, she would conclude (as did Mendel) that the p allele is recessive to the P allele, since the Pp plant is as purple as a plant with two purple alleles (PP). This arises because one P allele can produce enough enzyme for complete flower pigmentation.
If the scientist compares the amount of mRNA for this gene between PP, Pp and pp plants, she would notice three different outcomes. PP would have the most, Pp would have less, and pp would have the least. For this test the Pp plant is intermediate between the PP and pp plants. The scientist would conclude that neither the P nor p allele is completely dominant over / recessive to the other (an effect known as “incomplete dominance”).
If the scientist did a test to compare the physical size of the protein enzyme in PP, Pp and pp plants, she would again notice three outcomes. PP plants would have only full-sized enzymes, pp plants would have only small enzyme fragments, and Pp plants would have both distinct sizes, full-sized and small. In this case, the Pp plant shows both character traits (full-sized and small) at the same time. The scientist would conclude that the P and p alleles are both dominant, since both alleles display their version of the trait with neither masking the other in any way (an effect known as “co-dominance”).
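For readers who like to see the logic spelled out, the three tests can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the mRNA amounts, the pigment threshold, and the function names are hypothetical and are not taken from any real data set. The point is only that the same Pp plant gets a different label depending on which measurement is used.

```python
# Toy model of the hypothetical P/p gene. All numbers are made up for illustration.
MRNA_PER_ALLELE = {"P": 1.0, "p": 0.4}            # assumed: p mRNA is less stable
PROTEIN_SIZE = {"P": "full", "p": "truncated"}    # p stops translation early
ENZYME_NEEDED_FOR_PURPLE = 1.0                    # assumed: one P allele is enough


def flower_color(genotype):
    """Test 1: what the plant looks like."""
    working_enzyme = sum(1.0 for allele in genotype if allele == "P")
    return "purple" if working_enzyme >= ENZYME_NEEDED_FOR_PURPLE else "white"


def mrna_level(genotype):
    """Test 2: total steady-state mRNA from both alleles."""
    return sum(MRNA_PER_ALLELE[allele] for allele in genotype)


def protein_sizes(genotype):
    """Test 3: the set of protein sizes present in the plant."""
    return {PROTEIN_SIZE[allele] for allele in genotype}


def classify(assay):
    """Label the P/p relationship by comparing Pp with both homozygotes."""
    hom_P, het, hom_p = assay("PP"), assay("Pp"), assay("pp")
    if het == hom_P:
        return "P dominant, p recessive"
    if het == hom_p:
        return "p dominant, P recessive"
    if isinstance(het, set):
        # Both homozygous traits appear side by side in the heterozygote.
        if het == hom_P | hom_p:
            return "co-dominant"
    elif min(hom_P, hom_p) < het < max(hom_P, hom_p):
        # The heterozygote falls between the two homozygotes.
        return "incomplete dominance"
    return "unclassified"


for assay in (flower_color, mrna_level, protein_sizes):
    print(assay.__name__, "->", classify(assay))
```

The three calls to classify use the same alleles and the same plant, yet return three different labels, which is the fuzziness the example is meant to show.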
So, is the P allele dominant, incompletely dominant, or co-dominant with respect to the p allele? The answer is “yes”: all three apply, depending on which details the new technology is revealing. Which answer is the most meaningful one? Well, it depends on the specific question the researcher is asking. Now that we have the ability to sequence DNA, we can directly observe the nature of all alleles in any given organism, and the presence of other alleles does not interfere with this observation. In effect, modern molecular biology has made all alleles “co-dominant” since all alleles display their “version of the trait” (i.e. their sequence) when they are sequenced. If one were so inclined, one could argue that “recessiveness” is an outdated concept, and that eventually we will determine through sequencing technology that all alleles are co-dominant. While this would be technically true, it would be very misleading. The p allele remains “recessive” in biologically meaningful ways: it is a loss of an enzyme function, and its complete loss has an effect on the appearance of the organism. Plants that have one of each allele (Pp) have the same appearance as PP plants. Anyone who would argue that “recessiveness” was no longer a feature of alleles in light of the new sequencing technology would have to address these issues in a meaningful way, since the evidence for “recessiveness” did not simply evaporate when we learned how to sequence genomes. By any measure, Mendel’s ideas of dominance and recessiveness are still useful concepts.
The relevance to ENCODE
So, how does this all relate to the ENCODE project? It hinges on another very useful, and therefore fuzzy term: “function.” Like “life” and “dominant”, “function” is a useful idea in biology, but much hinges on precisely how it is defined, and the technology used to assess its presence or absence.
The ENCODE definition of “function” is a useful one for the purposes of the large undertaking that this project represents. Specifically, ENCODE was looking for biochemical activity in the genome: the interaction of chromatin proteins with DNA, regions of DNA that are made into RNA, and so on. This is all well and good, for we now have new tools available that allow us to test for these effects; we have new technology that can offer new insights into what is going on in the genome.
What these results don’t do, however, is cause the prior lines of evidence relating to non-functional DNA to suddenly disappear. As we saw with the dominance issue, the results from new techniques will need to be integrated into a more complete understanding of the data. We must also have a wider understanding of the strengths and weaknesses of various techniques to answer certain specific kinds of questions.
As a way to illustrate these issues for the ENCODE project, let’s consider the hypothetical example we used to explore the dominance issue. The ENCODE definition of “function” includes any detectable biochemical activity, such as the presence of an mRNA transcript. In our example, the “P” allele (which produces a working protein enzyme) and the “p” allele (which does not) both produce an mRNA transcript. As such, the ENCODE project would identify both alleles as equally functional. In fact, the ENCODE definition of “detectable biochemical activity” as “function” would not be able to distinguish between these two alleles in any meaningful way, despite the fact that they have real, biological, and obviously functional differences. This is not to criticize the working definition of function adopted by ENCODE, but merely to demonstrate that this definition, while useful in some contexts, has limitations.
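The limitation can be made concrete with a small Python sketch. This is a toy illustration under stated assumptions, not a description of how ENCODE actually processes data: the mRNA amounts, the detection threshold, and the function names are all hypothetical.

```python
# Toy comparison of two ways to call an allele "functional".
# All values are hypothetical and chosen only to match the example in the text.
ALLELES = {
    "P": {"mrna": 1.0, "working_enzyme": True},   # full-length, active enzyme
    "p": {"mrna": 0.4, "working_enzyme": False},  # unstable mRNA, truncated protein
}
DETECTION_THRESHOLD = 0.1  # assumed sensitivity of a transcript assay


def has_biochemical_activity(allele):
    """Activity-based call: is any transcript detectable?"""
    return ALLELES[allele]["mrna"] > DETECTION_THRESHOLD


def supports_pigment(allele):
    """Phenotype-based call: does the allele yield a working enzyme?"""
    return ALLELES[allele]["working_enzyme"]


for allele in ("P", "p"):
    print(allele,
          "| transcript detected:", has_biochemical_activity(allele),
          "| makes working enzyme:", supports_pigment(allele))
```

Under these assumptions both alleles pass the transcript test, while only P passes the enzyme test. A presence-or-absence readout of transcription cannot tell the two apart.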
These limitations should stand as a caution to any group that wishes to adopt the ENCODE definition as the only viable definition of biological function. To consider our example again, I suspect that many of those opposed to evolution would bristle at the suggestion that the p allele was equally functional to the P allele, given that it represents a clear loss of function in keeping with common Young Earth Creationist, Old Earth Creationist, and Intelligent Design definitions of loss-of-function alleles, and the propensity of these groups to insist that such mutations destroy functional information. Yet what we have seen from these groups, by and large, is a robust embrace of ENCODE and its view of function. I suspect that these groups, in their excitement over the media frenzy declaring the idea of “junk DNA” to be dead, have not yet had time to carefully think through the implications of that embrace.
In Part 2 of this series on Wednesday, we’ll explore other working definitions of “function,” look at other lines of evidence that are better suited to distinguishing between biologically functional and non-functional sequences, and revisit some examples from my previous series on “junk DNA” in light of ENCODE.