Earlier this year I took part in a pair of lectures at my university—one where I presented on evolution and was critiqued by a colleague who is a supporter of Intelligent Design (ID) and one where my colleague presented on ID and I offered a critique. After both presentations, the audience was invited to ask questions of either of us. One key topic of discussion at both events was that of biological information: what it is, how it works, and whether it is evidence against evolution and for design.
After one of the talks, I had an extended back-and-forth with a member of the audience who held to an ID perspective. As we conversed about biological information, there came a point in the conversation where I realized he was working with a very literal conception of “information” as it pertained to living things—the reason he thought evolution was wrong and ID was correct was because living things contained written codes that could not be explained by natural processes. At this point in the conversation I asked him if he understood that what we were talking about was in fact organic chemistry—complex and intricate organic chemistry, to be sure, but organic chemistry nonetheless. He was taken aback—he had not thought about biological information in that way before.
Part of the problem, of course, is the way biologists themselves like to speak about “biological information”—we speak about the “genetic code” and use words like “transcription” and “translation” as technical terms to describe how information is processed in cells. When we write out DNA sequences, we do not use the chemical structures of the molecules, but rather abbreviate them with the letters “A” “C” “G” and “T”.
In other words, biologists commonly describe biological information with an extended analogy to one of the ways humans use information: language. This approach has its advantages, of course—but that conversation also revealed its drawbacks to me. When biologists use the term “information,” are they describing a process that is analogous to human language or code or something that is a language or code? From the perspective of my conversation partner, it was clearly the latter, based on his understanding of ID arguments.
ID and the Argument from Information
Within the ID community, the “argument from information” is used in two main ways. The first claim is that the ultimate origin of biological information—i.e., the biological information necessary to produce the first life—must be non-material. If indeed what we see in biology is information, so the argument goes, then it must come from a designing mind. In a 2010 interview discussing his book Signature in the Cell, philosopher of science and ID advocate Stephen Meyer makes this point clearly:
The DNA molecule is literally encoding information into alphabetic or digital form. And that’s a hugely significant discovery, because what we know from experience is that information always comes from an intelligence, whether we’re talking about hieroglyphic inscription or a paragraph in a book or a headline in a newspaper. If we trace information back to its source, we always come to a mind, not a material process. So the discovery that DNA codes information in a digital form points decisively back to a prior intelligence. That’s the main argument of the book…
… But there are some lotteries where the odds of winning are so small that no one will win. And that’s the situation of trying to build new proteins or genes from random arrangements of the subunits of those molecules. The amount of information required is so vast that the odds of it ever happening by chance are miniscule. I make the calculations in the book. There’s a point at which chance hypotheses are no longer credible, and we’ve long since gone past that point when we’re talking about the origin of the information necessary for life.
For Meyer, then, the existence of biological information in living things is prima facie evidence that it was designed, and not the result of a material process.
A second claim used within the ID community is that biological information, as we observe it in present-day organisms, is too complex to be the result of evolutionary processes working to assemble it over time. Put another way, even if the original information had been designed at the origin of life, evolution would not have been able to start from this information and go on to produce new genes and new functions through random mutation and natural selection. Meyer puts the argument this way:
In any case, the need for random mutations to generate novel base or amino-acid sequences before natural selection can play a role means that precise quantitative measures of the rarity of genes and proteins within the sequence space of possibilities are highly relevant to assessing the alleged power of mutation-selection mechanism. Indeed, such empirically derived measures of rarity are highly relevant to assessing the alleged plausibility of the mutation-selection mechanism as a means of producing the genetic information necessary to generating a novel protein fold. Moreover, given the empirically based estimates of the rarity (conservatively estimated by Axe at 1 in 10^77 and within a similar range by others) the analysis … pose(s) a formidable challenge to those who claim the mutation-natural selection mechanism provides an adequate means for the generation of novel genetic information – at least, again, in amounts sufficient to generate novel protein folds…
It follows that the neo-Darwinian mechanism — with its reliance on a random mutational search to generate novel gene sequences — is not an adequate mechanism to produce the information necessary for even a single new protein fold, let alone a novel animal form, in available evolutionary deep time.
For Meyer, then, the presence of information in living systems, as well as his claim that natural mechanisms such as evolution cannot account for it, form a major part of his case for Intelligent Design. Information must be provided for life to begin and for new genes and proteins to arise.
So is “biological information” merely an analogy of convenience for biologists, or is what we see in the cell information in the sense of a language or code? Next, we’ll begin to explore how cellular information processing works as a way to begin addressing this question.
In 1696, British apologist and author John Edwards published a lengthy treatise on Scripture and natural theology – with the descriptive (if rather wordy) title A Demonstration of the Existence and Providence of God From the Contemplation of the Visible Structure of the Greater and Lesser World. As the title suggests, Edwards was on a mission both to convince skeptics and to shore up the faith of believers using what he viewed as the best science of the day. A significant portion of the book attacks heliocentrism – the hypothesis of Copernicus that the sun, rather than the earth, is the center of the universe and that the earth is in motion around the sun – on both scriptural and scientific grounds. These apologetics arguments were doomed to fail, as we know in hindsight. By 1730, convincing empirical evidence for stellar aberration – the effect of a moving earth on incoming starlight – was available and widely viewed as strong evidence for the Copernican view. Edwards’s apologetic, which had seemed so convincing to him in 1696, had had a shelf life of less than 35 years.
A second argument by Edwards, however, fared much better. Edwards was fascinated by the properties of the sun and stars. In particular, he was taken with their seemingly inexhaustible supply of fuel, for which there was no good scientific explanation in his day (pg. 61):
This stupendous Magnitude argues the Greatness, yea the Immensity and Incomphrensiblenes of their Maker. And if it be ask’d, Whence is that Fewel for those vast Fires, which continually burn? Whence is it that they are not spent and exhausted? How are those flames fed? None can resolve these Questions but the Almighty Creator, who bestowed upon them their Being; who made them thus Great and Wonderful, that in them we might read his Existence, his Power, his Providence…
For Edwards, then, the properties of sun and stars were both beyond the reach of human understanding, and evidence for God’s existence – and the lack of scientific explanation was a key feature in his argument. It would not be until the 1920s and 1930s that the idea of nuclear fusion as the energy source for stars would be hypothesized and tested. This argument of Edwards lasted for over 200 years before it was revealed as flawed.
An interesting question to consider is this: should Edwards have made these arguments part of his apologetic? They were, after all, effective for their time and place, and likely supported the faith of many people before they were revealed by science to be inadequate. The latter argument, in particular, remained viable for hundreds of years. Should Edwards have foregone the opportunity to make a case for God with this approach in light of the possibility that future science might render his arguments null and void? If you had been alive in 1696, would you have wanted to know how these arguments would fare over the coming decades and centuries?
Biological Information: Great and Wonderful
If Edwards had been aware in 1696 of the intricate processes that govern information processing in the cell, he likely would have described them as “great and wonderful” as he did the properties of the stars. And indeed, both processes are great and wonderful, and (in my opinion) do offer a signpost toward the existence of their Creator. Where I differ from Edwards, however, is that I do not consider a scientific understanding of either process as a diminishment of such a view. With my limited understanding of nuclear physics and how it plays out in stars, for example, I am amazed that fusion reactions can produce the heavier elements necessary for life. In my mind, understanding the physical process is all the more reason for worship and wonder.
So too with biological information. As a cell biologist and geneticist, I find the details of how cells perform information processing fascinating. Nor do I find potential scientific explanations for the function or origins of these processes threatening to my faith. While there is much that remains for science to discover about this area, I think it is misguided to use that fact as the basis for arguments defending the Christian faith, as some in the Intelligent Design (ID) movement have done.
In order to evaluate those ID arguments, it will be helpful to have a picture of these processes in mind. Let’s sketch out some of the basic details before we discuss what science knows, and doesn’t know, about the function and possible origins of this elegant set of biochemical reactions.
DNA and Proteins: Archive and Actions
One of the first things that students of biology learn is that DNA functions as a hereditary molecule, but protein molecules perform most of the day-to-day jobs that need doing in the cell. These two types of molecules are especially suited for their particular roles, and neither is capable of performing the other’s role. Examining their particular properties reveals why this is the case.
Both DNA and proteins are polymers, which is just the technical way of saying that they are large molecules made up of repeating units (called monomers) joined together. If you’re familiar with LEGO, the toy building bricks, you can imagine a stack of bricks – say, a stack of 4×4 bricks of different colors. If one 4×4 brick is a “monomer”, then a stack of such bricks is a “polymer”. The different colors can represent the different monomers available – which, in our analogy, refers to the four possible monomers for DNA (the famous A, C, G, and T) or the 20 different monomers found in proteins (known as amino acids).
For DNA, the four monomers each have an interesting property: their chemical structure is physically attractive to one of the other monomers. “A”, for example, is the chemical adenine – which is attractive to “T”, or thymine. Here’s what the chemical structures look like:
In this diagram, adenine (A) is on the left, and it is paired up with a thymine (T) on the right. Solid lines indicate chemical bonds (“covalent” bonds) within the molecules. The two dashed lines, however, show attractions that are not covalent bonds but weaker attractions called “hydrogen bonds”. Think of hydrogen bonds as weak magnetic attractions holding the two molecules in place relative to each other. Similarly, hydrogen bonds form between cytosine (C) and guanine (G).
The importance of these attractions is that one polymer of DNA can act as a template to construct a second polymer simply by matching up the monomers one by one as they are added to a growing chain of DNA (a job done by protein enzymes). It is this feature of DNA that makes it very easy to copy accurately – making it an ideal carrier of hereditary information.
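The templating idea can be sketched in a few lines of Python. This is only an illustration, not a model of the real enzymes: the names `PAIRING` and `complementary_strand` are made up for this example, and the sketch ignores the fact that the two strands of a real double helix run in opposite directions.

```python
# Pairing rules described in the text: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complementary_strand(template: str) -> str:
    """Build a partner strand by matching up monomers one by one."""
    return "".join(PAIRING[base] for base in template.upper())

template = "ATGGCC"
copy = complementary_strand(template)
print(copy)                        # TACCGG
print(complementary_strand(copy))  # ATGGCC, the original sequence again
```

Because each monomer has exactly one partner, copying the copy gives back the original sequence, which is why the sequence survives round after round of replication.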
In contrast to the mere four monomers of DNA, proteins are made up of 20 monomers – the amino acids. The molecular shapes of amino acids are much more diverse than for DNA monomers, as can be seen in this sampling of the 20:
The functional importance of this diversity is that proteins of many shapes can be constructed from this set of monomers, whereas all DNA pretty much has the same shape (the famous double helix of two complementary polymers wrapped around each other). The diverse shapes of proteins allow them to do all sorts of biological functions – act as cell structural components, function as enzymes to speed up chemical reactions, and so on. Proteins do many things; DNA does one. Each is very well suited to its role, and neither can do the function of the other. DNA cannot take on the myriad of shapes needed for functional roles in the cell; proteins cannot use their monomers to copy themselves and pass on their information, since amino acids do not pair up with a partner in the way DNA monomers do. Both roles are essential for life as we know it: we can’t live without either.
In the next post in this series, we’ll examine the role of RNA – a molecule that acts as a bridge between the information in DNA and the structure of proteins. As we will see, RNA can bridge these two “languages” because it can carry information like DNA and fold up to take on functional shapes like proteins.
Previously, we explored how DNA functions as a carrier of information. However, it lacks the ability to perform cellular activities, given its inability to form the complex shapes needed to act as enzymes, structural components of the cell, and so on. In contrast, we saw that proteins perform these roles admirably, while themselves lacking the ability to act as a template for their own replication (as DNA does). Proteins are, in a sense, “disposable” entities: once made, they function until they are damaged and recycled by the cell – broken down for the amino acid monomers they contain. They cannot replicate, nor can their structure be used as a template for replication like the structure of DNA. DNA and proteins are thus complementary in function: DNA supplies the information needed to determine the sequence of amino acids to make functional proteins, and proteins do the cellular work that DNA on its own cannot do.
Interposed between these two systems is a third set of large molecules: ribonucleic acids, or RNA. One way to think about RNA is as a single-stranded version of DNA. This is not entirely correct, since RNA uses one monomer unique to itself (uracil, or “U”) in place of the thymine (or “T”) in DNA. Still, it’s not a bad way of visualizing it. DNA has two strands that wind around each other in the famous “double helix”. As we saw earlier, the attraction between the two strands arises from the alignment of atoms that participate in weak attractive bonds called hydrogen bonds.
RNA, since it is single-stranded, does not have an opposing strand with which to form hydrogen bonds. This allows an RNA molecule to form hydrogen bonds between its own monomers – bending and flexing to line up monomers for pairing. The result is a molecule that has both information-bearing properties, like DNA, and the ability to take on an array of functional three-dimensional shapes, like proteins. RNA molecules can be unfolded to use their sequence of monomers to specify a copy, and can fold up to do important functions in the cell.
As it happens, three distinct classes of RNA molecules are essential for transferring the information of DNA into proteins: “messenger RNA”, “transfer RNA” and “ribosomal RNA.” The overall process is straightforward: the sequence of DNA monomers (nucleotides) needs to be transferred to its corresponding sequence of protein monomers (amino acids). Let’s examine the roles of each of these RNA classes in this process in turn.
Messenger RNA: from DNA to working copy
Each chromosome is a very long DNA double helix. This large entity is not suited to easily moving around within the cell; moreover, it contains the DNA sequence of many proteins (the regions of a chromosome that have sequences for proteins are called protein-coding genes, a term you are likely familiar with). When a cell needs to convert the DNA of a protein-coding gene into a protein, the gene is copied into a single-stranded RNA version that spans only this one gene. A single-stranded RNA copy of an individual gene is called a messenger RNA, or “mRNA”. To make the copy, enzymes spread apart the two strands of the chromosomal DNA such that a section of it is now two single-stranded regions, and one of the strands is used as a template to make the strand of RNA. This provides the cell with a “working copy” of a single gene, in RNA form, that can easily move around within the cell. Specifically, mRNAs need to leave the cell nucleus – the internal compartment where chromosomes are found – and be transported out into the cell cytoplasm – where further processing can take place. While mRNAs do have some 3-D structure that is important for their function, they are mainly used to take a copy of the DNA sequence to the place where it can be converted to an amino acid sequence – a large enzyme complex called the ribosome. Since the mRNA and the DNA from which it was copied are both nucleotide sequences, scientists called this process “transcription” when it was discovered. Transcription, as the name implies, is copying a text to make a duplicate in the same language. The process of converting the nucleotide sequence, or “language”, into an amino acid sequence is correspondingly called translation. The next two RNA types are required for this process.
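For readers who like to see the idea in code, transcription can be sketched as a lookup that pairs each monomer on the DNA template strand with its RNA partner. This is a simplified illustration: `DNA_TO_RNA` and `transcribe` are names invented for the example, the sequence is made up, and strand direction and all later mRNA processing are ignored.

```python
# For each base on the DNA template strand, the RNA base added opposite it.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_strand: str) -> str:
    """Make a single-stranded RNA 'working copy' from a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_strand.upper())

template_strand = "TACCCTTGA"  # made-up template sequence for illustration
mrna = transcribe(template_strand)
print(mrna)  # AUGGGAACU
```

The output is still a nucleotide sequence, which is the sense in which the copy stays in the same “language” as the DNA.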
ABOVE: This 3D animation shows how proteins are made in the cell from the information in the DNA code.
Cracking the code
One of the challenges of translation that long puzzled biologists was trying to understand how a sequence of DNA monomers – with four monomer options – was converted into a sequence of amino acids, with 20 options. Indeed, the few monomer types found in DNA led early biologists to conclude that it was far too simple to act as a repository of the vast amounts of information needed by a cell. Later experimental evidence in favor of DNA as the hereditary molecule would have to swim against this current of suspicion.
Once the DNA double helix structure was worked out in the early 1950s, there followed something of a race to elucidate exactly how such a simple structure could specify the complex sequences of amino acids. The structure of DNA immediately answered the “how does it replicate with high accuracy?” question, but failed to reveal how the precise amino acid sequences of proteins were specified.
Since DNA has only four monomers, scientists quickly realized that a simple one-to-one correspondence between one nucleotide and one amino acid would not be the answer – since such a system could only allow for four amino acids in proteins, when 20 were known. Alternative hypotheses were then explored – one of which was that groups of nucleotides could be used to specify a single amino acid. Pairs of nucleotides would thus have 16 possible states (four options for the first and four options for the second, giving 16 total possibilities). Since this is also less than 20, the idea that three nucleotides might specify a single amino acid was investigated. This system allows for 4x4x4 combinations, or 64 in total. While this exceeded the number needed, this hypothesis proved fruitful. Over time, biologists worked out that three nucleotides were indeed used to “code” for a single amino acid. The fact that there were more nucleotide combinations than the 20 required was also explained in time – many amino acids could be coded for by several combinations of nucleotides. For example, the amino acid glycine was found to be coded for by the DNA nucleotide triplets GGA, GGC, GGG, and GGT. The “code” was in fact partially redundant.
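The counting argument above is easy to check for yourself. The short Python sketch below (variable names are my own, chosen for illustration) counts the combinations available for groups of one, two, and three nucleotides, then lists the four glycine triplets mentioned in the text.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
AMINO_ACIDS_NEEDED = 20

# How many distinct "words" can be spelled with 1, 2, or 3 nucleotides?
for length in (1, 2, 3):
    combos = len(NUCLEOTIDES) ** length
    verdict = "enough" if combos >= AMINO_ACIDS_NEEDED else "too few"
    print(f"{length} nucleotide(s) per amino acid: {combos} combinations ({verdict})")

# Enumerate every triplet, then pick out the ones the text lists for glycine.
all_triplets = ["".join(t) for t in product(NUCLEOTIDES, repeat=3)]
glycine_triplets = [t for t in all_triplets if t.startswith("GG")]
print(len(all_triplets))  # 64
print(glycine_triplets)   # ['GGA', 'GGC', 'GGG', 'GGT']
```

With 64 triplets and only 20 amino acids to specify, some amino acids must be specified by more than one triplet, which is the partial redundancy described above.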
A code by any other name
It was also, unsurprisingly, at this time that the “code” analogy for these correspondences entered the biologist’s vocabulary. The race to figure out the links between nucleotide triplets and their resulting amino acid was discussed by scientists as “cracking the biological code” and similar phrases. That this work took place in the 1960s, following on from the successes of Bletchley Park and Ultra in World War Two, and in the midst of the espionage and counterespionage of the Cold War, lent further weight to the metaphor. So apt was this analogy, that many scientists, to say nothing of the media and the public, often did not qualify that this was in fact an analogy. The name given for the nucleotide triplets was “codon”, and to this day biology textbooks speak of the codon table as the “genetic code.”
Code or chemistry?
Though the biologists who did this work used “code” language as an analogy for the complex chemistry they were discovering, it is important to remember that they did not view it as an actual code in the sense of a symbolic system designed by an intelligence. In contrast, the Intelligent Design (ID) movement does view the “genetic code” and its associated chemistry in this way – primarily because they claim that natural processes are not sufficient to explain its origin. Once we’ve examined how this intricate system works, we’ll be in a better position to understand and evaluate that claim.
Is the process by which cells use the information stored in DNA to form proteins complex organic chemistry, an indicator of a designing intelligence, or both? The Intelligent Design movement claims that it is very much both, stating that the genetic code found in cells is in fact a genuine code, like what a human might create. In this series we’re taking a tour through the complex chemistry that cells use to process information in order to understand and evaluate this claim.
In yesterday’s post, we looked at the role of messenger RNA (mRNA) as a means to prepare a gene’s DNA sequence for conversion into an amino acid sequence – a process known as translation. Translation, as we have seen, is the process whereby a nucleotide sequence in mRNA is converted – three nucleotides (one codon) at a time – into an amino acid sequence. This process is accomplished by two other types of RNA – let’s examine their roles.
Bridging the two languages: tRNA
Once the codon “code” was worked out, the next question was how each amino acid was specified by a particular codon. One hypothesis was that some sort of adaptor molecule existed for each codon – a molecule that would both recognize the codon sequence and be physically connected to an amino acid. It was known that the enzyme complex that connected amino acids together was the ribosome, so these adaptors, if they existed, would have to work with the ribosome and the mRNA it was using as a template. Eventually it was shown that these adaptor molecules are a different kind of RNA: transfer RNAs, or “tRNAs” for short. In a literal way, they act as a bridge between the “language” of nucleic acids and the “language” of amino acids.
Though mRNA molecules may have some three-dimensional structures important for their function, the structure of tRNA molecules is essential to their role within the cell. Though tRNAs are single-stranded molecules, their nucleotide monomers can still pair up with other monomers using hydrogen bonds – except that they bond with other monomers on their same strand. The result is a single-stranded molecule that folds up through base-pairing within itself – producing something that resembles a cloverleaf with three “leaves” protruding from it:
In the image above, the small right hand image shows a line diagram of the single RNA strand that makes up a tRNA. The larger image shows the actual structure of the molecule. The blue “leaf” contains the anticodon: the nucleotide sequence that recognizes and binds to its corresponding codon on the mRNA with hydrogen bonds. The end of the short, single-stranded section (shown in yellow) is where an amino acid will be physically joined to the tRNA. The loading of the proper amino acid onto the correct tRNA is what gives the code its specificity. Amino acid loading is accomplished by protein enzymes that recognize the shape of a particular tRNA, the shape of its corresponding amino acid, and join them together with a chemical bond. Amino acids are present in the cell, floating around, and available for these enzymes to grab and bind on to their correct tRNA molecule. Interestingly, this process, along with the entire transcription / translation system, depends on random, chaotic motion within the cell, a type of motion known as Brownian motion.
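A rough way to picture the adaptor role of tRNA is as a lookup in two steps: base pairing finds the tRNA whose anticodon matches the codon, and the amino acid that was loaded onto that tRNA is what gets delivered. The sketch below is a toy illustration under several assumptions: `LOADED_TRNAS` is a hypothetical two-entry table standing in for the work of the loading enzymes, the function names are invented, and real details such as strand direction are left out.

```python
# Base pairing between RNA monomers (U takes the place of T).
RNA_PAIRING = {"A": "U", "U": "A", "C": "G", "G": "C"}

# Hypothetical, tiny "loading table": anticodon -> amino acid attached to
# that tRNA by the loading enzymes. A real cell has many more tRNAs.
LOADED_TRNAS = {
    "UAC": "methionine",  # pairs with the codon AUG
    "CCU": "glycine",     # pairs with the codon GGA
}

def pairing_anticodon(codon: str) -> str:
    """Return the anticodon that pairs, position by position, with a codon."""
    return "".join(RNA_PAIRING[base] for base in codon)

def amino_acid_for(codon: str):
    """Find the loaded tRNA whose anticodon pairs with this codon, if any."""
    return LOADED_TRNAS.get(pairing_anticodon(codon))

print(pairing_anticodon("GGA"))  # CCU
print(amino_acid_for("GGA"))     # glycine
print(amino_acid_for("AUG"))     # methionine
print(amino_acid_for("UUU"))     # None (no matching tRNA in this tiny table)
```

Notice that nothing in the pairing step knows about amino acids. The link between a codon and its amino acid lives entirely in the loading table, which mirrors the point above that amino acid loading is what gives the code its specificity.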
Once loaded with amino acids, tRNA molecules can interact with the ribosome. This enzyme complex is where mRNAs and tRNAs meet, and the third class of RNA molecules does its work: ribosomal RNAs, or rRNAs.
The ribosome: a massive rRNA enzyme complex
Let’s look at a “cartoon” version of the ribosome to help us understand its function:
The ribosome serves as a platform for the mRNA, and provides two places where tRNA molecules can enter “slots” and interact with it. An incoming tRNA (brought in through Brownian motion) is stabilized by its anticodon attraction to the codon on the mRNA, and once stabilized, the amino acid it bears is bonded to the amino acid attached to the tRNA on the adjacent slot. This reaction breaks the bond of this amino acid with its tRNA, ejecting that “spent” tRNA from the ribosome. The ribosome then ratchets over one codon, and the process repeats until the amino acid chain – a mature protein – is completed. In this way the ribosome translates the nucleic acid sequence of the mRNA into a specified sequence of amino acids.
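The ratcheting process can be sketched as a loop that reads an mRNA three nucleotides at a time. This is a minimal illustration, not a model of the ribosome: `CODON_TABLE` holds only a handful of entries from the standard codon table, `translate` is a name invented for the example, and the “stop” codons it uses to end the chain are part of the standard table but have not been discussed in this series.

```python
# A small subset of the standard codon table (mRNA codons -> amino acids).
# None marks a stop codon, which ends the chain.
CODON_TABLE = {
    "AUG": "Met",
    "GGA": "Gly", "GGC": "Gly", "GGG": "Gly", "GGU": "Gly",
    "ACU": "Thr",
    "UUU": "Phe",
    "UAA": None, "UAG": None, "UGA": None,
}

def translate(mrna: str) -> list:
    """Read an mRNA one codon at a time and build the amino acid chain."""
    chain = []
    for start in range(0, len(mrna) - 2, 3):  # ratchet over one codon per step
        codon = mrna[start:start + 3]
        amino_acid = CODON_TABLE[codon]       # KeyError if codon is not in this subset
        if amino_acid is None:                # stop codon: release the chain
            break
        chain.append(amino_acid)
    return chain

print(translate("AUGGGAACUUUUUAA"))  # ['Met', 'Gly', 'Thr', 'Phe']
```

Swapping GGA for GGC, GGG, or GGU in the input gives the same chain, another view of the redundancy in the codon table.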
It has long been known that the ribosome is a complex of RNA molecules (named “ribosomal RNA” or “rRNA”) and some associated proteins. What was suspected was that the ribosome might in fact be an RNA enzyme, or “ribozyme” and not a protein. Though most enzymes (molecules that favor specific chemical reactions) are proteins, some are RNA molecules.
The idea that the ribosome is a ribozyme was first suggested by proponents of the “RNA world” hypothesis. This hypothesis suggests that present-day, DNA-based life is a modified descendant of RNA-based life. In this proposed model, RNA was both a hereditary molecule and a source of functional, 3-D structures that did enzymatic functions. Only later, so the hypothesis goes, did DNA get added on to act as a hereditary molecule, and proteins take over most enzymatic functions.
One prediction of the RNA world hypothesis was that the ribosome might have retained its RNA-based enzymatic structure. This was spectacularly confirmed in the year 2000 by careful analysis of ribosome structure. The researchers who carried out this analysis showed that although the proteins in the ribosome stabilize the rRNA molecules, they do not have an enzymatic role.
ID, the ribosome, and the RNA world hypothesis
Those who follow the Intelligent Design literature will know that philosopher and historian of science Stephen Meyer discusses transcription and translation at length in his 2009 book Signature in the Cell. In this book, Meyer attempts to build a case that the “information” we see in living organisms is in fact information in the same sense as a human code or a language. Part of his case involves casting doubt on the RNA world hypothesis, since this hypothesis suggests a material, chemical origin for the genetic code, even if an incomplete one. One of Meyer’s critiques in Signature is that such a hypothesis would have to explain how ribosomes transitioned from using RNA enzymes to using protein ones: Meyer erroneously claims that the enzyme in the ribosome that joins amino acids together is a protein. This is, of course, incorrect, since present-day ribosomes are ribozymes. As we have seen, proponents of the RNA world hypothesis suspected that the ribosome might still be a ribozyme, since its function may have been difficult to transition from an RNA enzyme to a protein one. When, in 2000, the ribosome was definitively shown to be a ribozyme, it was widely seen as a successful prediction for the RNA world position. That Meyer was unaware of this widely cited and highly influential work (it would garner the 2009 Nobel Prize in Chemistry, the same year that Meyer’s book appeared) came as quite a surprise to biologists reading Signature, especially given its import for Meyer’s claims.
Despite this blunder, the ID community continues to use the “argument from information” as a key plank in its platform. Next, we’ll begin to sketch out that argument, and discuss its strengths and weaknesses in light of our understanding of how transcription and translation work at the molecular level.
Over the last few posts in this series, we’ve explored how cells store “information” in DNA (as a sequence of DNA nucleotides), and transfer DNA information into sequences of amino acids (i.e. make proteins) that can do the day-to-day jobs needed for running a cell. As we have seen, that process is a highly intricate one, full of complex chemistry. At the heart of the system are tRNA molecules that act as a bridge between an amino acid and its corresponding codon on the mRNA (see image above).
What we see in present-day cells is a complex, integrated system for transferring “information” from DNA to RNA to proteins—using RNA as the key intermediary. Naturally, biologists are interested in the possible origins of this system: how did it come to be? In general, the success of evolution as a theory for how living things have diversified from common ancestors has led scientists to investigate if what we call “life” is the modified descendant of a previous “non-living” system. In other words, if living things are the modified descendants of other life, what happens if you work backwards to the origin of life? Might life come from non-life? Does the information processing system we observe in cells have a natural explanation, or was it created by God in a way we would describe as “supernatural”?
As an aside, as a Christian biologist I would be perfectly fine with the answer being either “natural” or “supernatural”. Both natural and supernatural means are part of the providence of God, and the distinction is not a biblical one in any case. Perhaps God set up the cosmos in a way to allow for abiogenesis to take place. Perhaps he created the first life directly—though, as we will see, there are lines of evidence that I think are suggestive of the former rather than the latter. Similarly, I would have been fine with God supernaturally sustaining the flames of the sun for our benefit, as English apologist John Edwards claimed long ago. I do happen to think that solar fusion is an elegant way to “solve” this problem, and as a person of faith I think it evinces a deeper, more satisfying design than some sort of miraculous interventionist approach for keeping the sun going. I recognize, however, that seeing design in the natural process of solar fusion—or abiogenesis—is not the sort of argument that some Christian apologists are looking for.
Information – a major ID apologetic
So, is the information storage and processing system we see in living things the result of natural processes, or God’s supernatural action? It is precisely on this question that the Intelligent Design (ID) movement has built a significant component of its apologetic. Stephen Meyer, in his book Signature in the Cell, makes two main claims with respect to biological information. The first is that scientists do not, as of yet, have a complete explanation for how biological information arose:
… no purely undirected physical or chemical process—whether those based upon chance, law-like necessity, or the combination of the two—has provided an adequate causal explanation for the ultimate origin of the functionally specified biological information.
Secondly, Meyer claims that specified information always is the result of an intelligence:
I further argue, based upon our uniform and repeated experience, we do know of a cause—a type of cause—that has demonstrated the power to produce functionally specified information from physical or chemical constituents. That cause is intelligence, or mind, or conscious activity… Indeed, whenever we find specified information—whether embedded in a radio signal, carved in a stone monument, etched on a magnetic disc, or produced by a genetic algorithm or ribozyme engineering experiment—and we trace it back to its source, invariably we come to a mind, not merely a material process.
For Meyer, then, biological information is a clear sign of a Designer who used, at least in part, a non-material process to produce it. It should come as no surprise that I do not find this approach convincing, even though I share Meyer’s Christian convictions that God is the creator of all that is, seen and unseen (to paraphrase the creeds).
On Meyer’s first point, we agree. Research on abiogenesis has not, by any stretch, provided “an adequate causal explanation for the ultimate origin of the functionally specified biological information”. Nor will it, in the foreseeable future, if ever. The second point, however, is debatable. The examples Meyer gives are all examples of human-generated information. Yes, humans can generate information. The question, however, is whether a natural system can generate information. We know by direct experience that evolution can produce new information (a topic we will explore in detail in a later post, and one I have written about several times before; Meyer, as we will see, disputes this evidence). If a natural process like evolution can produce new information, then it makes sense to see if other natural processes, perhaps similar to evolution, could have produced the information we see in living systems from non-living precursors.
Is the genetic code really a “code”?
One of the key challenges for abiogenesis research is to explain the origin of the genetic code: how it came to be that certain codons specify certain amino acids. Recall that tRNA molecules recognize and bind codons on mRNA through their anticodons, and bring the correct amino acid for that codon to the ribosome in the process. One feature of the tRNA system is that there is no direct chemical or physical connection between an amino acid and its codon or anticodon. Amino acids are connected to tRNA molecules at the “acceptor stem” section (the yellow region in the above diagram). Moreover, the attachment site at the tip of the acceptor stem ends in the same short sequence (CCA) for every tRNA, regardless of what amino acid it carries. Connecting the proper amino acids to their tRNA molecules is the job of a set of protein enzymes called aminoacyl tRNA synthetases. These enzymes recognize free amino acids and their proper tRNA molecules and specifically connect them together. Because there is no direct interaction between an amino acid and its codon, in principle it seems that any codon could have been assigned to any amino acid. If so, how might this system have arisen without any chemical connections to guide its formation?
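The two-step nature of the present-day system can be made concrete with a small sketch. The first step (codon to anticodon) is plain base pairing; the second (anticodon to amino acid) depends on which amino acid a synthetase has attached to that tRNA. The three-entry table below is a deliberately tiny, simplified stand-in that I have written for illustration, and it ignores wobble pairing and modified bases.

```python
# Watson-Crick pairing between RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the pairing anticodon, written 5' to 3', for an mRNA codon."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


# Simplified illustration: which amino acid each tRNA (named here by its
# anticodon) has been "charged" with by its synthetase.
CHARGED_TRNAS = {
    "CAU": "Met",
    "CCA": "Trp",
    "GAA": "Phe",
}


def decode(codon: str) -> str:
    """Step 1: base pairing picks the tRNA. Step 2: the tRNA carries the amino acid."""
    return CHARGED_TRNAS[anticodon_for(codon)]


print(anticodon_for("AUG"))  # CAU
print(decode("UGG"))         # Trp
```

Notice that the amino acid name never appears in the pairing step. That gap between the two steps is exactly the feature on which the argument below turns.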
Significantly, the lack of a direct chemical or physical connection between amino acids and their codons or anticodons forms a critical part of the Intelligent Design (ID) argument that the “genetic code” is in fact a genuine code of the sort that is determined and manufactured by a designing intelligence, and is not the product of what scientists would call a natural origin. This argument rests on the claim that since there is no physical connection between amino acids and codons (or anticodons) in the present-day system, the “genetic code” is an arbitrary, symbolic code: that codons and their corresponding amino acids are not connected through chemistry. Since there is no connecting chemistry, so the argument goes, there is no chemical path that could bring the system into being. And if there is no material, chemical process that can bring it into being, then it must have its origin through another means, such as a designing intelligence that produced it directly, and not through a material process.
Meyer lays out his argument for an arbitrary genetic code on pages 247-248 of Signature (emphases mine).
Self-organizational theories have failed to explain the origin of the genetic code for several reasons. First, to explain the origin of the genetic code, scientists need to explain the precise set of correspondences between specific nucleotide triplets in DNA (or codons on the messenger RNA) and specific amino acids (carried by transfer RNA). Yet molecular biologists have failed to find any significant chemical interaction between the codons on mRNA (or the anticodons on tRNA) and the amino acids on the acceptor arm of tRNA to which the codons correspond. This means that forces of chemical attraction between amino acids and these groups of bases do not explain the correspondences that constitute the genetic code…
Thus, chemical affinities between nucleotide codons and amino acids do not determine the correspondences between codons and amino acids that define the genetic code. From the standpoint of the properties of the constituents that comprise the code, the code is physically and chemically arbitrary. All possible codes are equally likely; none is favored chemically.
Here we can see Meyer’s argument clearly: in order to provide a material explanation for the origin of the genetic code, scientists need to explain how the specific correspondences between codons and amino acids came about. But, as he notes, there is no physical connection between them in the present system that can explain the correspondences. The code is arbitrary—and for Meyer, this indicates design.
Crisps or chips?
Having recently returned from a family vacation in Europe, my kids and I have a new appreciation for this line of argument. Travelling to the UK put our family into a similar, yet distinct linguistic context. Learning the words for various things in the UK was part of the fun. For example, the kids soon learned that if they wanted a bag of potato chips, they needed to ask for “crisps” instead of “chips”—“chips” being what they thought of as “French fries” (though curiously retained in the common UK/North American construction, “fish and chips”). Now, does it matter if a group settles on “chips” or “crisps”? No, not really—what matters is that people know what you are talking about. In principle, any word could be used for thinly sliced and deep-fried potatoes, as long as everyone in the group agreed on what that word meant. Languages thus have an element of arbitrariness to them, and manufactured codes even more so. In fact, a human code benefits from arbitrary associations in that it makes it much harder to crack.
I recall reading Meyer’s argument for an arbitrary code when Signature first came out in 2009, and being surprised by it. The reason for my surprise was simple: in 2009 there was already a detailed body of scientific work that demonstrated exactly what Meyer claimed had never been shown. Though Meyer claimed that “molecular biologists have failed to find any significant chemical interaction between the codons on mRNA (or the anticodons on tRNA) and the amino acids on the acceptor arm of tRNA to which the codons correspond”, this was simply not the case.
One hypothesis about the origin of the genetic code is that the tRNA system is a later addition to a system that originally used direct chemical interactions between amino acids and codons. In this hypothesis, amino acids would directly bind to their codons on mRNA, and then be joined together by a ribozyme (the ancestor of the present-day ribosome). This hypothesis is called “direct templating”, and it predicts that at least some amino acids will directly bind to their codons (or perhaps anticodons, since the codon/anticodon pairing could possibly flip between the mRNA and the tRNA).
So, is there evidence that amino acids can bind directly to their codons or anticodons on mRNA? Meyer’s claim notwithstanding, yes—very much so! Several amino acids do in fact directly bind to their codon (or in some cases, their anticodon), and the evidence for this has been known since the late 1980s in some cases. Our current understanding is that this applies only to a subset of the 20 amino acids found in present-day proteins. In this model, then, the original code used a subset of amino acids in the current code, and assembled proteins directly on mRNA molecules without tRNAs present. Later, tRNAs would be added to the system, allowing for other amino acids—amino acids that cannot directly bind mRNA—to be added to the code.
The fact that several amino acids do in fact bind their codons or anticodons is strong evidence that at least part of the code was formed through chemical interactions and, contra Meyer, is not an arbitrary code. The code we have, or at least the part of it covering those amino acids for which direct binding was possible, was indeed a chemically favored code. And if it was chemically favored, then it is quite likely that it had a chemical origin, even if we do not yet understand all the details of how it came to be. As such, building an apologetic on the presumed future failings of abiogenesis research, when current research already undercuts one’s thesis, seems to me as problematic for Meyer in 2009 as it was for Edwards in 1696. Do unanswered questions remain? Of course. Should we bank on them never being answered? Or would it be wiser to frame our apologetics on what we know, rather than what we don’t know?
In the next post in this series, we’ll address another of Meyer’s claims: that evolution is incapable of generating significant amounts of new information.
So far, we have examined evidence that strongly suggests that the genetic code had a natural origin. If this is correct, it undermines the Intelligent Design (ID) argument that the genetic code was designed apart from natural processes. To put it plainly, if the genetic code had an origin driven in part by chemical binding events, then it is not a “genuine code” in the sense we humans use the word—and further research might reveal plausible scenarios by which the entire code may have come to be.
When writing that last section, however, I had forgotten that ID proponent Stephen Meyer, along with his colleague Paul Nelson, had written a paper disputing the evidence for direct templating. They published their work in the in-house ID journal BIO-Complexity in 2011. Though I read it shortly after it was published (and I recall finding it unsatisfactory, even then), I did not remember it when writing that last post. This was indeed a mistake: I should have at least mentioned it, or better still addressed its claims. Failing to do so led to the Discovery Institute, the leading ID organization of which Meyer is a key member, calling me out for “recycling arguments refuted years ago”. Well, if nothing else, it made for a good headline, I suppose. And yes, I have been using these arguments against Meyer’s position for several years, since in my view they remain as valid now as they were in 2010 when I first raised them. So, they are “recycled” in that sense. But have Meyer and Nelson truly “refuted” the evidence for chemical interactions between amino acids and codons or anticodons? No, they have not, but it will take some effort to understand why. Since this claim is so important for the ID movement, it’s worth the effort to understand the science, as well as the inadequacy of the ID interpretation of it.
What’s at stake?
The ID movement in general, and Meyer in particular, has staked quite a lot on the assertion that the genetic code is a “genuine code”, in the sense that it represents arbitrary assignments of amino acids to codons. Because of this, advances in research that bear on this assertion are problematic for ID. If science eventually demonstrates a plausible mechanism for the natural origins of the genetic code, ID will lose a major argument of key apologetic importance. As such, it’s not surprising to see science in this area vigorously contested by the ID movement. For them, the genetic code is a code, and codes only come from intelligent coders, as it were.
As a kid, I used to be fond of making my own secret codes by drawing up a table of correspondences between symbols of my choosing and letters of the alphabet. Assigning correspondences between pairs of symbols and letters was of course arbitrary—there was nothing about the symbols and letters that chose themselves. In order to decode messages, the table was necessary, since it could not be worked out from examining the code itself. Meyer sees the genetic code in exactly this way—as a list of amino acids corresponding to certain codons, where in principle any amino acid could be assigned to any codon in an arbitrary fashion. For Meyer, calling the “genetic code” a code is not merely using an analogy to human-designed codes: he sees it as a genuine, symbolic code directly constructed by an agent rather than by a natural process. This thesis forms a major tenet of his 2009 book, Signature in the Cell.
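A childhood code of this sort is easy to sketch in Python, and doing so makes the arbitrariness plain: the table comes from a random shuffle, and nothing about a letter favors the symbol it ends up paired with. This is a minimal sketch; the function names, the seed, and the message are my own choices for illustration.

```python
import random
import string


def make_code_table(seed: int) -> dict:
    """Pair each letter with a symbol (here, another letter) chosen by shuffling."""
    rng = random.Random(seed)
    symbols = list(string.ascii_uppercase)
    rng.shuffle(symbols)
    return dict(zip(string.ascii_uppercase, symbols))


def encode(message: str, table: dict) -> str:
    """Swap each letter for its symbol; leave spaces and punctuation alone."""
    return "".join(table.get(ch, ch) for ch in message.upper())


def decode(ciphertext: str, table: dict) -> str:
    """Decoding needs the table itself, read in the reverse direction."""
    reverse = {symbol: letter for letter, symbol in table.items()}
    return "".join(reverse.get(ch, ch) for ch in ciphertext)


table = make_code_table(seed=1)
secret = encode("MEET AT NOON", table)
print(secret)
print(decode(secret, table))  # MEET AT NOON
```

Any other shuffle would have worked just as well, which is precisely what Meyer claims is true of the genetic code.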
As we have seen in Signature, Meyer claimed in support of his view that chemical interactions between amino acids and codons had never been found. If such interactions were known, so his argument goes, then there might be a case for some sort of natural, chemical process that led to the present-day genetic code. Since no interactions were known, Meyer claimed, there was no support for a natural process that could have formed the code. As such, he argued, the lack of a natural explanation for the genetic code shows that it was directly fashioned by an intelligent agent.
Meyer and Nelson respond to Yarus
In their paper disputing the evidence for amino acid binding, Meyer and Nelson make a series of claims about the work of Yarus and colleagues (who, as we have seen, are a major research group working on the direct templating hypothesis). Meyer and Nelson list their arguments as follows:
- Yarus et al.’s methods of selecting amino-acid-binding RNA sequences ignored aptamers that did not contain the sought-after codons or anticodons, biasing their statistical model in favor of the desired results.
- The DRT model Yarus et al. seek to prove is fundamentally flawed, since it would demonstrate a chemical attraction between amino acids and codons that does not form the basis of the modern code.
- The reported results exhibited a 79% failure rate, casting doubt on the legitimacy of the “correct” results.
- Having persuaded themselves that they explained far more than they actually had, Yarus et al. then simply assumed a naturalistic chemical origin for various complex biochemicals, even though there is no evidence at present for such abiotic pathways.
To summarize, Meyer and Nelson assert there is no need to abandon the claim made in Signature that the genetic code has no natural explanation (and thus, is the direct creation of a designer) for several reasons: the work of Yarus and colleagues is poorly done, and in reality does not show evidence of binding; amino acid binding is not a feature of the modern code, so these proposed mechanisms would not explain the origin of the code in any case; the results have a high rate of failure, so even the claimed positive results are suspect; and Yarus and colleagues do not account for other unexplained problems for hypotheses of abiogenesis in general.
Let’s examine these arguments in detail, starting with the first and third claims.
The first claim seeks to explain away the observed binding of amino acids to their codons or anticodons on short lengths of RNA (aptamers) as merely statistical artifacts of how Yarus and colleagues conducted their experiments. Meyer and Nelson liken the experimental design to fishing with a net, throwing back fish that are not wanted, and then declaring that almost only “wanted” fish were caught in the first place.
This shows a misunderstanding of how Yarus and colleagues actually did their experiments. Sy Garte (who has written for BioLogos and is a biochemist) was quick to point this out to Nelson in the comment thread for the previous post in this series. Yarus and colleagues did indeed examine RNA sequences that did not bind to specific amino acids. As Garte wrote in response to Nelson:
Actually, Yarus in many of his papers, did exactly what you are asking of him here. As one example from his 2003 paper that I linked in my previous comment, he writes: “Of the remaining isolates sequenced, only one other repetitively isolated motif was prevalent, representing 18% of clones. Although it contained a possibly interesting conserved AUAUAUA sequence, this second isolate showed little specificity, having apparently similar affinity for isoleucine, alanine, valine, and methylamine.” Note this second isolate (with no useful specificity) is also based on the ILE codon. In other words, he did look at other enriched sequences, and further evaluated them. He frequently admits that his technique isolates RNAs that are unrelated to amino acid-specific codons or anticodons.
As Garte rightly points out, the experiments that Yarus and colleagues have performed over the years have reported a number of RNA sequences that bind amino acids nonspecifically, or that bind amino acids without containing the codon or anticodon for that amino acid. So, contrary to Meyer and Nelson’s claim, Yarus and colleagues did indeed examine a wide variety of RNA sequences that interact with amino acids. Meyer and Nelson are simply incorrect on this point. As we will see in an upcoming post, these binding affinities have also been confirmed by researchers in other groups, increasing our confidence that they are genuine, and not statistical artifacts arising out of biased, sloppy research.
Secondly, the binding results that Yarus and colleagues report are not based on trying to find “desired results” as Meyer and Nelson claim. The Yarus research group is interested in genuinely discovering which amino acids bind to their codons (or anticodons), and which do not. In fact, they fully expect that there are some amino acids which will not exhibit this sort of binding. They also expect that amino acids that do bind one of their codons (or anticodons) will not bind all of their possible codons or anticodons. Recall that the genetic code is partially redundant: most amino acids can be coded for by several codons. The expectation is that even the amino acids that were added to the code by chemical binding would later also “pick up” additional codons that they do not bind to. In other words, the direct templating hypothesis is not expected to explain the origin of the entire code, but only a more ancient subset within the current code. That the current code has a subset within it that exhibits specific binding affinities between amino acids and codons (or anticodons) is strong evidence that the current code passed through a stage where these affinities were important. As such, Meyer’s claim that the code is entirely arbitrary, and thus “designed”, fails. The genetic code is not like my childhood codes: some of the “symbols and letters” in the genetic code did seem to “pick themselves”, pairing together because they are chemically attracted to each other.
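The redundancy mentioned above is easy to check for yourself. The sketch below counts how many codons specify each amino acid in the standard code, using the conventional one-letter abbreviations (R is arginine, W is tryptophan, M is methionine, and "*" marks a stop codon). The variable names are mine.

```python
from collections import Counter

# One letter per codon of the standard genetic code, in the conventional
# T, C, A, G ordering; '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Count codons per amino acid, leaving out the stop codons.
codons_per_amino_acid = Counter(aa for aa in AMINO_ACIDS if aa != "*")

redundant = [aa for aa, n in codons_per_amino_acid.items() if n > 1]

print(codons_per_amino_acid["R"])  # 6 (arginine)
print(codons_per_amino_acid["W"])  # 1 (tryptophan)
print(len(redundant), "of", len(codons_per_amino_acid))  # 18 of 20
```

Only methionine and tryptophan have a single codon, so an amino acid that binds one of its codons directly will usually have others that it could have picked up later.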
The idea that direct templating does not purport to explain the origins of the entire code is a point that Meyer and Nelson do not seem to understand. This is most obvious in their third objection in the list above, that “the reported results exhibited a 79% failure rate, casting doubt on the legitimacy of the ‘correct’ results.” In order to understand why Meyer and Nelson are mistaken, we need to understand where the “79%” figure comes from.
The work of the Yarus lab, summarized and discussed in their 2009 paper, examined eight amino acids (of the 20 found in the present-day code) for evidence of binding to their codons or anticodons. Their data reveal that six amino acids show strong evidence of binding to one or more of their codons or anticodons. These six amino acids, however, only show evidence for binding for a subset of their codons or anticodons (Table 1):
Due to the redundancy of the genetic code, these eight amino acids have between them 24 possible codons, and thus 24 possible corresponding anticodons. For example, arginine can be coded for by six different codons (with six corresponding anticodons) in the modern code, but only three of those 12 possible sequences show evidence of binding to arginine directly. Thus, for these eight amino acids there are 48 possible sequences that may have shown binding to their amino acid. As you can see from the table, only 10 of the 48 show significant binding, or roughly 21%. That means that 79% of the possible codon or anticodon sequences do not show binding for this sample of eight amino acids. This is what Meyer and Nelson report as the “failure rate” for these experiments. But this is only a “failure rate” if one somehow expects that all 48 sequences should be shown to bind, and it would seem that Meyer and Nelson have exactly this expectation. This is emphatically not what the direct templating model expects or predicts. Rather, as we have seen, the expectation is that only a subset of the code was established through chemical binding, and that later on other amino acids, codons and anticodons were added to the code. Whether they understand it or not, Meyer and Nelson are arguing against a straw-man version of the direct templating model.
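The arithmetic behind the “79%” figure is simple enough to lay out explicitly. The sketch below uses only the counts given above; the variable names are mine.

```python
# Counts taken from the paragraph above.
codons = 24       # possible codons for the eight amino acids examined
anticodons = 24   # one corresponding anticodon per codon
binding = 10      # sequences showing significant binding

candidates = codons + anticodons
binding_fraction = binding / candidates

print(candidates)                     # 48
print(f"{binding_fraction:.1%}")      # 20.8%
print(f"{1 - binding_fraction:.1%}")  # 79.2%

# The arginine example: six codons plus six anticodons, three of which bind.
arginine_candidates = 6 + 6
arginine_fraction = 3 / arginine_candidates
print(f"{arginine_fraction:.0%}")     # 25%
```

The numbers themselves are not in dispute. What is in dispute is whether the 79% should be called a “failure rate” at all, given that the model never predicted 100%.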
Moreover, the claim that the high “failure rate” casts doubt on the veracity of the bindings that were observed is puzzling. If Yarus and colleagues had reported binding affinities for all 24 codons and all 24 anticodons, that would be good reason to suspect that their experimental design was not able to distinguish between real binding and spurious binding. The fact that they observe differences—highly significant binding for some sequences, but not for others—indicates that their experimental design can in principle distinguish between binding and non-binding. So, far from being a problem for Yarus and colleagues—as Meyer and Nelson present it—this is evidence that their experimental design is appropriate and working correctly.
In the next section in this series, we’ll look at more problems with Meyer and Nelson’s response: how the work of Yarus and colleagues is profitably informing the research of other groups, and leading to new discoveries that bear on the direct templating hypothesis.
As we have seen above, leaders in the Intelligent Design (ID) movement have developed an argument for design using the genetic code (the correspondences of amino acids and the nucleotide base triplets that specify them). Specifically, they claim that the genetic code is a “genuine code” (i.e. one constructed directly by an intelligent agent) and not a set of correspondences that arose through a natural process. As we have seen, however, this argument has to face strong evidence that part of the genetic code does in fact have its origin through physical interactions between amino acids and their corresponding codons or anticodons. The last post detailed how several amino acids do in fact directly bind to their own codons or anticodons, suggesting that the modern translation system, with its tRNA molecules that bridge amino acids and codons in the present day, is in fact the modified descendant of a translation system that relied on direct interactions. If so, the ID argument falls apart, and one of the movement’s major apologetic arguments is lost.
Previously, we saw that the main ID proponents who use the “genetic code is a real code” argument are Stephen Meyer and Paul Nelson. In their 2011 paper attempting to rebut the evidence for direct chemical binding between codons/anticodons and amino acids, one of their main lines of argument was that the observed binding was not genuine, but rather an artifact of poorly-designed experiments. While we have examined why this is not in fact the case for the experiments in question—they were done appropriately, and the results are not spurious—there is a second way to evaluate a body of scientific research done by one specific research group (in this case, the Yarus lab): look to see if it is profitably informing the research of other groups. If other groups are building on the work of another lab, and finding it to make accurate predictions, then we can be even more confident that the results are meaningful.
As the evidence mounted, through the work of Yarus and colleagues, that some amino acids do in fact bind their codons or anticodons, other researchers began to take note. One research group decided to use the results of the Yarus lab to make a prediction that could be tested by examining present-day proteins. They reasoned that if such interactions were important at the time when the translation process was emerging, these same sorts of interactions may have been important for how complexes of proteins and RNAs worked together at that time. In other words, they reasoned that interactions between amino acids and codons/anticodons might have had other roles in addition to translation, perhaps structural roles. Proteins and RNAs that bound together to perform a function, for example, might have used these same chemical affinities to guide their formation. If so, then examining protein/RNA complexes that are old enough to date from this time in biological history might show evidence of close association between amino acids in the protein component and matching codons/anticodons in the RNA component. But where might an ancient complex of RNA and protein be found that could be used to test this prediction?
Ribosomes: a molecular time capsule
The obvious place to look was the ribosome—the very same RNA/protein complex that cells use for translation. Firstly, the ribosome can be found in all life in the present day, meaning that it is older than the proposed last universal common ancestor of all living things—or “LUCA” for short. As such, the ribosome would have been present at the time the current translation system was worked out. Secondly, the three-dimensional structure of the ribosome is known with great precision through a technique called X-ray crystallography. We know exactly how ribosomes, with their blend of RNA and protein components, are folded together. With these two features, looking at ribosomes was the perfect way to test the hypothesis that early RNA/protein complexes used chemical affinities between amino acids and their codons/anticodons for structural purposes as well as for translation.
The results, published in 2010, were striking. Within the folded structure of ribosomes, several amino acids were found in close association with some of their possible anticodons. Note that within a ribosome, the RNA components are not translated: they are untranslated RNA molecules that act as a ribozyme, or RNA enzyme. The protein components come from different DNA sequences that are transcribed into RNA and then translated into protein before they join the ribosome complex. As such, the RNA components and the protein components of a ribosome are separate pieces, yet these proteins have some amino acids that are attracted to their anticodon sequences within the RNA components. So, even though these attractions are not useful for translation purposes, they are present within the ribosome structure. These results strongly support the hypothesis that interactions between amino acids and anticodons were biologically important at the time when translation emerged, since they are in large measure determining the three-dimensional structure of what is arguably the most important biochemical complex in life as we know it. Moreover, these results give strong experimental support for the idea that the genetic code was shaped by chemical interactions at its origin and is not a chemically arbitrary code. In response to these results, as well as the prior work by Yarus, a third research group has extended this type of analysis to the protein sets of entire organisms, and found that this pattern of correspondences between amino acids and their codons is widespread across whole genomes. This pattern, first identified by the Yarus group, has now been confirmed by the work of many other scientists, and it continues to make successful predictions.
A second observation from the ribosome study was also informative, but for a different reason. Some amino acids in the ribosome complex are closely associated with anticodons that do not, in the present-day genetic code, code for that amino acid. These anticodons, however, have previously been suspected to once have coded for those amino acids. The genetic code shows evidence of having been optimized through natural selection to minimize the effects of mutation. Such optimization requires some codon/anticodons to be reassigned to different amino acids over time. What was fascinating for the researchers looking at the ribosome was that some codons that were previously suspected to have been reassigned are associated with what were previously thought to be the “original” amino acid in the ribosome complex. This observation provides experimental support for codon reassignment over time: even though an amino acid and a particular codon/anticodon may have a chemical affinity for each other, this affinity could later be overridden by the introduction of tRNA molecules that bridge the amino acid and the codon without direct interaction between them. Despite this reassignment, the original correspondences remain in the ancient structure of the ribosome, where they serve a structural role. As such, this evidence is a window into how the genetic code may have evolved over time: starting with direct affinities, and then shifting to a modified system with tRNA molecules that allowed some of those original pairings to be shifted through natural selection.
In summary, the supposedly “flawed” work of Yarus (as claimed by Meyer and Nelson) is not only being used successfully by other researchers; those researchers are also adding to the evidence that the genetic code (a) has a chemical basis, and (b) has evolved over time. Both of these lines of evidence undermine the ID claim that the genetic code is an arbitrary code directly produced by a designer apart from a natural process.
We have explored what is known about the ultimate origins of biological information. Although we know little about this subject, the facts we do know point to the genetic code—the mechanical basis for encoding and transmitting biological information—as having an origin shaped by chemical interactions. Accordingly, we have seen how these findings undermine the Intelligent Design (ID) argument that the genetic code was designed apart from a natural process.
A second claim commonly found in the ID literature is that—aside from the ultimate origins of biological information—evolution is unable to generate new information, or at least enough new information to produce the variety of life we observe in the present day. Claims of this nature are also commonly encountered in young-earth creationist (YEC) and old-earth creationist (OEC) circles. Given the prevalence of this claim in anti-evolutionary arguments, ID or otherwise, it’s worth delving into the evidence that evolutionary mechanisms do indeed have the ability to generate adequate amounts of new information to drive significant change over time.
Rarely is information truly “new”, and a little change goes a long way
Before discussing the known mechanisms that can produce new information, it may be helpful to provide some context. Two main points are important here. First, “new” biological information is seldom truly new. Second, biologists have good evidence that small amounts of new biological information are adequate to accomplish significant evolutionary change. Let’s examine these issues in turn.
Firstly, evolution is a process of descent with modification. This means that evolutionary change is not about producing “new” or “novel” forms, but rather slightly modifying existing forms. Allowing this process to unfold over millions of years can lead to significant change, to be sure—but from generation to generation within that process, the change is small. One excellent analogy for the gradual changes produced by evolution stacking up over time to accumulate major differences is that of language evolution—an analogy I have explored at length before. While Anglo-Saxon of the 10th century and present-day modern English cannot reasonably be called the “same language”, the process that produced the latter from the former was gradual enough that each generation spoke the “same language” as their parents and their children. So too with evolution—“new information” accumulates over time, usually by modifying what was there before. Even so-called “major innovations” in evolution are accomplished gradually—vertebrates are modified descendants of invertebrates; land animals are modified descendants of fish; whales are modified descendants of land animals, and so on. As such, it’s not reasonable to expect that rapid acquisition of large amounts of new information is needed to drive significant evolutionary change over time. Gradual accumulation will do.
Secondly, even a small accumulation of new information is enough to cause major evolutionary changes. One thing that we now know in this era of DNA sequencing is that species that are quite different from each other can nonetheless have very, very similar information content. A prime example is comparing human DNA with that of our closest living relatives, the other great apes. Human genes and chimpanzee genes, for example, are exceedingly close to one another in their information content and structure. We have only subtle differences at the gene level, by whatever measure one chooses—our genes (which are a small subset of our entire genomes, but the vast majority of the information that makes us up) are around 99% identical to each other. Yet these subtle information differences add up to quite significant biological differences. Many of the information differences between us are due to where and when our genes are active during development—subtle tweaks that ultimately lead to the marked differences we see between our species.
One of the ironies of Christian anti-evolutionary apologetics is that it is common to see groups argue both that humans and apes are hugely different from one another, and that evolution cannot produce significant amounts of new information. Well, one cannot have it both ways, now that we know that humans and other apes are so similar at the genetic level. If humans and apes are really that biologically different—and we are, to be sure!—then one is faced with the brute fact that these major biological differences are underwritten by a level of information change that is easily accessible to evolutionary mechanisms.
With these points in mind, let’s turn to discussing how DNA can change over time to produce new information.
Something old, something new
Darwin’s great insight about evolution was not that species shared common ancestors, but rather that species could be shaped, over time, by their environment to become better suited to it. Nature, he reasoned, could act in the same selective way that humans did to shape a species to a particular form. If populations have variation, and those variants do not reproduce at the same frequency in a given environment, then over time those variants best suited for that environment will increase in frequency, while those less suited will decrease in frequency. In this way, the average characteristics of a population could shift over time.
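The frequency shift described above can be sketched numerically. This is only a minimal illustration, not a model taken from Darwin or from any study discussed here: it assumes an effectively infinite population with just two variants and fixed relative reproductive rates, and the function names and the 5% reproductive advantage are hypothetical choices made for the example.

```python
def next_frequency(p, w_variant, w_other):
    """Return the variant's frequency in the next generation.

    p         -- current frequency of the variant (0 to 1)
    w_variant -- relative reproductive rate of the variant (assumed constant)
    w_other   -- relative reproductive rate of everything else (assumed constant)
    """
    mean_rate = p * w_variant + (1 - p) * w_other
    return p * w_variant / mean_rate


def simulate(p0, w_variant, w_other, generations):
    """Track the variant's frequency over a number of generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_variant, w_other))
    return freqs


if __name__ == "__main__":
    # Hypothetical numbers: a variant starting at 1% with a 5% advantage.
    freqs = simulate(0.01, 1.05, 1.0, 500)
    for gen in (0, 100, 200, 300, 500):
        print(gen, round(freqs[gen], 4))
```

Under these assumptions, even a slight reproductive advantage moves a rare variant to near-fixation within a few hundred generations, while a variant with no advantage stays where it started. Real populations are finite, so chance also plays a role that this sketch leaves out.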
While Darwin understood these principles, he did not have any idea how variation was generated, or even how hereditary information was transmitted from one generation to the next. The discovery of DNA as the hereditary molecule provided the answers: DNA, in that it is faithfully copied from generation to generation, transmits hereditary information. In rare cases where DNA copying is not perfect, new variation enters the population. DNA replication, then, is both the means of maintaining information and the means of introducing change.
While DNA is great at storing and transmitting information, it is lousy at performing the day-to-day biological functions that cells need to do—enzymatic functions, energy processing functions, and so on. These functions are performed by proteins, which are lousy molecules for storing information, but fantastic molecules for getting biological work done. RNA, as we have seen, is the bridge between these two worlds—a gene, made of DNA, is transcribed into a working copy of RNA called a messenger RNA (mRNA), which is then translated by the ribosome into a protein. The information in a gene—a DNA region—can thus be used to specify how a protein is shaped, and where and when it is made during an organism’s development. The gene regions that specify the protein structure are called coding sequence, and the DNA that specifies where, when, and how often a gene should be transcribed (and translated) is called regulatory DNA. For a given gene, some regulatory DNA is outside of the transcribed region, and some is within it. Biologists often talk about genes being “expressed” as a shorthand for a gene being transcribed into mRNA and translated into a protein. Regulatory DNA, then, controls the “pattern of expression” for a gene.
As we can now appreciate, there are several ways for the information content of a gene to change. There could be a DNA sequence change (a mutation) in any part of a gene’s DNA sequence. If a change occurs in the coding sequence, there may be a change in the protein sequence—one altered amino acid, for example—perhaps giving rise to a change in function. If a change occurs in regulatory DNA, then the new variant might have one or more of several possible changes: either an increase or decrease in the amount of RNA that is transcribed, the gene being newly transcribed at times or places it was not transcribed before, or the loss of transcription at times or places where it previously was expressed.
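To make the coding-sequence case concrete, here is a small sketch of how a single-letter DNA change can alter a protein, leave it unchanged, or cut it short. The codon table is the standard genetic code; the 15-letter “gene” and the helper names `translate` and `point_mutation` are hypothetical, invented purely for illustration. Regulatory changes are not modeled here.

```python
# Standard genetic code, built from the usual compact TCAG-ordered layout.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


def point_mutation(dna, position, new_base):
    """Return a copy of dna with the base at position (0-indexed) replaced."""
    return dna[:position] + new_base + dna[position + 1:]


if __name__ == "__main__":
    gene = "ATGGAGCTGAAATAA"  # hypothetical toy coding sequence
    print("original:", translate(gene))
    print("altered amino acid:", translate(point_mutation(gene, 4, "T")))
    print("no protein change:", translate(point_mutation(gene, 8, "A")))
    print("early stop:", translate(point_mutation(gene, 9, "T")))
```

The second mutation changes the DNA but not the protein, because several codons specify the same amino acid. That redundancy is one reason many DNA changes have no effect on protein function at all.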
More dramatic changes in information state are also possible. Deletion mutations can remove a gene in its entirety; or a duplication mutation could produce two copies of a gene, side by side. An interesting effect of duplication mutations is that the two copies sometimes go on to accumulate differences (in their amino acid sequences or in their regulatory DNA, or both) that lead to them acquiring distinct functions. In this way, new functions may develop over time. There are even cases known when entire chromosome sets of an organism were duplicated at once—a so-called “whole genome duplication” or “WGD” event. For example, there is very good evidence that early vertebrates had two WGD events in their lineage—the effects of which can be seen in all vertebrates living today, including humans. While many of the duplicated genes have been lost, others were retained and, over time, picked up sequence changes and functional changes. This greatly added to the information content of the vertebrate lineage.
One possible twist on the “WGD” theme is the case of hybridization—when two related but distinct species interbreed and form fertile progeny. In this case, two species diverge from a common ancestor, and over time, differences in their genes accumulate. This is also a form of information gain over time, except the gain is distributed between two related species/populations. If these two species later interbreed to form a hybrid, then the offspring will inherit one chromosome set from each species.
This situation may not be ideal if a significant amount of change has accumulated between the two species. The chromosomes from each species may not readily pair with their “equivalent” chromosomes from the other species for the purposes of cell division. If so, then a WGD event may provide a fix: a WGD event occurring after a new hybrid species forms duplicates every chromosome in the genome, giving each chromosome a new, perfectly matched partner to pair with. The end result is a species that has fully two genomes within it—a full genome from each pre-hybridization species. Moreover, the two genomes are already slightly different from one another, meaning that they are already down the path of picking up slightly different functions.
From this starting point, further changes over time are probable—shifting the information state of both ancestral genomes within the “doubled hybrid” species. A recent scientific paper provides an excellent illustration of exactly this process—the discovery that the frog species Xenopus laevis has two complete genomes from two (now extinct) ancestral species—species that were separate from one another for several million years before hybridizing.
Human genome sequence data has also revealed, in recent years, that the lineage leading to modern humans also hybridized with related hominin species in the past—species such as Neanderthals, the Denisovans, and likely others. While modern humans do not retain much of the DNA we picked up from these species, we do nonetheless retain some, and some of it is likely of functional significance for some human groups. We too have shifted our information state by this means.
It’s important to keep in mind that all of these changes in information state are based on well-known and understood mechanisms. We can observe these mechanisms occurring in the present day, and we know—from comparing the DNA of related species to each other—that a small amount of genetic change can bring about significant morphological change. As such, the claim that evolution is incapable of generating the new information needed to drive species change over time is not supported by the evidence, but rather is in stark contrast to it.
Still, one might say—these forms of information change over time merely modify existing information. Even if evolution can get a reasonable amount accomplished with modification, is it the case that evolution cannot create something truly new? As we will see in the next post in this series, evolutionary mechanisms—despite ID claims to the contrary—can and do produce genuinely new information as well.
Earlier, we discussed how there are seldom genuinely “new” features that arise through evolution. Instead, what we observe is descent with modification: evolution works by modifying existing features into “new” ones. To return to our language analogy, modern English in one sense is not “new”—it is a (highly) modified descendant of Anglo-Saxon. Of course, in another sense, it is very much “new” in that it has changed so significantly over the last thousand years. So too with evolution—whether one is willing to call evolutionary changes “new” or not, substantial change can accumulate over time in a lineage. That change ultimately needs to occur at the DNA level—in the “information state” of a species over time—in order for it to be heritable.
As we have already discussed, such changes often arise from subtle modification of preexisting genes—changes in regulatory DNA directing where and when genes are transcribed and translated; duplications of genes allowing for the duplicates to pick up different functions, and so on. While these mechanisms appear to provide the bulk of incremental change over time, other more dramatic events are also possible. Sometimes, genes are formed “de novo”—literally, “from new”—and as more and more genomes are sequenced, biologists are finding evidence for an ever-increasing number of such newly-formed genes. Such genes can be identified by comparing the protein products of genes in a group of related organisms. If one species has a protein that its relatives do not, it may be that the gene for this protein was formed after its lineage separated from the other species closest to it.
Genes like this are actually easy to find these days, since they can be identified by simply comparing protein databases of species known to be closely related to one another. In a sea of similarities, these proteins stick out like sore thumbs. They’re new, or at least more new than the other mechanisms we have looked at, all of which are incremental modifications of previously existing protein coding genes. Because cell biologists often call protein genes “ORFs” (for “Open Reading Frame,” referring to a sequence of DNA letters that can be read off, or translated, into a protein), these “lonely” protein genes picked up the nickname “ORFan” genes. They are also known as “taxonomically restricted genes”—which is just a fancy way of saying that they show up only in some lineages, but not in closely related lineages.
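The database comparison described above can be sketched in a few lines. This is a toy illustration only: real searches use dedicated alignment tools and statistical thresholds, whereas here Python’s built-in `difflib` stands in as a crude similarity measure. The species names, protein sequences, function names, and the 0.7 cutoff are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Crude similarity score between 0 and 1 (a stand-in for real alignment)."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def find_orphans(focal_proteins, relatives, threshold=0.7):
    """Return names of focal proteins with no similar protein in any relative.

    focal_proteins -- dict mapping protein name to amino acid sequence
    relatives      -- dict mapping species name to a list of protein sequences
    threshold      -- hypothetical cutoff below which two proteins are
                      treated as unrelated
    """
    orphans = []
    for name, sequence in focal_proteins.items():
        has_match = any(
            similarity(sequence, other) >= threshold
            for proteins in relatives.values()
            for other in proteins
        )
        if not has_match:
            orphans.append(name)
    return orphans


if __name__ == "__main__":
    # Made-up sequences: geneA has near-identical relatives, geneB has none.
    focal = {"geneA": "MSTNPKPQRK", "geneB": "MWWHHCCYYF"}
    relatives = {
        "relative_1": ["MSTNPKPQRR"],
        "relative_2": ["MSTNAKPQRK"],
    }
    print(find_orphans(focal, relatives))
```

A protein flagged this way is only a candidate. As the later discussion makes clear, the next step is to look at the DNA of the related species, not just their proteins.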
ID and de novo genes
As we have seen in previous posts, those in the Intelligent Design (ID) movement argue that natural processes cannot explain the ultimate origin of the information we see in biological systems (and as we have seen, that argument is seriously undermined by the evidence that the DNA code was formed, at least in part, by chemical affinities). In addition to this argument, the ID movement also claims that mutation and natural selection are unable to produce new information either from previously existing genes, or through de novo gene formation. For ID writers such as Stephen Meyer, the existence of ORFan genes is a sure sign that design was required for their production—where “design” means “apart from a natural process.” If evolution cannot account for them, Meyer argues, then design can be inferred.
This issue—the ability of natural mechanisms to produce “new” biological information—was part of the famous Kitzmiller v. Dover trial that tested the constitutionality of teaching ID in US public schools. At the trial, one expert witness for the plaintiffs, cell biologist Kenneth Miller, presented evidence that natural mechanisms were capable of generating new biological information. In the following exchange, Miller is being questioned by his own legal counsel under direct examination:
Q. Now, has there been scientific research done on this proposition of whether or not there are natural explanations for new biological information?
Miller: Yes, there has, in fact, a great deal.
Q. And could I direct your attention to Plaintiffs’ Exhibit 245. Do you recognize this exhibit?
Miller: Yes, I do. This is a review article that was written in a very prestigious journal, Nature Reviews Genetics, and it’s written by Manyuan Long and several other people. And the title of the article is, “The Origin of New Genes, Glimpses From the Young and the Old.” It’s an article that I read immediately, as many scientists did when it came out, because it describes a number of mechanisms by which new genetic information is developed by the processes of evolution.
One of the mechanisms that Miller notes in his testimony is de novo gene origination, since it is discussed in the review paper authored by Long and colleagues that Miller offers as a summary of the relevant evidence. Because this paper (and the genetic mechanisms for producing new biological information it discusses) featured prominently in the trial, it captured the attention of ID writers in the wake of their defeat in Kitzmiller. Stephen Meyer, for example, discusses the Long paper in his 2013 book Darwin’s Doubt as follows (page 211):
During the 2005 Kitzmiller v. Dover trial… biologist Kenneth Miller cited Long’s paper in his testimony. He said that it shows how new genetic information evolves… But do evolutionary biologists really know this?
After discussing mechanisms that reshape existing genes – and claiming them to be inadequate – Meyer turns to the issue of de novo gene origination (page 219):
Long does cite at least one type of mutation that does not presuppose existing genetic information, the de novo origination of new genes.
Curiously, Meyer claims that de novo gene origination is based on an unknown mechanism (page 221, emphasis his):
Indeed, evolutionary biologists typically use the term “de novo origination” to describe unexplained increases in genetic information; it does not refer to any known mutational process.
Taking stock, then, many of the mutational processes that Long cites either: (1) beg the question as to the origin of the specified information contained in genes or parts of genes, or (2) evoke completely unexplained de novo jumps – essentially evolutionary creation ex nihilo (“from nothing”).
When I first read this section of Darwin’s Doubt, I was very surprised. The reason for being surprised was straightforward—the de novo origin of genes is not at all an unexplained mystery, nor does it rely on unknown mutational processes. Let’s examine how de novo genes actually form.
New? Yes; but modified nonetheless
The error in Meyer’s argument seems to be that he thinks that de novo, ORFan genes come out of nowhere in some unexplained fashion (Darwin’s Doubt, page 216):
Thus, even if it could be assumed that similar gene sequences always point to a common ancestor gene, these ORFan genes cannot be explained using the kind of scenarios that Long’s article cites. Since ORFans lack sequence similarity to any known gene – that is, they have no homologs in even distantly related species – it is impossible to posit a common ancestral gene from which a particular ORFan and its homolog might have evolved. Remember: ORFans, by definition, have no homologs. These genes are unique —one of a kind—a fact tacitly acknowledged by the increasing number of evolutionary biologists who attempt to ‘explain’ the origin of such genes through de novo (“out of nowhere”) origination.
It all makes for a good argument on the surface—but unfortunately for Meyer, the argument does not hold up once one learns a bit more about ORFan genes. Yes, they are new protein coding genes, but they are not formed from brand-new DNA sequences that arise through some unknown mechanism. They are—as we expect for the products of evolution—slightly modified descendants of very similar DNA sequences, derived from a common ancestor. Meyer is technically correct that ORFans do not have homologous genes in other organisms, but he is apparently unaware that they do have homologous DNA sequences in closely related organisms. Though these sequences are not protein-coding genes, they are often very close to becoming gene sequences. Moreover, these DNA sequences are also often in the same relative location in the genome as the new ORFan gene.
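A toy example shows what “very close to becoming a gene sequence” can mean in practice. In the sketch below, a stretch of DNA with no open reading frame becomes one that has an ORF through a single-letter change that creates a start codon. Both sequences and the `find_orfs` helper are hypothetical and made up for illustration; only the forward strand is scanned, and the other requirements for a working gene (such as being transcribed) are not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return open reading frames found on the forward strand.

    An ORF here is a start codon (ATG) followed, in the same reading
    frame, by at least min_codons codons (counting the ATG) and then a
    stop codon. Nested start codons would be reported as separate ORFs.
    """
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start until a stop codon appears.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append(dna[start:pos + 3])
                break
    return orfs


if __name__ == "__main__":
    # Hypothetical non-coding sequence in a related species: no ATG anywhere.
    non_coding = "GGATAGCCAAGCTGGAATTTTGAGG"
    # The same sequence with one letter changed (A to G at index 4).
    new_gene = non_coding[:4] + "G" + non_coding[5:]

    print("before:", find_orfs(non_coding))
    print("after: ", find_orfs(new_gene))
```

The two sequences differ at one position out of twenty-five, so they remain recognizably homologous at the DNA level even though only one of them contains a protein-coding ORF.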
We saw that Stephen Meyer, an advocate for Intelligent Design (ID), mistakenly claimed in his book Darwin’s Doubt that de novo gene origination was an unknown mechanism—and thus not a valid explanation for how new biological information arises over the course of evolution. As we have seen, however, protein-coding genes can and do come into being de novo—“from new”—but they do so from previously existing DNA sequences. So, while their protein products might be new, their DNA sequences are modified descendants of DNA sequences that did not code for proteins.
In his effort to show that evolution cannot produce new biological information, Meyer also claims that even parts of functional genes cannot be constructed through mutation and selection. These protein parts are known as “protein folds”—amino acids in a sequence that fold up into a stable structure. Though proteins are made up of many such folds joined together, there is good evidence that evolution is capable of generating new folds in a piecemeal fashion—folds that can then be incorporated into other proteins to give them new functions, or even swapped between genes via chromosome structure changes that move stretches of DNA around. Meyer, understandably, disputes this, since it would be a mechanism for evolution to produce new information with new functions.
Before we delve into Meyer’s argument, let’s look at how proteins are made—and how they fold up into functional shapes—in more detail.
Protein chemistry and folding
Proteins, as we have seen, are made up of amino acids linked together—and there are 20 distinct amino acids found in biological proteins. Though these amino acids are different in their chemical properties—some are attracted to water, some avoid it; some have a charge, some are neutral, and so on—they also have a common structure that allows them to be joined together in a chain. The general structure of an amino acid is shown below. All 20 amino acids have this core structure in common—the only differences between them are in the “R” group, which can vary widely. The “R” is a chemical shorthand for what can be as simple as a single hydrogen (H) atom, or as complex as carbon ring structures. Representing these varied structures as an “R” allows us to see the common chemical core of all 20 amino acids (Figure 1):
The physical joining of amino acids to make proteins depends on the core structure—it does not involve the “R” groups. A “peptide bond” is formed between two amino acids when an “-OH” group (i.e. an oxygen and hydrogen atom) on one amino acid and a hydrogen atom (“H”) on the other react to form water, leaving a carbon atom (“C”) and a nitrogen atom (“N”) from the two amino acids bonded together (Figure 2):
The key thing to notice for the purposes of this discussion is this: amino acids have a common chemistry that is used to link them together. This common chemical structure produces a second effect as well: it allows proteins to fold up into two specific shapes that are repeated over and over again. The first is a corkscrew-like shape called an alpha helix, and the second is a flat, pleated structure called a beta sheet. These structures are stabilized by chemical interactions between atoms in the common core structures—not the R groups. This means that there are many, many different amino acid sequences that can form alpha helices and beta sheets. If specific R groups were required to form helices or sheets, then only a few amino acid sequences would work for these structures. Since these structures form using the atoms that amino acids have in common, many sequences work equally well.
Protein folds—the stable three dimensional shapes within proteins—often have alpha helices and beta sheets embedded within them, giving them structure. Let’s look at a few examples of real proteins to get a better idea of how this works. Hemoglobin is a protein that many people are somewhat familiar with, since it is the protein we use to transport oxygen in our bloodstream. Hemoglobin is made up of four separate proteins that bind together to form a complex, and each of the four protein units is filled with alpha helices (Figure 3):
A second example, and one that depends heavily on beta sheets, is green fluorescent protein (GFP)—a natural protein found in jellyfish that has become a favorite tool of cell biologists because it glows green when exposed to ultraviolet light (Figure 4):
Proteins, then, are stable structures that are largely built on alpha helices and beta sheets—which in turn depend on the chemical structure common to all amino acids.
Meyer and protein folds
Meyer argues that new protein folds are beyond the reach of evolution because they are too rare for chance events to produce:
… experiments establishing the extreme rarity of protein folds in sequence space also show why random changes to existing genes inevitably efface or destroy function before they generate fundamentally new folds or functions … If only one out of every 10^77 of the alternate sequences are functional, an evolving gene will inevitably wander down an evolutionary dead-end long before it can ever become a gene capable of producing a new protein fold. The extreme rarity of protein folds also entails their functional isolation from each other in sequence space. (Darwin’s Doubt, page 207)
Of course, we have already seen that Meyer is mistaken here as well. If de novo protein-coding genes such as nylonase can come into being from scratch, as it were, then it is demonstrably the case that new protein folds can be formed by evolutionary mechanisms without difficulty. Nylonase is filled with stable protein folds, as you would expect since it is a functional enzyme. Moreover, its folds have now been extensively characterized at the molecular level, and, not surprisingly, they contain alpha helices and beta sheets. So, if Meyer had understood de novo gene formation—as we have seen, he mistakenly thought it was an unexplained process—he would have known that new protein folds could indeed be easily developed by evolutionary processes.
While nylonase is a dramatic example of an entire gene, with numerous folds, being produced in one fell swoop, there are other less dramatic examples that are nonetheless informative about how evolution can produce new folds and with them new functions. One such example recently came to light, and it is of profound interest because it sheds light on one part of how we became human.
New biological information and hominin evolution
One of the defining features of humans, when compared to our living great ape relatives, is the size of our brains. In general, hominins—i.e. humans, and species more closely related to humans than to chimpanzees—have larger brains relative to body size than do our closest living relatives (chimpanzees, gorillas, and orangutans). The hominin lineage, since it parted ways with the chimpanzee lineage 4-6 million years ago, has thus increased its cranial capacity over time, and we see this effect especially in species such as Homo erectus, Homo neanderthalensis, and our own species, Homo sapiens. Scientists are quite interested in the genetic changes that allowed our lineage to develop larger brains over time. While many of these changes will likely be tweaks to previously existing genes—i.e. genes that were present in the last common ancestral population of humans and chimpanzees—it is possible that some new genes have evolved on the hominin lineage that contribute to our larger brains. One research group recently set out to test this hypothesis, and the results are an excellent example for our purposes: a new gene (well, sort of new), with a new protein fold, and with a new function. Let’s take a look.
In order to look for recently-evolved genes that contribute to brain development, this group first looked for genes that are highly expressed in certain developing brain tissues—specifically, in cells that give rise to nerve cells (neurons) in the developing neocortex—the surface of the brain that has the bumps and grooves that you’re familiar with if you’ve ever seen a model of a human brain, or looked at an anatomy textbook. Those bumps and grooves—called gyri (singular, gyrus) and sulci (singular, sulcus)—increase the surface area of our brains inside our skulls. As humans develop in the womb, our developing brains become progressively folded: a process called gyrification (Figure 5). More surface area means more neural processing power, and it also goes hand in hand with more brain volume: bigger skulls give more room for bigger brains—and the more gyri and sulci we can pack in there, the more surface area we have for increased mental processing.
So, cells in the developing neocortex are a good place to look for new genes that influence our brain surface area and volume. With a set of such genes identified in humans, the researchers then compared what they had found to the mouse genome. Mice and humans shared a common ancestral population about 125 million years ago, and the researchers were interested in finding genes that we have, but that mice do not. These genes, then, would have arisen (by duplication, de novo origination, et cetera) over the 125 million years that our lineages have been separate. In this way, they hoped to find new genes that contributed to brain development in the lineage leading to humans.
One human gene that they found (named “ARHGAP11B”) seemed a likely candidate for promoting gyrification. It is not only not found in mice, it is not found in chimpanzees, either. This indicates that it came to be after the lineages leading to humans and chimpanzees separated: in other words, it is a hominin-specific gene. They then looked in the Neanderthal and Denisovan genomes for this gene, and found it was present. (Though these hominins are extinct, we have recovered and sequenced their genomes). These results give a range of possible times for the duplication event—after the human and chimpanzee lineages diverge, but before the human lineage diverges from the population leading to Neanderthals and Denisovans (Figure 6).
Upon closer examination, they found that this gene, despite its novelty, did not come out of thin air. It’s a partial duplication of a neighboring, ancestral gene that is still present in mice and humans. A segment of chromosome was duplicated, and the duplicated region spanned a part of this gene, producing a shortened copy next door. The shortened copy was at first just that—a shortened copy of the original gene. Later, however, a second mutation event occurred—one that created a new site for RNA splicing of the gene when it was transcribed. This new splicing site bypassed some of the shortened copy’s amino acid sequence, and instead spliced to a sequence that was not intended to code for amino acids at all. The result is that the shortened copy suddenly had a short section of brand-new amino acids tacked on to the end of it. These amino acids are unlike any other protein in any other organism—they are a brand new amino acid sequence. So, we have an example of a duplicated, shortened gene picking up a new region de novo—from previously existing, but previously non-coding, DNA.
As the researchers kept investigating, they came to a surprising conclusion: the new protein region gave the duplicate gene a new function. Using recombinant DNA techniques, they recreated the original shortened duplicate gene and compared it to the shortened gene after the new amino acids were added. The duplicate gene with the new amino acids increased the number of neurons in the developing neocortex. They even engineered it into the mouse genome, and saw that the mouse brain—which does not have gyri and sulci—began to fold as a result. The original duplicate gene, prior to the new amino acids being added, had no such effect. The new protein region was the source of this new function—and likely was a significant factor in the expansion of the hominin brain on the lineage leading to Neanderthals, Denisovans, and humans. A new protein region, de novo, carrying with it significant new functional information—that arose through readily accessible mutational steps. While we do not yet know the precise shape of this new, composite protein, we know that it is likely to be stably folded, given that it is functional. The other formal possibility is that this domain functions as a structure that can take on a wide range of possible shapes depending on its context and what it interacts with. Such domains thus do not have a particular, stable structure, and are said to be “intrinsically disordered”*. Of course, the discovery that some protein domains (or even entire proteins) do not need a stable fold to take on functional shapes does nothing to help the ID cause.
“There is grandeur in this view of life, with its several powers, having been originally breathed by the Creator into a few forms or into one; and that, whilst this planet has gone circling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being evolved.”
This line, from the concluding paragraph of Darwin’s Origin of Species, is one of the book’s most widely cited passages. Given the theistic implications coupled with the poetic nature of this passage, the frequency of its citation is not surprising. However, if one were to pick a line that most adequately summed up Darwin’s thoughts on the nature of the evolutionary process, it would not be this famous passage. Rather, it would be the lines that immediately precede this passage.
“It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us. These laws, taken in the largest sense, being Growth with reproduction; Inheritance which is almost implied by reproduction; Variability from the indirect and direct action of the conditions of life, and from use and disuse; a Ratio of Increase so high as to lead to a Struggle for Life, and as a consequence to Natural Selection, entailing Divergence of Character and the Extinction of less improved forms.”
What makes this passage representative is that it follows a theme that runs throughout the entire Origin, one in which Darwin places particular emphasis on the law-like qualities that govern biology in general and evolution in particular. These “laws impressed upon matter”—as Darwin called them— were both at the heart of scientific discovery and at the heart of his theory.
Too often evolutionary theory is popularized as a random contingent process that cobbles together chance variations into whatever happens to work. Atheistic Darwinists love to point this out when they argue that evolution demonstrates we inhabit a world devoid of any purpose or design. But it is important to remember, as Darwin did, that this chance variation operates on a bed of order—on the “laws impressed upon matter.” In the dynamic evolutionary interplay between order and chance, it is the order that is more fundamental. In fact, it is the order that both shapes and directs the manner by which chance variations are able to build evolutionary novelties.
Nowhere is this more evident than in the process of protein evolution. At first glance, the possibility of a Darwinian process stumbling upon a functional protein seems to be exceedingly unlikely. The odds seem to be stacked against evolution. To illustrate this, one needs to examine the chemical composition of proteins. Biological proteins are strings of amino acids connected together by chemical bonds. Just as letters can be arranged to produce sentences, amino acids can be arranged to produce proteins. Because there are twenty possible amino acids, a specific functional protein that is 100 amino acids long represents just one possibility amongst a total of 10^150 sequences. (To put this number in perspective, there are only 10^85 fundamental particles in the universe.) This represents an enormously vast space. In fact, Intelligent Design advocates argue that this space is so vast that a random search through it could have never stumbled upon a distinct functional protein. The problem with this argument, though, is that it ignores the underlying chemical and physical order.
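The size of this space can be checked directly. The following is a minimal sketch that assumes only the figures given above: twenty amino acids, a chain of 100 residues, and the quoted particle count. The variable names are illustrative. Exact integer arithmetic puts 20^100 nearer 10^130 than the round figure quoted above, which if anything is generous; either way, the space exceeds the particle count by dozens of orders of magnitude.

```python
# Size of the sequence space for a 100-residue protein, using exact integers.
AMINO_ACIDS = 20          # from the text: twenty possible amino acids
CHAIN_LENGTH = 100        # from the text: a protein 100 amino acids long
PARTICLE_EXPONENT = 85    # from the text: about 10^85 fundamental particles

total_sequences = AMINO_ACIDS ** CHAIN_LENGTH

# The order of magnitude of a positive integer is its digit count minus one.
sequence_exponent = len(str(total_sequences)) - 1
orders_beyond_particles = sequence_exponent - PARTICLE_EXPONENT

print(f"20^100 is about 10^{sequence_exponent}")
print(f"That is roughly 10^{orders_beyond_particles} sequences per particle")
```

Python integers have arbitrary precision, so the count is exact rather than a floating-point approximation.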
To illustrate the importance of this order, one need only look at how proteins fold. For most proteins to function effectively, they must take on a relatively specific three-dimensional structure. Given that there is a near-infinite array of possible amino acid sequences, one might think that proteins could fold into a vast variety of stable structures. This, however, is not the case. In fact, when chains of amino acids start to fold, they collapse into mainly two distinct secondary structures, beta sheets and alpha helices. Some proteins collapse into a number of alpha helices, others into a number of beta sheets, and still others into a combination of the two. They do this because these are the chemically stable structures for chains of amino acids when they are put into the cellular environment. A huge array of amino acid sequences fold into these identical structures, not because natural selection was lucky enough to find them in a random search, but rather because the ordered chemical rules that govern the interactions between atoms and molecules dictate this outcome.
While each of the twenty biological amino acids has a different side chain (R group) that sticks off of it, all amino acids are composed of the same nitrogen-carbon-carbon backbone. When they are strung together in proteins, all the amino acids have a hydrogen sticking off the nitrogen, N-H, and an oxygen sticking off the second carbon, C=O. It is the regularity and limitations of the rotations about these bonds of the backbone (the phi and psi angles) coupled with the need for the hydrogens and oxygens sticking off the backbone to interact that drives the formation of the alpha helices and beta sheets. In other words, secondary structures arise from the underlying chemical and physical order that is common to all chains of biological amino acids; they are not dependent on specific rare sequences of amino acids. As a result, if evolution were to run again (using the same physics and chemistry), beta sheets and alpha helices would certainly form in abundance, provided that amino acids were strung together in a chain.
Furthermore, these beta sheets and alpha helices that exist within proteins can only interact with each other in limited ways. This is due to the underlying chemistry and flexibility of proteins. There are only certain ways you can stably stack beta-sheets or loop around alpha helices. As a result, proteins fold into a rather limited array of 3-D structures despite there being a vast array of amino acid sequences. The degree of convergence is staggering. Current estimates suggest that only 1000 to 2000 different 3-D structures seem to be possible. 
As biologist Michael Denton succinctly puts it, “a number of organization rules, ‘laws of form,’ which govern the local interactions between the main structural submotifs [the alpha helices and beta sheets for example] have been identified, and these restrict the spatial arrangement of amino acid polymers to a tiny set of about 1000 allowable higher-order architectures.” Re-run evolution with the same chemistry and physics, and the same folds would appear, because the forms of the folds are given by physics, and matter is drawn by a process of free energy minimization into the complex form of the native conformation.
To form one of the 1000 or so folds that Denton refers to, one might suppose that it would require a highly specified rare sequence. That doesn’t appear to be the case, however. In fact, there are many cases in which proteins with no sequence similarities fold into the exact same 3D structure. Why? Because the chemical and physical order dictates the stability and commonality of alpha helices and beta sheets and so these form at a significant frequency in any random amino acid sequence.
Suppose that to get a specific 3D fold, one needs to have a beta sheet of a specific length and orientation in a specific location. Given that beta sheets are what amino acids commonly tend to fold into for chemical reasons, evolution does not have to stumble blindly on one specific sequence. It simply has to find any one of the multitude of sequences that fold into a beta sheet of a specific length. This allows widely divergent sequences to converge on the same structure and form beta-sheets of identical size and orientation. As a result, very different sequences can converge on the same fold.
What drives the evolution then of the 1000 or so existing stable protein folds is not randomness, although random changes in the sequence help evolution get to these stable forms. The stable protein forms that exist are the natural and robust result of the order in the underlying physics and chemistry.
Now, one could argue that despite the order influencing the emergence of protein forms, sequences that give rise to these 1000 or so forms are still rare amongst all possible sequences. For example, if there are 10^150 sequences that are 100 amino acids long, it is possible that only 10^80 fold into one of this limited array of stable structures. That would mean that only 1 in 10^70 sequences would fold, which still would represent a significant hurdle for an evolutionary process. The empirical evidence, though, suggests that alpha helices and beta sheets have a propensity to form at a much higher frequency in random sequences, again due to the underlying order. While the exact number is hard to pin down, various studies suggest that the frequency by which stable structures form is between 1 in 10^2 and 1 in 10^11. This is a frequency that evolution can work with, but again it is a frequency that is dependent upon what Darwin called the “laws impressed upon matter.”
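The gap between the pessimistic scenario and the empirical range is easier to appreciate when the exponents are worked through. The sketch below simply restates the numbers in the paragraph above as exact fractions; the variable and function names are illustrative, and nothing here is an independent estimate.

```python
from fractions import Fraction

# Hypothetical pessimistic scenario from the text, as powers of ten.
TOTAL_EXP = 150     # all 100-residue sequences, as quoted in the text
FOLDING_EXP = 80    # supposed number of those sequences that fold

pessimistic = Fraction(10 ** FOLDING_EXP, 10 ** TOTAL_EXP)

# Range given in the text for stable structures among random sequences.
empirical_high = Fraction(1, 10 ** 2)
empirical_low = Fraction(1, 10 ** 11)


def power_of_ten(value):
    """Return n for an integer of the exact form 10^n."""
    digits = str(value)
    if digits[0] != "1" or set(digits[1:]) - {"0"}:
        raise ValueError("not an exact power of ten")
    return len(digits) - 1


print(f"Pessimistic scenario: 1 in 10^{power_of_ten(pessimistic.denominator)}")

# How many times more frequent folding is under the empirical estimates.
gap_low = empirical_low / pessimistic
gap_high = empirical_high / pessimistic
print(f"Empirical range is 10^{power_of_ten(gap_low.numerator)} "
      f"to 10^{power_of_ten(gap_high.numerator)} times higher")
```

Even the low end of the empirical range is dozens of orders of magnitude above the pessimistic scenario, which is the point of the comparison.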
Previously, we saw that there is an underlying order to protein folding: sequences of amino acids that make up proteins have a propensity to form stably-folded structures (alpha helices and beta sheets). As such, we saw that this underlying order greatly increases the probability that proteins can fold up into stable three-dimensional shapes that can perform biological functions. Arguments from the Intelligent Design (ID) movement that evolution is incapable of generating new protein folds and functions, then, are lacking. New, functional protein folds form much more readily than the ID movement claims they do.
As we have seen, the claim that evolution cannot produce new protein folds or functions is an important one for ID. For example, it forms a significant part of Stephen Meyer’s argument that evolution was not capable of producing the diversity of life seen in the so-called “Cambrian explosion”. In his book Darwin’s Doubt, Meyer lays out his argument as follows, once again referencing the work of Douglas Axe (p. 191, emphasis in the original):
… Axe was convinced that explaining the kind of innovation that occurred during the Cambrian explosion and many other events in the history of life required a mechanism that could produce, at least, distinctly new protein folds.
He had another reason for thinking that the ability to produce novel protein folds provided a critical test for the creative power of the mutation and selection mechanism. As an engineer, Axe understood that building a new animal required innovation in form and structure. As a protein scientist, he understood that new protein folds could be viewed as the smallest unit of structural innovation in the history of life.
It follows that new protein folds represent the smallest unit of structural innovation that natural selection can select.
Briefly restated, Meyer’s argument is as follows: new protein structures are required for evolutionary innovation, and protein folds are the smallest parts that a protein can be divided into. If this is the case, he argues, then new protein folds are the smallest units that natural selection can act on. If evolution is incapable of generating new folds (because they are too rare for evolution to find), then evolution cannot produce anything new.
There are several key claims buried within this argument:
- Evolutionary innovation requires new proteins with new functions.
- Proteins require new protein folds to acquire new functions.
- New protein folds, and thus new functions, are inaccessible to evolution.
In the previous posts in this series we have—for the sake of argument—granted claims #1 and #2 and focused on claim #3. And as we have seen, Meyer is greatly underestimating the fraction of protein sequences that can fold: evolution seems to have no difficulty in finding new proteins with new folds and new functions, such as nylonase. This means his overall argument fails, even on this point alone.
But what about Meyer’s other claims?
Interestingly, the second claim – that new protein folds are required for new functions – is also incorrect, and it further undermines Meyer’s argument. The reason for this is simple: there are many examples known of functional protein regions that are not stably folded. (As an aside, there is a good case to be made that his first claim is also misguided, but that is an argument too long to be explored here in any depth. Suffice it to say that in evolutionary biology we see the “same proteins” being used over and over again to control development in a wide number of distantly related animals. This suggests that even without radical innovation at the protein level, a large amount of biological innovation was and is possible – including that seen in the Cambrian. But I digress.)
Not folded, but functional
When I was a graduate student in the late 1990s and early 2000s, the claim that Meyer makes—that protein functions require stable protein folds—was very much taught as a self-evident truth. The leading molecular and cell biology textbook of the day was Molecular Biology of the Cell by Bruce Alberts and colleagues, and it is no stretch to say that it was viewed as the definitive text in the field. The fourth edition, published in 2002, makes the following claims about protein structure and function (note that conformation in this context is a technical term meaning “shape” or “structure”):
The vast majority of possible protein molecules could adopt many conformations of roughly equal stability, each conformation having different chemical properties. And yet virtually all proteins present in cells adopt unique and stable conformations. How is this possible? The answer lies in natural selection. A protein with an unpredictably variable structure and biochemical activity is unlikely to help the survival of the cell that contains it. Such proteins would therefore have been eliminated by natural selection through the enormously long trial-and-error process that underlies biological evolution. Because of natural selection, not only is the amino acid sequence of a present-day protein such that a single conformation is extremely stable, but this conformation has its chemical properties finely tuned to enable the protein to perform a particular catalytic or structural function in the cell. (pp. 141-142)
This is exactly the view of protein structure and function that I was taught: proteins have stable structures, and that stability is crucial for protein function. It should also be apparent that this is precisely the view that Meyer is working with in his books Darwin’s Doubt and Signature in the Cell, though of course he takes it further and claims that it renders evolution of new protein structures and functions impossible.
Despite the appeal of this view, and its pervasiveness among biologists of a certain age (myself included), it’s wrong. As with many things in science, additional evidence has revealed a prior model to be inadequate – and replaced it with a more nuanced model that encompasses both the prior evidence and the new data.
At the time that my textbook was confidently equating protein stability with function, it was already known that many proteins actually function because they do not have a stable structure. Such proteins were first identified in the early 1990s, and numerous examples were known by the time my textbook went to print. Such proteins came to be known as “intrinsically disordered proteins” or “IDPs”. Proteins were also found to have “intrinsically disordered regions” (IDRs) embedded within stable structures. As you can imagine, the researchers reporting these results faced an uphill battle convincing their reviewers that what they were observing was in fact real. The standard of evidence for challenging what everyone “knew to be true” was high, but eventually, as the evidence accumulated, biologists came to accept the new paradigm. One author summarized the state of the emerging field in 2002 as follows:
In the past few years, a large number of papers on proteins denoted as “natively denatured/unfolded” or “intrinsically unstructured/disordered” have appeared and it has become apparent that this phenomenon is quite general… Moreover, the functional importance of the unstructured state is underlined by the fact that most of these proteins have basic regulatory roles in key cellular processes.
In the years since, further evidence for functional, disordered proteins has continued to pile up. We now know that IDPs and proteins with IDRs contribute to a wide diversity of cellular functions. They do so by rapidly transitioning between different shapes, and interacting with a wide range of other proteins in the process. In some cases, IDPs/IDRs become (transiently) stable when they bind onto another stable protein. In other cases, it seems that they remain in flux and are not stabilized by binding a partner. For both types, their unstable, disordered state is necessary for their function. We also know that IDRs are extremely widespread: most proteins in eukaryotes (i.e. organisms that are not prokaryotes) have at least one IDR in their sequence. The human genome, which has on the order of 20,000 genes, has an estimated 100,000 IDRs. Disorder is the new order: what was “heresy” in 1990 and “surprising but substantiated” in 2000 is now a well-accepted phenomenon, ready for inclusion in entry-level cell / molecular biology textbooks.
Of course what I learned in my textbooks is still valuable: many proteins do have stable structures, and those structures are important for their functions. Additionally, natural selection is important in maintaining those structures. What we now know is that this picture of protein biology was not so much wrong, but incomplete. Proteins can have stable structures, or not, and be functional. Since natural selection selects for function, it will preserve functional proteins that are stable and functional proteins that are disordered. What was misguided was tying stable protein structure to function in all cases.
IDPs, Axe, and Intelligent Design
The negative implications of finding pervasive, functional, and intrinsically unfolded proteins and protein regions for ID information-based arguments are substantial. Let’s briefly revisit our summary of Meyer’s argument:
- Evolutionary innovation requires new proteins with new functions.
- Proteins require new protein folds to acquire new functions.
- New protein folds, and thus new functions, are inaccessible to evolution.
We can now see that claim #2 is unfounded—proteins can acquire new protein functions without acquiring a new fold. IDPs and IDRs show us that unfolded—but nonetheless functional—sequences are widespread. As such, even if new protein folds were difficult for evolution to produce (and we have seen that they are not) this would not be a barrier to proteins evolving new functions.
Moreover, the observation that IDPs and IDRs are a widespread source of functional protein sequences creates serious problems for Meyer’s claim #3 as well. As we have seen, Meyer claims that protein folds and functions are too rare for evolution to find:
… experiments establishing the extreme rarity of protein folds in sequence space also show why random changes to existing genes inevitably efface or destroy function before they generate fundamentally new folds or functions … If only one out of every 10^77 of the alternate sequences are functional, an evolving gene will inevitably wander down an evolutionary dead-end long before it can ever become a gene capable of producing a new protein fold. The extreme rarity of protein folds also entails their functional isolation from each other in sequence space. (Darwin’s Doubt, page 207)
This claim is based on the work of Douglas Axe, who estimated the prevalence of functional protein sequences by estimating the proportion of sequences that could adopt a particular protein fold. Axe’s work took a functional, stably-folded protein (an enzyme) and swapped out groups of amino acids 10 at a time. If those amino acids were capable of folding correctly in the context of the overall enzyme structure, they would allow the enzyme to function. With this method, Axe estimated that only one protein sequence in 10^77 would stably fold, and thus only one in 10^77 would be functional. Axe’s work has numerous additional caveats which I have discussed at length in my recent co-authored book Adam and the Genome and will not repeat here, but notice this: Axe’s experiment, based as it is on finding stably folded protein sequences, would completely miss IDPs and IDRs since they are not stably folded. Axe’s estimate of the proportion of stably folded proteins as a measure of the proportion of functional proteins is thus far, far too small, since intrinsically disordered proteins would not have been found by his approach. Yet, we know that IDPs and IDRs are everywhere: we have 100,000 in our genome alone.
As such, Meyer is doubly mistaken—new functions can form without new protein folds, and protein functions are far more common than the work he is depending on is capable of detecting. Protein folding can be stable, or it can be a whirlwind of instability—but natural selection can reap function from both.
We’ve explored the claim made by the Intelligent Design (ID) movement that evolutionary mechanisms are not capable of generating the information-rich sequences in genes. One example that we have explored is nylonase – an enzyme that allows the bacteria that have it to digest the human-made chemical nylon, and use it as a food source. As we have seen, nylonase is a good example of a de novo gene – a gene that arose suddenly and came under natural selection because of its new and advantageous function. Since nylonase is a folded protein with a demonstrable function, it should be beyond the ability of evolution to produce, according to ID.
The implications of nylonase for ID arguments are clear, and they have caught the notice of several ID supporters. In recent weeks a number of posts on the subject have appeared on ID websites. ID biologist Anne Gauger, for example, is writing a series of posts in an attempt to rebut the evidence that nylonase is in fact a de novo gene. Her motivation for this work is clear:
Venema is right. If the nylonase enzyme did evolve from a frameshifted protein, it would genuinely be a demonstration that new proteins are easy to evolve. It would be proof positive that intelligent design advocates are wrong, that it’s not hard to get a new protein from random sequence.
As we can see, the issue at hand is not so much the specific origins of nylonase, but rather the relative ease by which new, functional proteins can come into existence. If it is easy to evolve them, ID advocates are wrong. If new protein functions are vanishingly rare and inaccessible to evolution, ID would be strongly supported. With nylonase, we are dealing with events that happened in the past, so our inferences are limited to working with the evidence we have in the present. What would be better, of course, would be controlled experiments in the present that we can observe in real time.
New functions, in real time, from randomness
As technology improves, it becomes possible to design and execute experiments that were only a pipe dream even a few years ago. In the past, DNA samples needed to be nearly pure (i.e. contain only multiple copies of one DNA sequence) before they could be sequenced. Sequencing technology has now advanced to the point where it is possible – even trivial – to sequence a mixed population of DNA molecules without separating them from each other. This process is also capable of determining the proportions of the different DNA sequences in the mixture.
This ability to sequence a population of DNA molecules over time and track the relative proportions of various individual DNA sequences was recently used to address the very questions at issue for ID advocates: just how easy is it to obtain a functional gene from random DNA sequence? And consequently how likely is it that de novo gene origination is a common occurrence?
To investigate this question, a group of researchers assembled a large set of randomized DNA sequences, introduced them into a population of bacteria, and sampled the population over time to see which bacteria increased or decreased in abundance over the course of the experiment. The randomized sequences were set up with engineered sequences on either side of the randomized portion to allow the random portion to be transcribed into RNA and translated into protein. (If you need a refresher on transcription and translation, see here).
In effect, the researchers were creating a large number of brand-new, random genes and seeing if any of them had an effect on the bacteria that received them. Because these new genes were both transcribed (into RNA) and translated (into protein), it would be possible for any of them that had a function to be acting either as a functional RNA molecule or as a translated protein. In other words, any biological effect noted could be due to the RNA or protein product. (As we have seen, RNA can fold up into functional shapes, and as such, new randomized genes might function as RNAs rather than as proteins.) The motivation for creating a large number of randomized genes came from the expectation that functional sequences within the mix would be rare – and thus a large number of sequences would have to be explored if there was to be any hope of success.
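The bookkeeping behind this kind of experiment can be sketched in a few lines. This is a toy illustration and not the researchers' actual analysis: the insert names and read counts below are invented, and the only idea borrowed from the text is that sequencing a mixed population at different times reveals which sequences rise or fall in relative abundance.

```python
# Toy read counts for three hypothetical random inserts at two time points.
# These numbers are invented for illustration and are not data from the study.
reads_start = {"insert_A": 1000, "insert_B": 1000, "insert_C": 1000}
reads_end = {"insert_A": 2400, "insert_B": 900, "insert_C": 300}


def frequencies(read_counts):
    """Convert raw read counts into each insert's share of the population."""
    total = sum(read_counts.values())
    return {name: count / total for name, count in read_counts.items()}


def fold_changes(before, after):
    """Ratio of final to initial frequency for every insert."""
    freq_before = frequencies(before)
    freq_after = frequencies(after)
    return {name: freq_after[name] / freq_before[name] for name in before}


changes = fold_changes(reads_start, reads_end)
for name, change in sorted(changes.items()):
    if change > 1:
        trend = "increased"
    elif change < 1:
        trend = "decreased"
    else:
        trend = "unchanged"
    print(f"{name}: {trend} ({change:.2f}x)")
```

Working with shares of the population rather than raw counts matters because the total number of reads can differ between samples; a real analysis would also need replicates and statistical testing, which this sketch omits.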
One of the scientists involved with the study, Rafik Neme, recounts how he envisioned setting up the experiment:
During my early months in the Tautz lab, while still a Master’s student, I contemplated the possibility of doing an experiment that could support de novo evolution as a general process, and so I came up with a thought experiment. I would insert random sequences in living cells, together with enough regulatory machinery to make sure they would be transcribed and translated by the host. Then, I would wait until any of those would mutate enough to “acquire a function.” It occurred to me that starting with a sufficiently large pool of random sequences would reduce the waiting time, because some would exhibit some biochemical activity upon their introduction.
The results, to put it mildly, were surprising. The experiment found functional, beneficial genes in the mix – genes that increased the ability of bacteria to replicate and compete against other bacteria in the population. What was most surprising, however, was the sheer number of beneficial random genes identified in the experiments. Overall, the researchers report over 600 randomized genes that were beneficial to the bacteria that received them. Rather than new functions being rare, they were common. In one experiment, 25% of the random sequences tested were beneficial. Rafik relates the surprise that these results produced:
I can remember clearly the moment when the first results came in. Not only had we found beneficial activities over and over, but the sheer amount detected was beyond our imagination. […] We had expected that the sequence space would be largely devoid of functions, but it seems to be quite the opposite.
Further work by Rafik and colleagues would show that in some cases the new functional genes acted at the RNA level, and in some cases through the new protein that was produced. They examined four of the new beneficial genes closely, and found that three of them exerted their beneficial effects through the RNA form of the gene – the transcription product. One of the four, however, exerted its beneficial effect through the translated protein product. This demonstrates that transcription of random DNA sequences into RNA has significant potential to produce new functions. The confirmation that one of the new beneficial genes acts through the protein product also confirms that random DNA sequences can be a ready source of functional proteins. Though there are several hundred other functional de novo genes left to analyze from this study, these results are a demonstration that new functions are easy for evolution to find. Though the researchers had expected new functional genes to be rare in a pool of random sequences, they were everywhere.
The importance of these results for ID arguments is clear. By direct experimental test, new biological functions have been shown to be common, not rare, within random sequences – and these functions may be found in either RNA transcripts or de novo protein products. By Gauger’s own measure, ID advocates have been shown to be wrong. Since this particular ID claim undergirds a large proportion of the ID argument that biological information cannot have arisen through evolution, the consequences for ID are significant.
So, we can see that the nylonase issue is something of a distraction – a missing of the forest to focus on one particular tree. Even if this particular example could have an alternate explanation, as Gauger argues, the problems for ID do not go away. In light of these new experimental results showing widespread function in random sequences, as well as the accumulating evidence that de novo genes are common, ID has far greater problems on its hands.
In the next and final post in this series, we’ll reflect on how an evolutionary creationist understanding of the origins of biological information can lead to wonder and worship.
If you’ve ever seen pictures of–or better yet, handled–staurolite, its appearance can be somewhat unsettling. This mineral has a propensity to form what are known as twinned crystals–two distinct crystal lattices that nonetheless share a common region in a particular shape. For those who know ancient Greek, the name staurolite gives the shape away: stauros translates to “cross”, and lithos to “rock”. It just so happens that this mineral tends to form twinned, perpendicular, cruciform crystals—in other words, crosses. See the image above for an example.
Even when one knows that staurolite is a naturally-formed mineral, when holding a symmetrical one it is difficult to not see it as a human artifact. Our minds intuitively “know” that symmetrical, right-angled structures don’t just happen by chance. This isn’t random, we infer—and we are correct. After all, humans frequently make crosses for all sorts of purposes, whether architectural, artistic, or religious. If we don’t understand the physical and chemical processes that formed what we are holding, we might logically conclude that the non-randomness that we detect is the result of human agency.
Design and information
In this series, we’ve explored a similar issue with respect to biological information. Intuitively, we recognize that biological information—gene sequences and the biochemical structures and functions they produce—are not random. They interact with one another, and, to echo the words of Intelligent Design proponent Michael Behe, they are “parts arranged for a purpose.” As such, Behe and others infer from this non-randomness that there is agency at work to produce the effect they observe.
As we’ve seen in this series, however, there is another way to look at biological information—that it is the result of what we would call “natural” processes. We’ve seen the evidence that new, functional biological information is readily found in even random DNA sequences—so easily found, even, that it is more probable to find a functional DNA sequence in a random pool than it is to roll a seven with a pair of dice. That is highly counterintuitive, to be sure, but it is what we observe. As such, there is good reason to infer that the functional information we observe in present-day organisms was formed through similar processes.
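The dice comparison can be verified by enumeration. The sketch below assumes two fair six-sided dice and uses the 25% figure reported for one of the experiments described earlier in this series; the variable names are illustrative, and that figure comes from a single experiment rather than from all of them.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes for two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))
sevens = [roll for roll in rolls if sum(roll) == 7]
p_seven = Fraction(len(sevens), len(rolls))

# From the text: in one experiment, 25% of random sequences were beneficial.
p_beneficial = Fraction(25, 100)

print(f"P(rolling a seven) = {p_seven} (about {float(p_seven):.1%})")
print(f"P(beneficial sequence) = {p_beneficial} ({float(p_beneficial):.1%})")
print("Beneficial sequence more likely than a seven:", p_beneficial > p_seven)
```

Six of the 36 outcomes sum to seven, so the chance of a seven is 1 in 6, which is lower than the 1 in 4 observed for beneficial sequences in that experiment.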
We’ve also examined the evidence that biological information may have had a chemical origin at the beginning of life on earth—and seen that, though this is an ongoing area of research and much remains unknown, there is good evidence that biological information did indeed arise from chemical interactions.
In C.S. Lewis’s famous book The Lion, the Witch, and the Wardrobe, Aslan speaks of a “deeper magic” of which the Witch is unaware, because as a created being, she cannot understand things which extend beyond the beginning of time itself. In this context, Lewis is using “magic” not in the sense of spells and incantations, but rather of the fundamental properties and ordered structure of Narnia woven into its very nature by the Emperor Over the Sea (i.e. the Narnian way of referring to God the Father, even as Aslan represents God the Son). In this way, Lewis envisions God creating and endowing the cosmos with its coherent properties in his abundant love and under his sovereignty.
Though Lewis was thinking primarily about resurrection and redemption here, I’ve often thought that his idea of a “deeper magic” resonates in the science-faith area as well. How are staurolite crystals formed? By direct human fashioning? No, we’ve learned that they are formed by a deeper magic than that: by processes that are humbling and awe-inspiring in their time and scope. The iron, silicon, and aluminum found in them, for example, were formed in stars and scattered to the cosmos in supernovae before they became part of our planet. The crystal lattice of staurolite depends on the interactions between the atoms that make it up, interactions that rely on the deeper magic of physics and chemistry that I, as a biologist, can only vaguely appreciate. The formation of crystals in the earth’s crust depends on geological processes with enormous timescales.
When God makes staurolites, it takes a long time—but they are no less beautiful for that process and history. So too, in my mind, for the wonders we see in biology: they too depend on deeper magic, and it calls us to wonder and praise.