In the last post in this series, we saw that Stephen Meyer, an advocate for Intelligent Design (ID), mistakenly claimed in his book Darwin’s Doubt that de novo gene origination was an unknown mechanism, and thus not a valid explanation for how new biological information arises over the course of evolution. As we have seen, however, protein-coding genes can and do come into being de novo (“from new”), but they do so from previously existing DNA sequences. So, while their protein products might be new, their DNA sequences are modified descendants of DNA sequences that did not code for proteins.
In his effort to show that evolution cannot produce new biological information, Meyer also claims that even parts of functional genes cannot be constructed through mutation and selection. These protein parts are known as “protein folds”—amino acids in a sequence that fold up into a stable structure. Though proteins are made up of many such folds joined together, there is good evidence that evolution is capable of generating new folds in a piecemeal fashion—folds that can then be incorporated into other proteins to give them new functions, or even swapped between genes via chromosome structure changes that move stretches of DNA around. Meyer, understandably, disputes this, since it would be a mechanism for evolution to produce new information with new functions.
Before we delve into Meyer’s argument, let’s look at how proteins are made—and how they fold up into functional shapes—in more detail.
Protein chemistry and folding
Proteins, as we have seen, are made up of amino acids linked together—and there are 20 distinct amino acids found in biological proteins. Though these amino acids are different in their chemical properties—some are attracted to water, some avoid it; some have a charge, some are neutral, and so on—they also have a common structure that allows them to be joined together in a chain. The general structure of an amino acid is shown below. All 20 amino acids have this core structure in common—the only differences between them are in the “R” group, which can vary widely. The “R” is a chemical shorthand for what can be as simple as a single hydrogen (H) atom, or as complex as carbon ring structures. Representing these varied structures as an “R” allows us to see the common chemical core of all 20 amino acids (Figure 1):
Figure 1: All amino acids have a common structure, as well as one chemical group that is distinct. This distinct group can be abbreviated as an “R” to emphasize the chemistry common to all amino acids. Image credit: Wikipedia.
The physical joining of amino acids to make proteins depends on the core structure; it does not involve the “R” groups. A “peptide bond” is formed between two amino acids when an “-OH” group (i.e. an oxygen and hydrogen atom) on one amino acid and a hydrogen atom (“H”) on the other react to form water, leaving a carbon atom (“C”) and a nitrogen atom (“N”) from the two amino acids bonded together (Figure 2):
Figure 2: When amino acids are joined together into proteins, the various R groups are not involved. Image credit: Wikipedia.
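For readers who like to see the bookkeeping, here is a minimal sketch of the peptide bond reaction described above, written in Python. It treats each peptide bond as the loss of one water molecule, whatever the R groups happen to be. The function name `peptide_mass` and the three-entry mass table are illustrative choices for this sketch; the masses are rounded average values for the free amino acids.

```python
# Rounded average molecular masses (g/mol) of three free amino acids.
# Only a small illustrative subset is included here.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine (R group is a single hydrogen atom)
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
}
WATER_MASS = 18.015  # one water is released per peptide bond


def peptide_mass(sequence):
    """Mass of a linear peptide built from the given one-letter sequence.

    Every peptide bond joins two core structures and releases one water,
    so the same subtraction applies no matter which R groups are present.
    """
    if not sequence:
        return 0.0
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    bonds = len(sequence) - 1
    return total - bonds * WATER_MASS


if __name__ == "__main__":
    for seq in ("GA", "AG", "GAS"):
        print(seq, peptide_mass(seq))
```

Notice that the function never looks at what the R groups are: the chain is built the same way for any sequence, which is the point of the paragraph above.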
The key thing to notice for the purposes of this discussion is this: amino acids have a common chemistry that is used to link them together. This common chemical structure has a second effect: it allows stretches of a protein chain to fold up into two specific shapes that are repeated over and over again. The first is a corkscrew-like shape called an alpha helix, and the second is a flat, pleated structure called a beta sheet. These structures are stabilized by chemical interactions (hydrogen bonds) between atoms in the common core structures, not the R groups. This means that there are many, many different amino acid sequences that can form alpha helices and beta sheets. If specific R groups were required to form helices or sheets, then only a few amino acid sequences would work for these structures. Since these structures form using the atoms that amino acids have in common, many sequences work equally well.
Protein folds—the stable three dimensional shapes within proteins—often have alpha helices and beta sheets embedded within them, giving them structure. Let’s look at a few examples of real proteins to get a better idea of how this works. Hemoglobin is a protein that many people are somewhat familiar with, since it is the protein we use to transport oxygen in our bloodstream. Hemoglobin is made up of four separate proteins that bind together to form a complex, and each of the four protein units is filled with alpha helices (Figure 3):
Figure 3: Hemoglobin is a protein complex made up of four proteins that associate with each other (two shown in blue, and two shown in red). These four proteins are folded into stable shapes that rely greatly on numerous alpha helices (represented as corkscrew-like shapes in the diagram). Image credit: Wikipedia.
A second example, and one that depends heavily on beta sheets, is green fluorescent protein (GFP), a natural protein found in jellyfish that has become a favorite tool of cell biologists because it glows green when exposed to ultraviolet light (Figure 4):
Figure 4: Green fluorescent protein (GFP). GFP is a “beta barrel” where numerous beta sheets (represented by flat arrowheads in the diagram) are arranged in a stable barrel-like structure. Loops of amino acids connect each beta sheet. Short alpha helices are also present. Image credit: Wikipedia.
Proteins, then, are stable structures that are largely built on alpha helices and beta sheets—which in turn depend on the chemical structure common to all amino acids.
Meyer and protein folds
Meyer argues that new protein folds are beyond the reach of evolution because they are too rare for chance events to produce:
… experiments establishing the extreme rarity of protein folds in sequence space also show why random changes to existing genes inevitably efface or destroy function before they generate fundamentally new folds or functions … If only one out of every 10^77 of the alternate sequences are functional, an evolving gene will inevitably wander down an evolutionary dead-end long before it can ever become a gene capable of producing a new protein fold. The extreme rarity of protein folds also entails their functional isolation from each other in sequence space. (Darwin’s Doubt, page 207)
Of course, we have already seen that Meyer is mistaken here as well. If de novo protein-coding genes such as nylonase can come into being from scratch, as it were, then it is demonstrably the case that new protein folds can be formed by evolutionary mechanisms without difficulty. Nylonase is filled with stable protein folds, as you would expect since it is a functional enzyme. Moreover, its folds have now been extensively characterized at the molecular level, and, not surprisingly, they contain alpha helices and beta sheets. So, if Meyer had understood de novo gene formation—as we have seen, he mistakenly thought it was an unexplained process—he would have known that new protein folds could indeed be easily developed by evolutionary processes.
While nylonase is a dramatic example of an entire gene, with numerous folds, being produced in one fell swoop, there are other less dramatic examples that are nonetheless informative about how evolution can produce new folds and with them new functions. One such example recently came to light, and it is of profound interest because it sheds light on one part of how we became human.
New biological information and hominin evolution
One of the defining features of humans, when compared to our living great ape relatives, is the size of our brains. In general, hominins—i.e. humans, and species more closely related to humans than to chimpanzees—have larger brains relative to body size than do our closest living relatives (chimpanzees, gorillas, and orangutans). The hominin lineage, since it parted ways with the chimpanzee lineage 4-6 million years ago, has thus increased its cranial capacity over time, and we see this effect especially in species such as Homo erectus, Homo neanderthalensis, and our own species, Homo sapiens. Scientists are quite interested in the genetic changes that allowed our lineage to develop larger brains over time. While many of these changes will likely be tweaks to previously existing genes—i.e. genes that were present in the last common ancestral population of humans and chimpanzees—it is possible that some new genes have evolved on the hominin lineage that contribute to our larger brains. One research group recently set out to test this hypothesis, and the results are an excellent example for our purposes: a new gene (well, sort of new), with a new protein fold, and with a new function. Let’s take a look.
In order to look for recently-evolved genes that contribute to brain development, this group first looked for genes that are highly expressed in certain developing brain tissues: specifically, in cells that give rise to nerve cells (neurons) in the developing neocortex, the surface of the brain that has the bumps and grooves that you’re familiar with if you’ve ever seen a model of a human brain, or looked at an anatomy textbook. Those bumps and grooves, called gyri (singular, gyrus) and sulci (singular, sulcus), increase the surface area of our brains inside our skulls. As humans develop in the womb, our developing brains become progressively folded: a process called gyrification (Figure 5). More surface area means more neural processing power, and it also goes hand in hand with more brain volume: bigger skulls give more room for bigger brains, and the more gyri and sulci we can pack in there, the more surface area we have for increased mental processing.
Figure 5: During human embryological development, our brains become progressively folded, a process called gyrification. The result is a consistent pattern of bumps (gyri) and grooves (sulci) on the adult neocortex. Image credit: Wikipedia.
So, cells in the developing neocortex are a good place to look for new genes that influence our brain surface area and volume. With a set of such genes identified in humans, the researchers then compared what they had found to the mouse genome. Mice and humans share a common ancestral population at about 125 million years ago, and the researchers were interested in finding genes that we have, but that mice do not. These genes, then, would have arisen (by duplication, de novo origination, et cetera) over that 125 million year span since our lineages have been separate. In this way, they hoped to find new genes that contributed to brain development in the lineage leading to humans.
One human gene that they found (named “ARHGAP11B”) seemed a likely candidate for promoting gyrification. Not only is it absent in mice, it is absent in chimpanzees as well. This indicates that it came to be after the lineages leading to humans and chimpanzees separated: in other words, it is a hominin-specific gene. They then looked in the Neanderthal and Denisovan genomes for this gene, and found it was present. (Though these hominins are extinct, we have recovered and sequenced their genomes.) These results give a range of possible times for the gene’s origin: after the human and chimpanzee lineages diverged, but before the human lineage diverged from the population leading to Neanderthals and Denisovans (Figure 6).
Figure 6: The presence of the new gene, ARHGAP11B, in humans, Neanderthals, and Denisovans (but not in chimpanzees) gives a possible range of times for its origin (red lines). If DNA is recovered from Homo erectus in the future, the range could be narrowed further, but at present it is not known if this mutation occurred before or after its lineage separated from the lineage leading to humans.
Upon closer examination, they found that this gene, despite its novelty, did not come out of thin air. It’s a partial duplication of a neighboring, ancestral gene that is still present in mice and humans. A segment of chromosome was duplicated, and the duplicated region spanned a part of this gene, producing a shortened copy next door. The shortened copy was at first just that—a shortened copy of the original gene. Later, however, a second mutation event occurred—one that created a new site for RNA splicing of the gene when it was transcribed. This new splicing site bypassed some of the shortened copy’s amino acid sequence, and instead spliced to a sequence that was not intended to code for amino acids at all. The result is that the shortened copy suddenly had a short section of brand-new amino acids tacked on to the end of it. These amino acids are unlike any other protein in any other organism—they are a brand new amino acid sequence. So, we have an example of a duplicated, shortened gene picking up a new region de novo—from previously existing, but previously non-coding, DNA.
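To make the splicing step concrete, here is a toy sketch in Python. The three short DNA segments are invented for illustration and are not the real ARHGAP11B sequence, and splicing is simplified to joining two strings. The names `exon1`, `ancestral_exon2`, `formerly_noncoding`, `translate`, and `shared_prefix` are all hypothetical. What the sketch does show accurately is the logic: the same first exon, spliced to a different downstream segment, yields a protein with the same beginning and a brand-new end.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna):
    """Translate a coding-strand DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


def shared_prefix(a, b):
    """Return the leading run of characters that two strings have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


# Invented toy segments (NOT real gene sequence).
exon1 = "ATGGCTAAAGAA"               # kept in both versions
ancestral_exon2 = "CTGGGTTCTTAA"     # original splice partner
formerly_noncoding = "TGGCATCGTTAG"  # reached only via the new splice site

old_protein = translate(exon1 + ancestral_exon2)
new_protein = translate(exon1 + formerly_noncoding)

if __name__ == "__main__":
    common = shared_prefix(old_protein, new_protein)
    print("ancestral splicing:", old_protein)
    print("new splicing:      ", new_protein)
    print("shared start:      ", common)
    print("new tail:          ", new_protein[len(common):])
```

In this toy, the "new tail" comes from DNA that was already there and was simply never read as protein before the new splice site appeared.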
As the researchers kept investigating, they came to a surprising conclusion: the new protein region gave the duplicate gene a new function. Using recombinant DNA techniques, they recreated the original shortened duplicate gene and compared it to the shortened gene after the new amino acids were added. The duplicate gene with the new amino acids increased the number of neurons in the developing neocortex. They even engineered it into the mouse genome, and saw that the mouse brain (which does not have gyri and sulci) began to fold as a result. The original duplicate gene, prior to the new amino acids being added, had no such effect. The new protein region was the source of this new function, and likely was a significant factor in the expansion of the hominin brain on the lineage leading to Neanderthals, Denisovans, and humans. Here, then, is a new protein region, arising de novo and carrying with it significant new functional information, that came about through readily accessible mutational steps. While we do not yet know the precise shape of this new, composite protein, we know that it is likely to be stably folded, given that it is functional. The other formal possibility is that this domain functions as a structure that can take on a wide range of possible shapes depending on its context and what it interacts with. Such domains do not have a particular, stable structure, and are said to be “intrinsically disordered”*. Of course, the discovery that some protein domains (or even entire proteins) do not need a stable fold to take on functional shapes does nothing to help the ID cause.
In the next post in this series, we’ll revisit the chemistry of protein structure, and see why the features we find there further suggest that ID advocates have greatly overestimated the difficulty of evolving new protein folds and functions.