How organisms reproduce “after their own kind” (to borrow the language from Genesis) is a longstanding question in biology. A closely related question arises from the observation that within a “kind,” not all individuals are the same: variation exists within populations of the same species. For many years, the mechanism that could explain both the observed constancy of a species (faithful reproduction of the form of an organism) and variation (not all members of a species are identical) remained a mystery. In order to shed some light on these important issues for evolutionary biology, we need to take some time to explore the “nuts and bolts” of how two important biological molecules work, and how they relate to one another: deoxyribonucleic acid (DNA) and proteins.
Molecular Genetics 101: Proteins and DNA
You might be surprised to learn that early work in exploring the molecular basis for genetics favored proteins as the hereditary molecule instead of DNA. It was suspected that whatever was acting as a hereditary molecule would be large and complex, and proteins were both. Proteins can be very long, since they are a polymer of smaller, repeating components (monomers). We can use children’s interlocking bricks to illustrate what we mean. For bricks, each individual piece is a monomer, and when they’re snapped together, they form a polymer:
Proteins are built pretty much in the same way. For proteins, the monomers are a group of compounds called amino acids (each amino acid is one monomer). Like the bricks in our analogy, they have features in common that allow them to be “snapped together” into a long chain. They also have significant differences, analogous to the different colors in the diagram: some amino acids are hydrophobic (i.e. they are repelled by water), others are hydrophilic (i.e. attracted to water). Some are large and bulky, others are comparatively small, and so on. Unlike the rigid bricks in our analogy, proteins are marvelously flexible, and fold up into a three-dimensional shape, as directed by the properties of the monomers.
There are 20 different amino acids that are used to make proteins, and they can be combined in any sequence in order to produce a protein with specific properties. Those properties arise from the combination and specific order of amino acids, and the final shape they give to the protein. This diversity in monomers means that there are many, many different possibilities for protein sequences (and thus shapes, and functions). Even a polymer only two monomers in length has 400 possible sequences (i.e. 20², or 20×20), and proteins can be thousands of amino acids long. It was this possibility for large-scale complexity that suggested that proteins might have enough “storage capacity” to hold hereditary information and pass it on to the next generation.
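To get a feel for how quickly the number of possible sequences grows, here is a minimal Python sketch of the arithmetic above. The function name `possible_sequences` is our own, chosen only for illustration.

```python
AMINO_ACIDS = 20  # number of different amino acids used to make proteins


def possible_sequences(length: int, alphabet_size: int = AMINO_ACIDS) -> int:
    """Count the distinct linear sequences of `length` monomers."""
    return alphabet_size ** length


# Print the count for a few short chain lengths.
for length in (1, 2, 3, 10):
    print(length, possible_sequences(length))
```

A chain of two amino acids gives the 400 possibilities mentioned above, and every extra monomer multiplies the count by another 20.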
Beginning in the late 1920s, however, research began to point away from proteins and towards DNA as the hereditary molecule. DNA, like proteins, is a polymer formed from a set of monomers (in this case, nucleotides). In contrast to the 20 monomers found in proteins, DNA has only four monomers: compounds abbreviated as A, C, G and T. It was for this reason that researchers were initially skeptical that such a “simple” polymer could act as a source of hereditary information.
Despite this skepticism, evidence continued to mount that DNA was in fact the physical basis for hereditary information. Once this evidence convinced the majority of scientists, the race was on to understand exactly how DNA accomplished this remarkable task. Soon, it became clear that understanding the structure of DNA was crucial to understanding its function, and several research groups famously competed to be the first to decipher it.
Determining the structure of DNA did indeed shed light on its function. Though it has only four monomers, the structure of DNA revealed how it can easily replicate and pass information on: not only is DNA a long polymer, it is a polymer that can specify its own replication through interactions between its monomers. Perhaps a picture would help explain. Imagine bricks that now have “partners” they are attracted to. We’ll represent that attraction, which is a type of chemical bond called a hydrogen bond, with a black dot. The “A” and “T” monomers are attracted with two hydrogen bonds, and the “C” and “G” monomers with three:
These “attraction pairings” between monomers are important: they allow one DNA polymer to act as a template for a second, “complementary” DNA polymer. Imagine a DNA sequence as follows:
As the second DNA polymer is made, monomers are selected, one at a time, to match their “partners” in the first polymer:
These two polymers are held together by the alignment of many hydrogen bonds, and you are likely familiar with them as the “two strands” of the DNA double helix:
While this more realistic model of DNA shows the precise details of its molecular structure, the important features are summarized by our simple “toy brick” model. DNA is a pair of long polymers that can be separated and used to make new copies that are faithful to the original.
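The “toy brick” model can also be written as a few lines of Python. This is only a sketch under simplifying assumptions: the example sequence is made up, the function names are ours, and the two strands are lined up position by position as in the brick picture (real DNA strands run in opposite directions, which we ignore here).

```python
# Each monomer's "partner", and the number of hydrogen bonds in each pairing.
PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complementary_strand(template: str) -> str:
    """Build the second polymer by choosing each monomer's partner in turn."""
    return "".join(PARTNER[base] for base in template)


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand)


def replicate(top: str, bottom: str):
    """Separate the two strands and use each as a template for a new partner."""
    return (top, complementary_strand(top)), (complementary_strand(bottom), bottom)


top = "ATGCCGTA"                      # hypothetical example sequence
bottom = complementary_strand(top)
copy_1, copy_2 = replicate(top, bottom)
print(bottom, hydrogen_bonds(top))
print(copy_1 == copy_2 == (top, bottom))
```

Because each strand fully specifies its partner, both copies come out identical to the original pair.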
While these features of DNA readily explain how it is faithfully copied, recall that we also need to explain variation. Variation, in the most basic terms, means there is sometimes imperfection in the copying process. If DNA is indeed the hereditary molecule, and if DNA copying were 100% accurate, then variation would never arise, and all offspring would be genetically identical to their parents. Without variation, recombination would have no effect (since there would be no variation to mix into new combinations).
There are many ways that variation can enter during the DNA copying process, and in a future post we will examine several of them. One way that we will consider now is simple “mispairing” of monomers during replication. At a certain (very low) frequency, inappropriate monomers are paired together. The arrow in the figure below shows one such mismatched pair, where a red monomer (G) on the bottom strand was incorrectly paired with a yellow monomer (T) when the top strand was made. When this set is replicated, both the top and bottom strands are copied, but now the correct partners for each monomer are found. The result is two different outcomes: one copy now has the original, correct C:G pair (on the left), and the other has a new variant, with an A:T pair (on the right). This change will be faithfully copied from here on, since later copies don’t “know” what the original sequence was. The result is a new variant in the population.
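The mispairing scenario just described can be traced step by step in code. The sketch below uses a made-up eight-monomer sequence and a hypothetical helper, `copy_strand`, that lets us force one wrong monomer into the new strand. As in the brick model, strands are aligned position by position.

```python
PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C"}


def copy_strand(template, mispair_at=None, wrong_base=None):
    """Copy a strand; optionally force one mismatched monomer at an index."""
    new_strand = [PARTNER[base] for base in template]
    if mispair_at is not None:
        new_strand[mispair_at] = wrong_base
    return "".join(new_strand)


bottom = "TACGGCAT"  # hypothetical sequence; index 3 is a G

# First round: the G at index 3 is wrongly paired with T instead of C.
top = copy_strand(bottom, mispair_at=3, wrong_base="T")

# Second round: both strands are copied, this time without errors.
copy_from_bottom = (copy_strand(bottom), bottom)  # keeps the original pair
copy_from_top = (top, copy_strand(top))           # carries the new variant

for name, (upper, lower) in (("from bottom", copy_from_bottom),
                             ("from top", copy_from_top)):
    print(name, upper, lower, "pair at index 3:", upper[3] + ":" + lower[3])
```

One copy ends up with the original C:G pair and the other with a T:A pair (the A:T variant in the text), and nothing in the second copy records that it was ever different.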
Taken together, the properties of DNA match what we observe in nature: faithful reproduction of form, but not perfect reproduction of form. At its base, constancy and heritable variation in biological populations trace back to how DNA functions.
What about proteins?
While the properties of DNA make it a great hereditary molecule (that nonetheless allows for variation to arise), DNA itself is not capable of doing the day-to-day functions that organisms need (enzyme functions, structural functions, and so on). For these functions, the vast structural diversity of proteins is required. In the next post in this series, we’ll discuss how the hereditary information in DNA is transferred to protein structure and function, and how variation in DNA can cause variation at the protein level.
Previously, we discussed how DNA replication is readily facilitated by its structure, since one half of the DNA double helix can serve as a template for making the other half. We also discussed how DNA, though well-suited for its hereditary role, is not at all suited to performing cellular functions—but that proteins fill these roles. With these details covered, we’re now ready to discuss how the hereditary information in DNA is converted to the functional diversity that we see in proteins—and how variation plays a part in this process. The first step in this discussion requires us to look into how chromosomes and genes work.
Molecular Genetics 102: Chromosomes and Genes
Humans have 46 chromosomes in each of their cells, and they come in pairs. We receive one of each pair as a set of 23 chromosomes from each parent: eggs contain 22 non-sex chromosomes plus an X chromosome, and sperm contain 22 non-sex chromosomes plus either an X or a Y chromosome. Each chromosome is one long DNA double helix, with millions of DNA base pairs. Our largest chromosomes have about 250 million base pairs, and our smallest about 50 million. Taken together, the human genome has about 3 billion DNA base pairs in each set of 23 chromosomes, or a total of about 6 billion if you count both sets.
Distributed on these 23 chromosome pairs are genes—the units of biological function encoded within our DNA. What exactly constitutes a “gene,” like all good concepts in biology, is “fuzzy,” but for our purposes we will define a gene as a sequence of chromosome DNA base pairs that are used to make a functional, non-DNA product. Humans have about 20,000 genes, and they can be quite spread out on chromosomes, with a lot of non-gene DNA in between them. If we represent a chromosome as a solid black line (as is common in many genetics textbooks), we can “zoom in” to see the features of one of its many genes. In this case, this is a gene that makes a protein product:
First off, we can see that the parts of the gene that are used to specify the protein amino acid sequence (the blue boxes) are only one part of the whole. Other sequences (such as those represented by the light blue lines and the red boxes) are sequences that direct certain cell types to make this protein, and how much of it to make. All of the sequences represented as boxes are made into what is called “messenger RNA”, or “mRNA”—sort of a single-stranded version of DNA—that is only as long as the gene sequence, and often splices out sequences that intersperse the sections that code for the protein structure (so-called “introns”, which can be seen in the above figure). This mRNA “working copy” of the gene is then used to direct the synthesis of the protein through a process called translation.
If this all seems a little complex, don’t worry—for our purposes here, it’s enough to recognize that genes are (a) a small section of a much longer DNA molecule (i.e. a chromosome), (b) have some sequences that determine the sequence of the protein that they encode (i.e. the order of its amino acids), and (c) other regulatory sequences that are not part of the protein code itself, but function instead as signals to tell cells when and where the protein should be made, or “expressed.”
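For readers who like to see the steps spelled out, here is a rough Python sketch of the path from gene to protein. Everything specific in it is an assumption made for illustration: the tiny 24-base “gene”, its exon positions, and the function names. It treats the gene as the strand whose sequence matches the mRNA, and it leaves out regulatory sequences entirely. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, written for RNA (U in place of T); "*" marks stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical gene: exon, intron, exon.
GENE = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
EXONS = [(0, 6), (18, 24)]  # (start, end) positions of the coding pieces


def transcribe(dna):
    """Make an RNA copy of the gene (same letters, with U replacing T)."""
    return dna.replace("T", "U")


def splice(pre_mrna, exons):
    """Join the exons, dropping the intron sequences between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


mrna = splice(transcribe(GENE), EXONS)
print(mrna, translate(mrna))
```

The output is a short mRNA and the three-amino-acid chain it encodes, written in the usual one-letter amino acid abbreviations.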
With these details in mind, now consider how variation at the DNA level can affect chromosome structure. As we saw in the previous post, when chromosomes are copied, DNA copying errors may occur. Not surprisingly, many types of mutation events can also impact the function of genes, and ultimately the characteristics of the organism:
Single base pair mutations: mispairing of nucleotides can lead to chromosome copies that differ from the original by one base pair (as we saw in the previous post). These so-called “point mutations” can occur inside genes (in either regulatory DNA, or protein-coding DNA) or in the sequences between genes. Single base pair changes in protein coding DNA may have no effect on the protein at all (since there are often different DNA sequences that produce the same sequence of amino acids, a feature known as “redundancy” of the genetic code). Other changes may alter the amino acid sequence by substituting one amino acid for another, but still have no effect on the function of the protein (since many protein functions can be accomplished by slightly different protein sequences). Other changes might reduce or even remove protein function. Still other changes might improve protein function, giving it better enzymatic activity, for example.
Changes in regulatory DNA are also possible, and the effects of these changes can similarly be neutral, harmful or beneficial. What is interesting about regulatory DNA is that small changes can have quite large effects on where and when a protein is made, and changes that alter key genes that function early in development can have significant downstream effects on the organism as a whole. We’ll examine this in some detail in future posts in this series.
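The different outcomes of a point mutation in protein-coding DNA can be illustrated with a short Python sketch. The 12-base coding sequence, the function names, and the three category labels are our own assumptions. The codon table is the standard genetic code. Note that the code can only report what happens to the amino acid sequence: whether a substitution is neutral, harmful or beneficial depends on the protein and cannot be read off the sequence alone.

```python
from itertools import product

# Standard genetic code for DNA codons; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(coding_dna):
    """Read codons in order until a stop codon is reached."""
    protein = ""
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


def point_mutation(dna, position, new_base):
    """Return a copy of the sequence with one base changed."""
    return dna[:position] + new_base + dna[position + 1:]


def classify(original, mutated):
    """Compare the proteins encoded before and after the change."""
    before, after = translate(original), translate(mutated)
    if before == after:
        return "silent"        # redundancy: same amino acids
    if len(after) < len(before):
        return "truncated"     # a new stop codon shortens the protein
    return "substitution"      # one amino acid swapped for another


original = "ATGGCCAAATGA"  # hypothetical coding sequence
for position, base in ((5, "T"), (4, "T"), (6, "T")):
    mutated = point_mutation(original, position, base)
    print(mutated, translate(mutated), classify(original, mutated))
```

The three example changes each alter a single base, yet one leaves the protein untouched, one swaps a single amino acid, and one cuts the protein short.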
Deletion events: sometimes, stretches of DNA can be lost during chromosome replication due to breakage and rejoining events. Sometimes deletions affect only a few base pairs, but in some cases they can span thousands of base pairs. Parts of genes, or even entire genes, can be lost, and genes flanking the deletion are brought closer together. As we have seen for point mutations, deletions can have no effect, a detrimental effect, or even a beneficial effect depending on the specific event. For example, sometimes deletions remove regulatory sequences that shut down gene expression in certain cells. Removing this sequence allows the gene to be expressed where it was not expressed before—which again could be neutral, harmful or beneficial, depending on the circumstances.
Duplication events: this is the opposite of a deletion, where a portion of a chromosome’s sequence is doubled and ends up side-by-side. As with deletions, duplications can be small, or thousands of base pairs long, spanning numerous genes, and can similarly be neutral, harmful or beneficial.
One common mechanism that produces duplications and deletions simultaneously occurs during recombination in the cells that lead to eggs or sperm. You might recall that “crossing over” is the term used to describe the physical breakage and rejoining of chromosomes to “mix and match” sequences between chromosome pairs during the cell divisions that lead to gametes (i.e. meiosis). Normally, chromosomes pair up for this exchange by lining up their (nearly identical) sequences, followed by precise breakage and rejoining:
What can happen, at a low frequency, is that chromosome pairs don’t align their sequences correctly. The alignment is based on the same sequences on each chromosome finding each other and binding to each other. Mistakes can happen because of repetitive sequences between genes—sequences that “trick” the chromosomes into thinking they’ve found their correct sequence alignment, when in fact there are two loops of unpaired sequence, one on each chromosome. If a crossover occurs between these loops, the result is one chromosome with a duplication, and the other with a deletion:
Of course, this list of mutation types is not exhaustive (for example, we have seen how autonomous, parasitic DNA elements called transposons can insert into chromosomes, disrupting functions, or contributing to new ones).
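The misaligned crossover described above, which produces a duplication and a deletion at the same time, can be sketched in Python. Here a chromosome is reduced to a list of labelled segments: `A`, `B` and `C` stand for genes and `R` for a repetitive sequence. The labels, break positions, and function name are all hypothetical.

```python
def crossover(chrom_1, chrom_2, break_1, break_2):
    """Break each chromosome at the given position and swap the ends."""
    recombinant_1 = chrom_1[:break_1] + chrom_2[break_2:]
    recombinant_2 = chrom_2[:break_2] + chrom_1[break_1:]
    return recombinant_1, recombinant_2


# Two copies of the same chromosome, with a repeat (R) on either side of gene B.
chrom_1 = ["A", "R", "B", "R", "C"]
chrom_2 = ["A", "R", "B", "R", "C"]

# Correct alignment: both chromosomes break at the same position.
print(crossover(chrom_1, chrom_2, 2, 2))

# Misalignment: the first R on one chromosome pairs with the second R on the
# other, so the break points are offset from each other.
print(crossover(chrom_1, chrom_2, 2, 4))
```

With matching break points both products keep the original arrangement. With offset break points, one product loses gene B (a deletion) and the other carries it twice (a duplication).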
Summing up: constancy and change
Taken together, these mechanisms introduce variation into populations, and since that variation is in DNA, the variation is heritable. Variation at the chromosome level may influence the function of genes, and ultimately traits at the level of the organism. Changes at the DNA level that do cause meaningful variation at the organismal level are available for natural selection to act on—and we have already seen certain examples of selected mutations, such as the duplication of amylase genes in humans and in dogs. Other mutations, of course, are selected against, and may be removed from populations over time. The properties of DNA as both an agent of constancy and heritable change mean that populations are not entirely genetically stable: they can change over time, though the features of DNA that make it a largely accurate transmitter of information ensure that those changes will likely be subtle ones at the level of the organism.
As we will see in the next post in this series, this genetic instability can put separate populations of the same species on different trajectories, and allow differences to accrue that ultimately lead to new species forming.