This post is the third in a four-part series adapted from Stephen Freeland's scholarly essay on the origin of genetic information.
The description of evolution given above applies once the world contains a genetic material that can influence its own rate of copying in ways that reflect its environment. In living systems, these remarkable properties are produced by the Central Dogma of molecular biology (see Box 1 below). Perhaps, then, a stronger argument for Intelligent Design is that no natural process could have created such a versatile system in the first place?
It is true that evolutionary science does not yet have a clear, detailed and widely accepted explanation for how the Central Dogma of molecular biology emerged. But does that mean it is time to embrace Intelligent Design as a better approach? By analogy, medical science has not yet found a cure for cancer. Taken in isolation, this sound-bite could suggest that research directions developed over decades are best written off as a failure. That view would miss important context: many forms of cancer are now treated far more effectively than ever before, as a direct result of ongoing research. These treatments are simply not comprehensive enough to justify the statement “we have found the cure for cancer.” This status is typical of big questions in science: failing to reach the sound-bite goal should not be mistaken for evidence that the research program has failed. Scientific progress is measured by the insights that research produces and by what they imply about where we might usefully look next. Those insights may even reveal just how much we do not understand, but characterizing the past few decades of cancer research as an exhaustive search that has ended in failure would be more than premature: it would be actively misleading. This final section of the article offers context to help the reader judge whether a similar situation holds for current research into natural processes that could explain the origin of genetic information.
Let us start by making entirely clear what scientists are looking for. As the previous section explains, the challenge is not to find a natural process that can create enough information for a simple genetic system. The universe is replete with information capacity and syntax – from the positions of stars within our galaxy (and billions of others) to the arrangement of atoms in a single grain of sand. Living systems ignore most of this information, so the question is not “where did the information come from?” (unless we wish to talk cosmology, a very different subject) but rather “how does nature create systems that focus on some of this natural information?” Put another way, the challenge for understanding the origin of genetic systems is to find how natural processes can distill a large amount of thermodynamic information into a syntax that displays only the disciplined chemical semantics of a self-replicator.
The details of life’s genetic information system came into focus during the middle of the 20th century.[1] In 1953 Watson and Crick published the structure of DNA,[2] revealing the molecule’s innate capacity to replicate and evolve indefinitely. Thirteen years later, a consortium of scientists published the details of the genetic code by which the information carried by DNA is translated into specific protein sequences.[3] The system was so fundamental to understanding life, yet so simple to explain, that it has become known as the Central Dogma of molecular biology (Box 1). From an evolutionary perspective, however, it was puzzling. Protein catalysts supervise the construction of individual nucleotides (the building blocks of DNA and RNA). Other proteins link these nucleotides into DNA or RNA sequences, depending on their type (deoxyribonucleotides into DNA, and ribonucleotides into RNA). Proteins can perform these roles because each one has just the right chemical properties to catalyze a specific chemical reaction (such as linking the nucleotide “A” to T, G or C to start building a genetic message).[4] Each protein is a long chain of amino acids (typically several hundred) that have been chemically linked together. The shape and function of a protein emerge spontaneously from the sequence of these amino acids, just as the meaning of a word is carried (for us) by a sequence of letters drawn from the English alphabet.[5] The only way to reliably build the right amino acid sequences for the proteins of metabolism is to follow genetic instructions, one code-word (codon) at a time. In other words, for more than three thousand million years, every living thing has needed proteins to make genetic information, and needed genetic information to specify how those proteins are made.
At the time of its discovery, this system looked like an example of what proponents of Intelligent Design might call irreducible complexity: a complex system that cannot evolve from simpler precursors, because any simplification would destroy the entire functional value of the system. This perception of an un-evolvable code was reinforced by the discovery that exactly the same genetic code is at work in organisms as different as human beings and E. coli bacteria (refer back to Figure 1: this is about as genetically different as living organisms can be!). Scientists of the time came to think that a single genetic code was universal to all life on our planet. This led Francis Crick to propose that the genetic code is a “frozen accident” of evolution,[6] universal across life precisely because, once it had formed (by some unknown event), it was so fundamental to all biochemistry that it could never change again. Specifically, he pointed out that any change to the rules of genetic coding would be equivalent to a simultaneous mutation in every gene of the organism (Box 1).[7] Evolutionary theory relies on occasional small mutations that produce a better fit to the environment, but the simultaneous mutation of thousands of genes seems extreme even by the standards of macro-mutationism. However, subsequent science has developed at least three major lines of research that undermine the concept of a frozen accident (and irreducible complexity) for genetic coding.[8]
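Crick’s point can be made concrete with a toy model. The Python sketch below builds the standard genetic code (written in the DNA alphabet, with `*` marking stop codons), translates a few short gene sequences one codon at a time, then gives a single codon a new meaning and reports which genes now yield different proteins. The gene names and sequences are invented for illustration, and the choice to reassign CTG from leucine to serine is likewise an assumption made only to show the effect; it is not meant to model any particular organism.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TTT, TTC, TTA, TTG, TCT, ... GGG order
# (DNA alphabet; '*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical toy "genes", invented for this illustration only.
GENES = {
    "gene_a": "ATGCTGAAAGGCTAA",
    "gene_b": "ATGTTTCTGCTGGAATAA",
    "gene_c": "ATGAAAGGCTTTTAA",
}


def translate(dna: str, code: dict[str, str]) -> str:
    """Decode a DNA sequence one codon (three bases) at a time."""
    return "".join(code[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reassign(code: dict[str, str], codon: str, amino_acid: str) -> dict[str, str]:
    """Return a copy of `code` in which a single codon has a new meaning."""
    variant = dict(code)
    variant[codon] = amino_acid
    return variant


def affected_genes(genes: dict[str, str], old: dict[str, str],
                   new: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each gene whose protein differs between two codes to (old, new) protein."""
    changed = {}
    for name, dna in genes.items():
        before, after = translate(dna, old), translate(dna, new)
        if before != after:
            changed[name] = (before, after)
    return changed


if __name__ == "__main__":
    variant_code = reassign(STANDARD_CODE, "CTG", "S")
    for name, (before, after) in affected_genes(GENES, STANDARD_CODE, variant_code).items():
        print(f"{name}: {before} -> {after}")
```

Running it prints `gene_a: MLKG* -> MSKG*` and `gene_b: MFLLE* -> MFSSE*`, while `gene_c` is untouched because it contains no CTG codon. One rule change alters every protein whose gene uses that codon at once, which is the “simultaneous mutation” Crick had in mind, scaled down to three genes.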
First, the genetic code turns out not to be universal. Around a dozen minor variations exist.[9] Most are codes in which one or more codons have changed their amino acid “meanings”; some involve a more significant change, the addition of a 21st or 22nd amino acid.[10] The available evidence indicates that these variant codes evolved from the standard genetic code during the past few hundred million years, and that the code continues to evolve today. The case for the code’s evolvability is strengthened by the finding that amino acids are not assigned to codons at random. In particular, codons are assigned in a pattern that tends to make common mutations produce only minor changes in the decoded protein. A growing body of evidence links this feature of the code to substantial shaping by natural selection.[11] Everything suggests that the genetic code is evolved and evolvable after all.[12]
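One crude way to see the robustness described above is to ask, for each codon position, how often a single-base substitution leaves the encoded amino acid unchanged. The Python sketch below does this for the standard genetic code. It mutates only the 61 sense codons and treats a change to a stop codon as non-synonymous. Counting only “no change” outcomes is a deliberately simplified stand-in for the chemical-similarity measures used in studies of code optimality; it is an assumption of this sketch, not a reproduction of any of them.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TTT, TTC, TTA, TTG, TCT, ... GGG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_fraction(position: int) -> float:
    """Share of single-base substitutions at codon `position` (0, 1 or 2)
    that leave the amino acid of a sense codon unchanged."""
    synonymous = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1  # a change to a stop codon counts as non-synonymous
            if CODE[mutant] == aa:
                synonymous += 1
    return synonymous / total


if __name__ == "__main__":
    for pos in range(3):
        print(f"codon position {pos + 1}: {synonymous_fraction(pos):.2f} synonymous")
```

Third-position changes come out as by far the most often synonymous, and second-position changes never are, mirroring the familiar layout of codon tables in which codons for the same amino acid usually differ only in their last base.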
The second major insight into the origins of genetic coding is that multiple, independent lines of evidence suggest that the standard alphabet of 20 amino acids grew from a smaller alphabet corresponding to an earlier stage of genetic code evolution. Many versions of this idea have been proposed.[13] Most draw on only one or two types of evidence: sophisticated calculations of the amino acid sequences of truly ancient proteins, the repertoire of amino acids found in meteorites, simulations of an early, pre-biological Earth, and so on. What is interesting is an unexpected agreement between the broad findings of these different approaches. In particular, they converge on dividing the 20 amino acids of modern organisms into 10 that were present in the earliest systems and 10 that arrived later, as by-products of early biological evolution. The membership of each group is remarkably consistent,[14] pointing directly to a process in which the genetic code grew more complex over time from simpler beginnings. Recent findings are also beginning to explain why natural selection produced this particular alphabet of building blocks.[15]
The third line of insight takes us back toward the possible origins of genetic coding. Some scientists have used the SELEX approach described in a companion paper by Watts to identify short RNA sequences that bind specifically to a particular amino acid.[16] Although the results have been patchy, some amino acids appear to bind, with surprising selectivity, to RNA sequences containing the code-words assigned to them in the standard genetic code. This association suggests that the earliest steps in genetic coding may have been nothing more than simple physical affinities between two types of molecule.
Together, these insights represent significant progress beyond the seemingly impossible, self-referential system that Crick and his contemporaries saw just 50 years ago. This half-century of research indicates that the standard genetic code at work in modern cells may be the product of substantial evolution that had already taken place by around 3 billion years ago. But perhaps the most interesting progress is that few scientists still regard the emergence of life’s Central Dogma as the origin of genetic information.