Creating Information Naturally


#Snowflakes, Chess, and DNA

Snowflakes are elegant examples of both simplicity and complexity in the natural world. Surely we all remember learning as children that no two snowflakes are the same. Every flake has a unique and complex pattern. But then as we moved from elementary to high school and learned the basics of chemistry, we discovered that snowflakes are also simple. In fact, at the molecular level, they’re identical: they are all frozen water. How can this be? How do identical water molecules form billions and billions of unique flakes? The answer lies in the interplay between natural laws and chance. The regularity of natural laws makes each snowflake a beautiful six-sided crystal. The random motion of the individual molecules in the air makes each flake unique.

I’m fascinated by natural systems which become more complex over time via the interplay of law and chance. And snowflakes aren’t the only example. A small seed, over decades, can grow into a full-grown tree that is far more complex than the seed from which it began. Newly formed barren volcanic islands can be colonized by lichen and algae, then by more complex plants, then by animals that can swim or fly, gradually forming a complex ecosystem over millions of years. And a single fertilized egg cell, in just nine months, can become a living, breathing, human baby.

More complex things take more information to describe. A single water molecule can be described with a small amount of information; describing an entire snowflake requires much more information. A single seed might require a lot of information to fully describe, but a full-grown tree would require vastly more. The DNA in a single fertilized egg cell, encoded on paper or in a computer file, is several billion bits (“gigabits”) of information. The information required to completely describe an entire cell is still greater, and the information required to describe an entire baby is far, far greater still. How is that information created?

Some advocates of Intelligent Design theory argue that the ordinary mechanisms of evolution cannot significantly increase biological information—specifically the information of DNA and protein sequences. They argue that going from relatively simple living organisms, with small genomes and small numbers of proteins, to more complex organisms required God to act in ways beyond ordinary natural laws and random processes.

Evolutionary creationists, however, believe that God created the biological information in our DNA and protein sequences through the natural laws and random processes that he designed and sustains. In other words, God created biological information through evolutionary mechanisms in ways analogous to how God creates the information needed to describe each new snowflake, each new tree, each new ecosystem, and every new human being.

The theme of this series is biological information. Dennis Venema has already written several posts about biological evolution and genetic information. Here, I want to expand the conversation further to include examples from some of my areas of expertise: mathematics, physics, computer science, and game theory.

There are many kinds of information. Natural laws and chance can create vast amounts of some kinds of information. Natural laws and chance can copy or convert some kinds of information into other kinds of information. So the natural laws and random processes of evolution which create the biological information of DNA and proteins are just part of a larger set of processes which God uses to create and govern this universe.

Some kinds of information have precise mathematical definitions. For example, Shannon information is related to the number of bits of information used for encoding and transmitting messages. Kolmogorov complexity is a measure of the size of an algorithm required to describe an object. The internet has many technical articles describing how those kinds of information are defined and used. I won’t get quite so technical here.

While there are many kinds of information, in these blogs I’ll focus on just four steps of information creation and transformation. I’ll avoid technical or mathematical definitions and stick with intuitive understandings.

  1. Combinatorics: A few simple pieces and a few simple rules can combine to create a vast space of possibilities.
  2. Chance: Random events in physical, biological, or computational systems almost always increase the amount of information required to describe the system.
  3. Evolutionary adaptation: As an object or organism adapts to a complex environment, variation and selection can cause information about the environment to be duplicated in the object or organism.
  4. Co-option: Variation and selection can cause parts of an object or organism to gain new functions, often leading to greater complexity.

For the rest of this post, I’ll focus just on the first step: combinatorics. Plastic toy bricks provide a nice example of combinatorics. Let’s say you have a set of 500 bricks, with about a dozen different types of bricks. A description of each type of brick, and a list of all the ways that any two types can snap together, could be written down on about one page of information. However, the number of different ways you can combine those 500 bricks—what I call the combinatoric possibility space—is vast. You could play for a lifetime and never come close to putting together every possible combination of those 500 bricks. Most of the shapes would look like abstract sculptures. But tiny subsets of that possibility space look like toy houses or trees or airplanes or sailing ships. Those potentialities were built into the combinatoric possibility space as soon as the bricks were first designed.

The game of chess provides another example. The board has 64 squares. There are only 32 pieces and 12 different types of pieces. Yet the number of unique ways you could arrange those 32 pieces is approximately 10^57. (That’s a 1 followed by 57 zeros. For comparison, there are estimated to be about 10^80 atoms in the visible universe.)

In addition to the possibility space of all arrangements of those 32 pieces, the rules of chess, which can be written on a few pages of paper, create another vast possibility space, namely, the space of all possible games of chess. The rules of chess specify all legal moves you could make from one arrangement of pieces to other arrangements. Sometimes the rules are completely deterministic, such as when there is only one legal move. Often there are multiple legal moves, and the player selects one. A computer might be programmed to randomly select from among all legal moves, or perhaps it might select randomly but with probability weighted by how “good” each move is. Starting from the standard opening position of chess, a common estimate is that there are about 10^120 different possible games.

Snowflakes are an example of possibility spaces. When God designed the physics of water molecules, he effectively designed the space of all possible ways they could combine into snowflakes. Every time it snows, the random motion of molecules in the air explores a tiny portion of that space.

DNA provides another wonderful example of possibility spaces. DNA molecules are strings of just four kinds of nucleobase molecules (typically labeled C, G, A, or T). But they can be combined in almost any order into long strings. The DNA in mammalian cells has about a billion nucleobases strung together. This means that there are about 4^1,000,000,000, or roughly 10^600,000,000, different possible ways to put together a DNA string that long. When God designed the DNA molecule, he also created the vast possibility space of creatures which could be generated by all those combinations.
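For readers who like to check the arithmetic, here is a minimal Python sketch of how the size of such a possibility space can be estimated. The function names are my own illustrative choices, and the billion-base length is simply the round figure used above. Because the count itself is far too large to write out, the sketch works with its logarithm: a string of n symbols drawn from k kinds has k^n possible orderings, so log10 of the count is n × log10(k).

```python
import math


def log10_sequence_count(alphabet_size: int, length: int) -> float:
    """Return log10 of alphabet_size ** length without building the huge number."""
    return length * math.log10(alphabet_size)


def bits_to_specify(alphabet_size: int, length: int) -> float:
    """Return how many bits are needed to pick out one sequence from the space."""
    return length * math.log2(alphabet_size)


# Assumed round figures: 4 kinds of nucleobase, one billion positions.
exponent = log10_sequence_count(4, 1_000_000_000)
print(f"Number of possible strings is about 10^{exponent:,.0f}")
print(f"Bits needed to name one string: {bits_to_specify(4, 1_000_000_000):,.0f}")
```

With these inputs the exponent comes out near 6 × 10^8, and naming one particular string takes two bits per nucleobase, or about two billion bits in all.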

The natural laws of biochemistry describe the mutations by which a cell can move from one location in the combinatoric possibility space of DNA to another. Living organisms over the long history of life on earth have only explored an extremely tiny portion of that enormous possibility space.

Particle physics provides yet another stunning example of combinatorics. Consider just three fundamental particles: protons, neutrons, and electrons.[1] The properties of these three particles, and the mathematical formulas which model how they move and exert forces on each other, can be written on roughly a single sheet of paper. These three types of particles combine into roughly 100 different types of atoms. And those 100 types of atoms combine into a tremendous variety of molecules.

Look around your room and think about every different solid, liquid, and gas in the room. Then think about the thousands of different biological molecules in your body. Each has unique properties. Those molecules combine to form almost everything we see. Yet each is made from a different arrangement of just those three kinds of particles.

I believe that when God designed electrons and quarks and the laws which govern their interactions, God had in mind all the possible things which could be made by combining just those three particles in different ways: stars and mountains, oceans and amoebae, plants and people. Our universe thus far has only explored a tiny fraction of all possible combinations of just those three types of particles.

When we humans create plastic toy bricks or games like chess, where a few simple pieces and a few simple rules can combine into possibility spaces so extensive that we couldn’t explore them all in the lifetime of the universe, we are imitating one aspect of how God chose to create our universe.

#Chance

In the first post in this series, I described examples of systems, some natural and some human-made,[1] where a few simple pieces, along with a few simple rules for how they interact, can create so many possible combinations that they could not all be explored in the lifetime of the universe. The number of bits of information required to describe their possibility spaces, and the number of bits required to describe all possible pathways which connect one point in that space to another, is greater than the number of atoms in the visible universe. But the information required to describe merely possible combinations is somewhat abstract and theoretical. Let’s talk about how some of that information “gets real.”

Imagine programming many computers to play chess against each other continuously, recording every move of every game. Before long, the amount of information required to describe every game ever played exceeds the amount of information required to program the computers in the first place. How does that happen?

Each time a computer chooses one chess move out of a list of possible moves, real information is created which is recorded in the arrangements of real atoms, on paper or in magnetic memory. There are occasions in a game of chess when there is only one legal move. On those occasions, it could be argued that no new information is generated (even if the move is recorded in memory). But most of the time, there are multiple legal moves. If the computer randomly selects from all legal moves (or even from a shorter list of all moves which are more-or-less equally good), that is a contingent event. Each time a contingent event happens, real information is created. During each game, out of the huge space of all possible chess games, one real pathway is chosen, and information about those choices is recorded in real physical objects.
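One way to put rough numbers on this, borrowing the Shannon measure mentioned earlier, is to count log2(k) bits each time a move is picked at random from k equally likely legal moves. The Python sketch below is only an illustration under that assumption: `options_per_turn` is a hypothetical list of how many legal moves were available at each turn, not data from any real game, and a real chess program would weight moves unequally.

```python
import math
import random


def bits_for_choice(num_options: int) -> float:
    """Bits created by one uniform random pick among num_options alternatives."""
    return math.log2(num_options)


def record_random_game(options_per_turn, rng):
    """Pick one option per turn at random; return the picks and the total bits."""
    choices = []
    total_bits = 0.0
    for num_options in options_per_turn:
        choices.append(rng.randrange(num_options))
        total_bits += bits_for_choice(num_options)
    return choices, total_bits


# Hypothetical move counts for four turns; the third turn has a forced move.
choices, total_bits = record_random_game([20, 20, 1, 30], random.Random())
print("moves chosen:", choices)
print("bits recorded:", round(total_bits, 2))
```

Note that the forced move contributes log2(1) = 0 bits, which matches the point above that a turn with only one legal move generates no new information.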

Something similar happens in natural systems like DNA. Imagine a population of bacteria all cloned from a single cell, all with identical genomes. Let that population live and reproduce for many generations. Any time a mutation occurs in one cell and that mutation gets passed on, the genetic diversity of the population increases. A portion of the population moves from one point in the space of all possible genomes to another point, and the amount of information required to describe the entire population increases. Mutations are contingent historical events. Each time such a random event occurs, real information about that event is stored in the DNA of the bacteria.

Random events can accumulate to turn simple, uniform environments into highly variable environments requiring a lot of information to describe. This is illustrated by a screen-saver computer program I wrote about 20 years ago. It was inspired by how atoms in solution move randomly and interact with each other. The program starts with a blank screen, and it has rules which cause single-digit numbers, which stand in for atoms, to pop on and off the screen with various probabilities. Each time the screen updates, these atoms can move one step in any direction. Meanwhile other rules govern how neighboring atoms can bind together into molecules or how bound molecules can break apart. After several thousand updates, the screen is filled with many different atoms and molecules (see picture). Each time the program runs, the final arrangement is different. For any given run, the arrangement of atoms and molecules on the screen at the end stores some of the information of the random events that occurred while the program ran.

One of my favorite examples of how random events create this type of information is the astrophysical and geological history of the Earth. Shortly after the Big Bang, the universe was a fairly uniform place, a mixture of particles and energy almost at thermal equilibrium. One piece of the universe looked very much like every other piece, with only slight differences. Over the next 9 billion years, that changed dramatically. Consider the variety of things the universe contained by the time the early Earth had formed: all the atoms in the periodic table, galaxies, neutron stars, black holes, etc. as well as planets like ours which included water, atmosphere, land, and collections of small organic molecules. Even before life started on Earth, the universe as a whole and our planet in particular had become a highly variable environment requiring huge amounts of information to describe. All that variation had been produced over 9 billion years as fundamental particles interacted with each other according to the natural laws and random events that God designed.

Random events not only cause environments to become more variable over time, but also sometimes cause ever-more complex objects to self-assemble within those environments. Astrophysics and chemistry again provide examples. Under the right conditions, particles combine to form atoms. Atoms combine to form simple molecules. Simple molecules can combine to form more complex molecules. Each new assembly has unique properties unlike its component pieces.

In order for complex things to self-assemble out of simpler components, several things must be in place. First, there needs to be a steady input of orderly energy (such as sunlight). Second, something must cause the pieces to move about randomly and encounter each other in a variety of ways (such as thermal energy). Third, the pieces themselves must have the right properties so that, when they encounter each other in just the right way, with neither too much nor too little energy, they remain stuck together in new combinations.

My screen-saver computer program was designed this way. Notice the ring of atoms numbered one to eight near the center of the screen. That ring molecule has two properties which make it unlike any other molecules that the program can produce. First, the bonds are 100% stable and will never break apart. Second, it invariably rotates one notch clockwise per screen cycle. (All other molecules sometimes rotate clockwise, sometimes counter-clockwise, sometimes not at all.) But there are no “special rules” governing that particular molecule. Its two unique properties—its perfect stability and its reliable rotation—are emergent properties of the same rules that govern all other atoms and molecules. Whenever I run that screen-saver program, I have the option to assemble that molecule “by hand” at any point while it is running, but I don’t need to. Ring molecules self-assemble over time, typically within about 30,000 screen cycles.

This program provides a nice example of the importance of fine-tuning the fundamental laws. If I decrease all molecular bond strengths too much, smaller molecules tend to break apart before that ring molecule can self-organize. If I increase bond strengths too much, much larger molecules take over the screen. Unless I have molecular bond strengths tuned within a fairly narrow range, the probability of ring molecules self-organizing in the lifetime of the computer becomes vanishingly small.

Another man-made example of self-assembly is this set of plastic pieces with embedded magnets which, when put into a jar and shaken, self-assembles into a spherical construct. This sphere was inspired by how the protein coats of viruses self-assemble. For self-assembly to happen, the individual pieces must be crafted properly, with pieces of the right shapes and magnets neither too weak nor too strong, and the amount of shaking must be neither too small nor too great. This is yet another example of the importance of fine-tuning.

Under the right conditions, larger molecules can self-assemble out of smaller pieces.

What about first life? Under the conditions of the early Earth, could molecules self-assemble into ever more complex combinations leading all the way to groups of molecules which could self-replicate? That’s a very difficult question, and for now, scientists don’t know the answer. Research groups are working on that question using a variety of techniques. For now, I just want to point to one computer model which I found interesting.[2] It’s a computer model of the evolution of an autocatalytic set of chemicals.[3] The researchers weren’t modeling real chemicals, but theoretical chemicals where each was described by how it up-regulates or down-regulates reactions between other pairs of chemicals. They allowed the chemicals in their model to react for a while, occasionally washing out chemicals which had already been mostly consumed by other chemicals and washing in new ones. Each time they ran the simulation, eventually, one sub-set of chemicals would become autocatalytic, increasing their own numbers by consuming other chemicals in the system. While this model doesn’t prove that abiogenesis happened this way on the early Earth, it does provide yet another model of complex systems with novel properties self-organizing out of simpler pieces.

While scientists do not yet know the probability of abiogenesis happening on the early earth, the point here is that information is no barrier. First, the possibility space of all possible combinations of fundamental particles is vastly greater than the information content of a living cell. Second, random events in the history of the universe turned some of that potential information into real information embodied in real molecules and real conditions on the pre-biotic (before-life) Earth. Third, the amount of information needed to describe the environment of the pre-biotic Earth as a whole was much greater than the amount of information needed to describe the first living organism. Fourth, on the early Earth there was a steady stream of orderly energy (sunlight) and a constant thermal jostling and mixing of the molecules. These are exactly the sorts of conditions which enable the creation of information needed to self-organize real, complex objects.

#Evolutionary Adaptation and Co-option

“Cog (1993-2004).” MIT Museum

The question before us is this: how can natural laws and random events, over time, assemble the kind of information content we see in living organisms? In the first two posts of this series I described several systems where just a few simple pieces can combine into a vast space of possibilities, and where random events can cause those pieces to explore portions of that space, assembling more complex objects and a rich variety of environments. But that’s not the whole story. The earliest living organisms on Earth were much simpler than today’s organisms. Greater complexity implies greater information. To understand how that happened, we need to add two more chapters to the story: evolutionary adaptation and co-option.

Consider Cog, a robot developed at the Massachusetts Institute of Technology to help test theories about human learning. Cog learns about its environment by interacting with it. It has many interconnected computer processors working simultaneously that process sensory information, control body movements, and then coordinate sensory information with body motion so that Cog can learn to perform physical tasks.

One task which Cog can learn through repeated trials is pointing its arm at a distant object. During the first few trials, Cog flails its arm and points randomly. Then, error-correcting routines take over and, after repeated trials, Cog gets better and better at pointing. Now consider the end result of many trials. There are many variables in Cog’s distributed memory which allow it to point successfully. These variables control the sequence, timing, and amplitude of various motions in Cog’s neck, shoulder, elbow and wrist joints. If you reset Cog’s memory to where it was before it started learning, and have Cog re-learn the task, it will re-learn the task in about the same amount of time, with the same success, but with a different final set of variables in its memory. The final set of variables differs from one learning run to the next because Cog’s learning starts out with random arm-flailing, which provides data for subsequent error-correction and improvement.

After Cog has learned a task, there is a great deal of information in its distributed memory. Out of all possible sets of variables in Cog’s memory, only a tiny subset of variables allows it to perform the specified task of pointing at distant objects. (This is somewhat analogous to the fact that out of all possible DNA sequences, only a tiny subset of DNA sequences can produce a living organism.) Once Cog has learned to point, there is new information in Cog’s memory necessary for the task. Where did that information come from?

To help answer this question, imagine a simple computer program designed to learn how to navigate mazes. This program reads an instruction string of zeros and ones; each pair of bits in the instruction string tells it to move left, right, up, or down the screen. The instruction string starts out as ten randomly chosen bits, enough to move five steps. The program enters the maze and simply follows the instruction string from the beginning. If the program hits a wall in the maze before it gets to the end of its instruction string, it stops following the instruction string and generates an error signal. If the program gets an error signal, it randomly flips one bit and tries again from the beginning of the maze.

Eventually, the program will hit upon an instruction string which it can follow to the end without getting an error signal. When this happens, the program increases the length of the instruction string by duplicating a random ten-bit piece of its instruction string and appending it to the end. Now whenever it generates an error signal, it randomly flips only one of the last ten bits of its instruction string. After repeated trials, it will once again hit upon an instruction string which it can follow to the end without error. And once again it lengthens its instruction string further. This continues until the program finally finds the exit of the maze. If the maze is large, the instruction string it discovered by trial-and-error can be much longer than the simple program which reads the instruction string. There will be a lot of new information in that instruction string telling the program how to navigate the maze. Where did that information come from?
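The maze learner just described can be sketched in a few dozen lines of Python. This is a minimal illustration, not the original program: the bit-pair-to-direction table, the tiny three-by-three maze, the `max_trials` safety cap, treating the edge of the grid as a wall, and stopping the moment the exit is reached are all assumptions I have made to get something runnable.

```python
import random

# Assumed encoding: each pair of bits selects one of four directions.
MOVES = {
    (0, 0): (0, -1),  # left
    (0, 1): (0, 1),   # right
    (1, 0): (-1, 0),  # up
    (1, 1): (1, 0),   # down
}

# Hypothetical maze: S = start, E = exit, # = wall, . = open floor.
TINY_MAZE = ["S..", ".#.", "..E"]


def follow(maze, bits):
    """Walk the maze from S; return 'exit', 'wall', or 'end' (string used up)."""
    row, col = next((r, c) for r, line in enumerate(maze)
                    for c, ch in enumerate(line) if ch == "S")
    for i in range(0, len(bits) - 1, 2):
        d_row, d_col = MOVES[(bits[i], bits[i + 1])]
        row, col = row + d_row, col + d_col
        in_bounds = 0 <= row < len(maze) and 0 <= col < len(maze[0])
        if not in_bounds or maze[row][col] == "#":
            return "wall"
        if maze[row][col] == "E":
            return "exit"
    return "end"


def learn_maze(maze, rng, max_trials=100_000):
    """Trial-and-error search; return a bit list reaching the exit, or None."""
    bits = [rng.randint(0, 1) for _ in range(10)]
    for _ in range(max_trials):
        outcome = follow(maze, bits)
        if outcome == "exit":
            return bits
        if outcome == "wall":
            # Error signal: flip one randomly chosen bit among the last ten.
            bits[len(bits) - 10 + rng.randrange(10)] ^= 1
        else:
            # No error: duplicate a random ten-bit piece and append it.
            start = rng.randrange(len(bits) - 9)
            bits.extend(bits[start:start + 10])
    return None


route = learn_maze(TINY_MAZE, random.Random(0))
print(route)
```

Nothing in `learn_maze` knows where the walls are. Whatever the finished string says about the maze got there only through random bit flips filtered by the error signal.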

Both Cog and the maze-navigating program illustrate how information about the environment can be duplicated into the instruction strings (or “genomes”) of a self-replicating and evolving system. Biological evolution does the same thing. The genomes of organisms contain information about how to survive and thrive in a particular environment. Often, there is redundancy in that information. The same amount of information can be encoded in many different ways. When a mutation occurs in one member of a population that leads to greater reproductive success, and that mutation spreads to the population, the organisms become better adapted to their environment, and their genomes contain still more information about how to survive and thrive in that environment.[1]

For the past few years, I’ve been working with collaborators and students to build another computer model of this. We call it Pykaryotes.[2] The digital organisms in our computer model “live” and move in an environment with a distribution of several different kinds of food. Each organism has a sort of genome, a string of codons which tells it what food to gather and when, when and in what direction to move, and what proteins to make. After a certain number of genome reading steps, an organism’s fitness is calculated based on how many food chemicals it has gathered, and its fitness determines the average number of offspring it produces.

During reproduction, our digital organisms might experience various types of mutations inspired by real-world biological mutations, including point-mutations, deletions, genome copying, and horizontal gene transfer. When we run the Pykaryotes program with these mutation rates set neither too high nor too low and with adequate rewards for fitness, the simple starting organisms almost always become more fit and more complex as the generations pass, and the information content of their genomes grows. When we run this program under other conditions—for example, when the mutation rate is too high or too low, or when there are only weak rewards for increased fitness—the simple starting organisms do not evolve to greater and greater fitness, and the information content of their genomes does not grow. The program was designed to behave this way in order to mimic real biological evolution.

Pykaryotes illustrates yet another way in which information and complexity can evolve. In our digital organisms, proteins which initially have one function can gain new functions (sometimes without losing their old functions) through co-option. Protein co-option could start with gene duplication, followed by mutation of one copy of the gene (while the other retains its original function) until the protein has changed enough that it begins interacting in new ways with other proteins in the cell. A second way for a protein to become co-opted is when mutations happen, not in the gene for the protein itself, but elsewhere in the genome, creating changes in other proteins in the cell, and causing new interactions. A third way for co-option to happen is not with changes in the genome at all, but with changes in the environment. When the environment changes, a protein which already performs one function can continue to do so while beginning also to perform a new function in the new environment.

Over time, our digital organisms evolve complexes of 2 to 5 bound proteins in which the complexes as a whole have food-gathering functions but each component protein has no independent function. Our computer model allows us to study how the rate at which such protein complexes evolve depends on things like mutation rates, the strength of natural selection, and the frequency of changes in the environment. Whenever conditions are right, these digital organisms evolve ever larger functional protein complexes, each of which requires ever larger amounts of information to describe how the organisms gather each type of food. Through gene duplication, mutation, and co-option, the information content of their genomes grows as well.

Other articles on the BioLogos website describe real biological examples of co-option creating new biological information and increasing complexity (check out the further reading section for some examples). Cog the robot, the maze navigation program, and Pykaryotes are all examples of “evolutionary algorithms.” They are human-designed computational systems inspired by God-designed biological systems. As evolutionary systems adapt to their environment through a process of trial-and-error (a better term would be “trial-and-greater-success”), they accumulate more and more information about the environment, and encode that information inside themselves.

#Conclusion

I’m fascinated by natural systems which become more complex over time via the interplay of law and chance. I believe that God designed the laws and random processes of the natural world so that, under certain conditions, physical and biological systems evolve greater complexity and create information naturally. We humans have been inspired by God’s handiwork and created many games, mathematical systems, and technologies which model some of those natural processes. I don’t claim to have proved that the complex biochemical machinery of modern cells evolved from simpler beginnings. I’m making the following more limited claim: information poses no barrier to the evolution of complexity.

And the next time you see snow falling, pause to be amazed by this fact: just the simple water molecule can combine with other copies of itself, through the interplay of law and chance, to form billions of billions of snowflakes—each one unique.


Notes & References




Loren Haarsma
About the Author

Loren Haarsma earned a Ph.D. in physics from Harvard University and did five years of postdoctoral research in neuroscience in Boston and in Philadelphia. He began teaching physics at Calvin College in 1999. His current scientific research is studying the activity of ion channels in nerve cells and other cell types, and computer modeling of self-organized complexity in biology and in economics. He studies and writes on topics at the intersection of science and faith, and co-authored Origins: Christian Perspectives on Creation, Evolution, and Intelligent Design with his wife, Deborah.