Is There “Junk” in Your Genome? Part 2
One of the challenges for discussing evolution within evangelical Christian circles is that there is widespread confusion about how evolution actually works. In this (intermittent) series, I discuss aspects of evolution that are commonly misunderstood in the Christian community. In this second of several posts on “junk DNA”, we explore how small, autonomous DNA sequences called transposons have shaped mammalian genomes for worse, and for better.
As we saw in the last post, only a small fraction of the human genome appears to be subject to selection (on the order of 5-6%). The rest appears free to mutate without consequence to mammalian biology, which is good evidence that it performs no particular function. An additional line of evidence for non-functionality in the human genome is the observation that a large fraction of our genetic material is made up of what are known as mobile genetic elements, or “transposons.” These little snippets of DNA are well known and well studied in many organisms, including humans. So, what are they, and what are they up to?
Along for the ride, but looking out for number 1
Non-biologists are usually somewhat taken aback when they learn about transposons. Transposons are small segments of DNA, found inserted in the genomes of many organisms, that are little worlds unto themselves: they have a few genes that serve only to copy themselves and move themselves to new locations in a genome. That’s it! On the scale of biodiversity, transposons are less life-like even than viruses. They are the perfect parasites: using their host to provide resources so they can replicate themselves, and with a “lifestyle” so simple that replication is essentially its only feature. Their origins, like the origins of viruses, are somewhat of a mystery.
Despite their somewhat mysterious nature, transposon sequences make up a staggering 45% or more of our genome. That’s about 1.4 billion DNA base pairs of our genetic material that is recognizable as functional transposons or their mutated, fragmentary remains. Not surprisingly, nearly all transposon sequences in the human genome are not under selection – they are free to accumulate mutations. These mutations have no effect on us since they do not alter any function we require.
Rags to riches: converting transposons to functional sequences
Despite their parasitic nature, sometimes the host species can exploit transposons as a source of genetic novelty. The ability of transposons to copy and spread themselves around in genomes raises the intriguing possibility that they can acquire a function if they land in the right chromosomal area. While it is difficult (though not impossible) for a transposon to acquire a function as gene coding sequence (i.e. becoming a host protein product), it is comparatively easy for a transposon to pick up a function as a regulatory sequence: a segment of DNA that directs when and where a certain host gene product should be made. Transposons contain regulatory sequences for their own genes already, and these sequences can potentially interact with regulatory sequences in the host genome.
Perhaps a review of gene structure and function would be helpful at this point. Genes are portions of the long DNA sequences that make up chromosomes (each chromosome is one very long DNA molecule). As we have seen above, a good proportion of these sequences are either transposons or the defective fragments of transposons, as well as other DNA that is not under selection and is free to mutate. Interspersed in this sea of non-selected sequences are genes: segments of chromosomes that code for protein products that carry out functions within the cell: enzymatic functions, structural functions, and so on. These sequences stand out because they are subject to selection, and thus do not change at the same rate as sequences that are free to mutate (as we discussed previously).
Genes have a typical structure (obviously simplified here somewhat). First off, there is the actual DNA sequence that specifies the protein product sequence (the so-called “coding sequence”, shown in blue). This sequence is usually broken up into segments in mammalian genes, and these sequences are spliced together when the DNA sequence of the gene is transcribed into a “working copy” called mRNA – a short duplicate of the code that can be used by the cell’s machinery to actually build the specified protein.
In addition to the actual coding sequences, other sequences are needed to tell the cell when and where certain genes should be transcribed into mRNA. Every cell in an organism has the same genes in its chromosomes, but not all are transcribed. Using different genes in different combinations is what makes cells take on distinct roles – for example, cells in your small intestine need different genes (for absorption of nutrients) than do cells of the immune system (for fighting off pathogens). Regulatory sequences make sure any given cell type has the right genes transcribed and made into protein products. Some of these sequences are part of the mRNA transcript (shown in red), and others are not transcribed but are only part of the chromosomal DNA sequence, such as the “promoter” region that directs the enzymes responsible for making the mRNA transcript (shown in blue).
So, what happens when a transposon inserts into the regulatory sequence of a gene? In many cases, this mutation (the insertion event) will cause a problem (perhaps the gene is no longer transcribed in the right way, for example). In some cases, however, the gene can tolerate such an insertion. Regulatory DNA is more able to accept changes than is coding sequence DNA, so it is quite possible that an insertion may not harm the function of a gene.
In some cases, sequences from the transposon can participate in the regulation of the neighboring gene. If these changes are beneficial, as they sometimes are, then the transposon sequences involved in regulation come under selection. Some parts of the transposon mutate away beyond recognition, and the useful bits remain since they, now being under selection, are not (as) free to mutate. The end result is a gene that has co-opted a fortuitous event (a transposon insertion) and, through mutation and selection, honed it to serve a new function (altered regulation of its product). This is an example of exaptation, the conversion of one function to another through mutation and subsequent selection. In this case the old function (a “self-serving” transposon) has had a portion of its sequence exapted to become part of the host regulatory DNA.
Recent work comparing 29 different mammals has shown there are about 280,000 examples of exapted transposon fragments in mammalian genomes. Despite this large number, the absolute fraction of human DNA that falls into this category is tiny: of our 3 billion base pairs of DNA, only about 7 million are the detectable remnants of exapted transposons. The vast majority of transposons and transposon fragments in the human genome (as we mentioned, totaling around 1.4 billion base pairs) are not under selection and are free to mutate without affecting any function.
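To put these numbers side by side, here is a back-of-the-envelope calculation using only the rounded figures quoted in this post (3 billion base pairs in total, about 1.4 billion of transposon-derived sequence, and about 7 million of exapted transposon sequence). The variable names are simply labels chosen for this sketch, and the results are only as precise as the rounded inputs.

```python
# Rounded figures quoted in the post; these are approximations, not measurements.
GENOME_BP = 3_000_000_000        # "3 billion base pairs of DNA"
TRANSPOSON_BP = 1_400_000_000    # "around 1.4 billion base pairs" of transposon sequence
EXAPTED_BP = 7_000_000           # "about 7 million" base pairs of exapted transposon remnants


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Share of the genome that is recognizable transposon sequence.
transposon_share = percent(TRANSPOSON_BP, GENOME_BP)

# Share of the genome that is exapted transposon sequence.
exapted_share_of_genome = percent(EXAPTED_BP, GENOME_BP)

# Share of all transposon sequence that shows signs of exaptation.
exapted_share_of_transposons = percent(EXAPTED_BP, TRANSPOSON_BP)

print(f"Transposon-derived sequence: {transposon_share:.1f}% of the genome")
print(f"Exapted transposon sequence: {exapted_share_of_genome:.2f}% of the genome")
print(f"Exapted fraction of transposon sequence: {exapted_share_of_transposons:.1f}%")
```

With these inputs, transposon-derived sequence comes to roughly 47% of the genome (consistent with "45% or more"), while exapted sequence is about a quarter of one percent of the genome, or about half a percent of all transposon sequence.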
The genomic recycling bin
So, transposons are at once a good example of non-functional DNA in genomes (indeed, nearly half of our own genome is made up of them), and an example of how evolutionary processes can convert non-functional DNA into functional DNA through mutation and selection. While I did not discuss exapted transposons in my previous series, this is another clear example of how evolution can produce novel information within the genome: by “recycling” small amounts of its junk to produce new functions. Note well, however: the fact that a small fraction of transposons have been exapted into functional sequences does not “confer” functionality on all transposons. We see the signs of selection on only a tiny minority, and even then typically only on fragmentary remains.
In the next installment of this series, we’ll examine another form of non-functional DNA present in genomes: processed pseudogenes.
For further reading
International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921. http://www.nature.com/nature/journal/v409/n6822/full/409860a0.html
Lindblad-Toh, K., et al. (2011). A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476-482. http://www.nature.com/nature/journal/v478/n7370/full/nature10530.html