There are thousands of protein-coding genes in our bodies, which must be used in the correct quantities at the correct time. Think of it as a symphony that must coordinate all the different instruments and tunes: how do cells orchestrate such a thing? Part of the answer is that the cell uses specific regulatory protein-coding genes, whose products are known as transcription factors, which are dedicated to regulating other genes. One of the differences between organisms at a molecular level is how genes are regulated. It is postulated that evolution in gene regulation has been a major cause of biological diversity. In this post, I evaluate: (1) what exactly transcription factors, promoters and regulatory genes are; (2) work done in quantifying how long regulatory regions would take to appear, specifically in human DNA. I argue that this work shows that such regulatory regions would take unreasonable times, on the order of billions of years, to appear in human DNA, thereby ruling out evolution as an explanation for the origin of gene regulatory regions in humans.
What are transcription factors?
Transcription factors are proteins involved in the process of converting, or transcribing, DNA into RNA. Transcription factors include a large number of proteins, excluding RNA polymerase, that initiate and regulate the transcription of genes. One distinct feature of transcription factors is that they have DNA-binding domains that give them the ability to bind to specific sequences of DNA called enhancer or promoter sequences. Some transcription factors bind to a DNA promoter sequence near the transcription start site and help form the transcription initiation complex. Other transcription factors bind to regulatory sequences, such as enhancer sequences, and can either stimulate or repress transcription of the related gene. These regulatory sequences can be thousands of base pairs upstream or downstream from the gene being transcribed. Regulation of transcription is the most common form of gene control. The action of transcription factors allows for unique expression of each gene in different cell types and during development.
Figure 1: Transcription factors are proteins that bind to DNA (shown in green) in order to regulate the expression of a certain gene. In this image the gene is the human serum albumin gene and the regulatory sequence is CCAAT, to which a transcription factor known as the CCAAT/enhancer-binding protein (C/EBP) binds to regulate the expression of the gene.
Gene regulation is important because each protein-coding gene must be expressed at a specific time, in response to a specific need, and in specific quantities, all of which must be carefully regulated. Too much of a protein can be harmful; too little can be harmful; a protein produced at the wrong time can be harmful. Hence genes must be carefully regulated.
Why does it matter?
It is postulated that changes in how genes are regulated have been a significant factor in the evolution of species, in this case particularly humans. As Garfield and Wray state, “In an influential article published in 1975, Mary-Claire King and Allan Wilson argued that because the sequence and function of proteins isolated from humans and chimpanzees were so similar, something other than protein evolution per se must underlie the phenotypic differences between these two species. They posited that changes in the regulation of gene expression were responsible for more adaptive evolution than changes in the protein-coding regions of genes.” [i]
1. Durrett and Schmidt: WAITING FOR REGULATORY SEQUENCES TO APPEAR[ii]
Durrett and Schmidt calculate how long it would take for a new regulatory sequence to appear within a certain region of DNA. They ask: “given a 1000 nucleotide region in our genome, how long does it take for a specified six to nine letter word to appear in that region in some individual?”
Explaining what a regulatory sequence is, they say: “A regulatory sequence is a short sequence of DNA (in vertebrates many are 6–9 nucleotides long) which is a binding site for transcription factors that promote or inhibit transcription of the DNA to make proteins.”
They obtained the following results:
Since μ = 10⁻⁸, these numbers translate into huge waiting times: 3.567×10⁹ and 2.875×10¹⁰ generations, or 89.2 and 719 billion years respectively (using 25 years as the human generation time). This shows that it is important that regulatory sequences can occur in some region rather than at a fixed location.
First, they modelled the time it would take for specific regulatory sequences of 6 and 8 nucleotides (DNA letters) to appear at a fixed location. They calculated that a 6-letter and an 8-letter regulatory sequence would take 89.2 and 719 billion years respectively to occur within the human population. They then conclude that regulatory sequences must be able to arise anywhere within a larger, more general region rather than at one specific location.
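The conversion from generations to years in the fixed-location result is simple arithmetic and can be checked with a short Python sketch. The function name is my own; the generation counts and the 25-year generation time are the figures quoted above.

```python
def generations_to_years(generations: float, generation_time_years: float = 25.0) -> float:
    """Convert a waiting time in generations to a waiting time in years."""
    return generations * generation_time_years


# Fixed-location waiting times (in generations) as quoted from Durrett and Schmidt.
waits_in_generations = {6: 3.567e9, 8: 2.875e10}

for word_length, generations in waits_in_generations.items():
    years = generations_to_years(generations)
    print(f"{word_length}-letter word: {years:.4g} years")
```

The results round to the 89.2 and 719 billion years given in the text.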
“…for words of length 6, the average waiting time is 100,000 years, while for words of length 8, the waiting time has mean 375,000 years when there is a 7 out of 8 letter match in the population consensus sequence (an event of probability roughly 5/16) and has mean 650 million years when there is not. Fortunately, in biological reality, the match to the target word does not have to be perfect for binding to occur. If we model this by saying that a 7 out of 8 letter match is good enough, the mean reduces to about 60,000 years.”
Finding a new 6-letter regulatory sequence that requires one specific mutation (to reach a 6 out of 6 match), given that 5 correct DNA letters already exist, would take on average 100,000 years.
Finding a new 8-letter regulatory sequence that requires one specific mutation (to reach a 7 out of 8 match), given that 6 correct DNA letters already exist, would take on average 60,000 years. If an exact 8 out of 8 match is required, the average is 375,000 years when a 7 out of 8 match is already present. When it is not, so that at least two specific mutations are needed (for example, starting from 6 correct letters), it would take 650 million years for the match to first appear.
They then consider how long it would take for the new sequence to become fixed in the population by natural selection: “In reality the probability of fixation is approximately the selective advantage conferred by the mutation s and even for strongly beneficial mutations we have s ≤ 0.01. This means that the mutation would need to arise more than 100 times in order to achieve fixation, which would increase the waiting time to 6 million years.” So, for example, one regulatory sequence requiring a single mutation would take 6 million years (60,000 years × 100) to become fixed in the human population.
By the same reasoning, an 8 out of 8 match that still needs two specific mutations (starting from 6 correct letters) would take 650 million years to first appear in some individual, and 100 times longer, 65 billion years, to become fixed in the entire population.
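The fixation step is a single multiplication. As a minimal sketch, assuming (as the quoted passage does) that the probability of fixation is roughly s, so that the mutation must arise about 1/s times, the hypothetical helper below reproduces the 6 million year figure and the 65 billion year extrapolation.

```python
def waiting_time_with_fixation(first_appearance_years: float, s: float = 0.01) -> float:
    """Scale a first-appearance waiting time by about 1/s, the expected
    number of times the mutation must arise before one copy becomes fixed.

    Assumption: fixation probability is approximately s (selective advantage).
    """
    if not 0 < s <= 1:
        raise ValueError("s must be in the interval (0, 1]")
    return first_appearance_years / s


# 7 out of 8 match is good enough: 60,000 years to first appear.
print(f"{waiting_time_with_fixation(60_000):,.0f} years")
# Exact 8 out of 8 match with no 7 out of 8 match present: 650 million years.
print(f"{waiting_time_with_fixation(650_000_000):,.0f} years")
```

Because the quoted bound is s ≤ 0.01, a smaller s would lengthen these times proportionally.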
2. Behrens and Vingron: Studying the evolution of promoter sequences: a waiting time problem[iii]
These scientists calculate how long it would take for a transcription factor binding site (a DNA sequence where a transcription factor can bind) of length 5 to 10 nucleotides to appear (1) in a specific 1000-nucleotide promoter region of the DNA and (2) in at least one of all 20,000 promoter regions of human DNA.
“We have developed a probabilistic approach to study the evolution of regulatory regions allowing us to predict how long one has to wait for a given TF binding site of length k, k ranging from 5 to 10, to be created at random in the human species – either in one promoter of length 1 kb or in at least one of all the human promoters.”
Waiting time in a specific promoter region of DNA
They calculate the time for a 5-letter binding site to appear in a specific 1000-nucleotide region to be between 126 and 153 million years, depending on the sequence. For a 10-letter site they calculate an average of 72 billion years.
“For example, CCCCC is the fastest emerging 5-mer with an expected waiting time of 6,303,945 generations (=126 Myrs) to appear in a promoter of length 1 kb while AAAAA is the slowest emerging 5-mer with 7,653,814 generations (=153 Myrs). For 10-mers, the average expected waiting time is 72 billion years”
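The generation counts and year figures in this quotation imply a generation time, which can be recovered by division. The sketch below (the function name is mine) suggests that Behrens and Vingron used about 20 years per generation, rather than the 25 years used by Durrett and Schmidt, so the year figures from the two papers are not on exactly the same scale.

```python
def implied_generation_time(years: float, generations: float) -> float:
    """Return the generation time (years per generation) implied by a
    waiting time stated both in years and in generations."""
    return years / generations


# (years, generations) pairs as quoted from Behrens and Vingron.
quoted = {
    "CCCCC": (126e6, 6_303_945),
    "AAAAA": (153e6, 7_653_814),
}

for kmer, (years, generations) in quoted.items():
    print(kmer, round(implied_generation_time(years, generations), 1))
```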
Waiting time in all 20,000 promoter regions of human DNA
They calculate that a given 5-letter binding site would take about 7,500 years to first appear in at least one promoter. A 10-letter site would take about 4.8 million years.
“Our results indicate that new TF binding sites can indeed appear on a small evolutionary time scale: for example, given that model M1 is an appropriate choice, on average around 7,500 years may be sufficient for a given 5-mer to emerge in at least one of all the human promoters, for 8-mers around 350,000 years and for 10-mers around 4.8 Myrs (model M1). But for some TF binding sites of length 10 like, for example, the SP1 binding site, a duration of 700,000 years may be enough”
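One rough way to see why the all-promoter waiting times are so much shorter: if the roughly 20,000 promoters are treated as independent trials with approximately exponential waiting times (a simplifying assumption of mine, not the model used in the paper), the expected wait for the first success is the single-promoter wait divided by the number of promoters. The sketch below applies that to the single-promoter figures quoted earlier.

```python
def first_of_many(single_wait_years: float, n_promoters: int = 20_000) -> float:
    """Expected wait for the first hit among n independent waiting times
    that are exponentially distributed with the same mean.

    Assumption: independence and exponential waiting times, for illustration only.
    """
    if n_promoters < 1:
        raise ValueError("n_promoters must be at least 1")
    return single_wait_years / n_promoters


single_promoter_waits = [
    ("fastest 5-mer (CCCCC)", 126e6),
    ("slowest 5-mer (AAAAA)", 153e6),
    ("average 10-mer", 72e9),
]

for label, single_wait in single_promoter_waits:
    print(f"{label}: about {first_of_many(single_wait):,.0f} years")
```

Under these assumptions the 5-mer estimates land near the reported 7,500 years, and the 10-mer estimate is of the same order as the reported 4.8 Myrs. Some gap is to be expected, since the paper's model is more detailed than this approximation.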
Once a binding site has appeared in a specific individual, it would still need to become fixed in the entire population. Applying the same factor of 100 that Durrett and Schmidt use for fixation (s ≤ 0.01), a 5-letter site would then take 750,000 years and a 10-letter site 480 million years.
Figure 2: Waiting time for binding sites that are 10 nucleotides long as a function of the number of promoter regions. The calculation assumes that any binding site can appear in any promoter region of DNA and still confer some selective benefit.
Their calculations demonstrate that if a transcription factor binding site must arise in one specific promoter, it would take millions of years longer than the time since humans and chimpanzees diverged from their common ancestor, effectively demonstrating that evolution cannot account for the differences in gene regulatory regions or for how they originated. The short waiting times therefore depend on the assumption that any binding sequence can occur in any promoter region of human DNA and still confer some selective advantage.
In the following post I will evaluate further whether promoter regions and transcription factor binding sites are general or specific. In other words:
- Can any transcription factor binding sequence be placed in any other gene's promoter region and still confer a biological function and a selective advantage?
[i] Garfield, D. A., & Wray, G. A. (2010). The Evolution of Gene Regulatory Interactions. BioScience, 60(1), 15–23. doi: 10.1525/bio.2010.60.1.6
[ii] Durrett, R., & Schmidt, D. (2007). Waiting for Regulatory Sequences to Appear. The Annals of Applied Probability, 17(1), 1–32. doi: 10.1214/105051606000000619
[iii] Behrens, S., & Vingron, M. (2010). Studying the Evolution of Promoter Sequences: A Waiting Time Problem. Journal of Computational Biology, 17(12), 1591–1606. http://doi.org/10.1089/cmb.2010.0084