A central question is how effective the evolutionary process is at producing novel functions. How did eyes, hearts, brains, wings, spliceosomes and ribosomes evolve? There are plenty of stories of how all these intricate organs and complexes evolved, but these often substitute imagination for science and provide very little empirical evidence. Explaining how any of these organs evolved is difficult; however, Darwinian evolution assumes that it must start with the evolution of genes.

A common assumption is that, given enough time, evolution is inevitable. We can now ask how long it actually takes to produce a new functional gene from an existing gene. Because of the nature of DNA and proteins, this question can be answered using mathematical principles. Can random mutations, natural selection and drift produce new genes with new functions in the time available? The answer based on population genetics and probability calculations is a resounding no, as I will illustrate by reviewing work done in this area.

Gene Duplication

One of the principal means by which new genes with novel functions arise is gene duplication, followed by mutations that change the duplicated gene into one with a novel function and a fitness benefit that can then be selected for. Kolodny et al., in an article reviewing protein folds, summarize the importance of gene duplication for the evolution of novel proteins:

“The most common process in protein evolution is duplication followed by divergence (18, 73). The advantage and beauty of this process are that it removes the functional pressure from the protein domain, as the original copy maintains the original function while the new divergent copy is free to explore alternative functions. Duplication is common in all types of species, and using the SUPERFAMILY database it was estimated that the proportion of duplicated domains in animal, fungi, and bacteria genomes is at least 93%, 85%, and 50%, respectively.”[i]

[Figure: gene duplication]

[Figure: arrival time for novel gene function]

Each stage of the process that leads to a gene with a novel function requires a certain amount of time.

1. Behe and Snoke: Simulating evolution by gene duplication of protein features that require multiple amino acid residues[ii]

Behe and Snoke explain the basis for their model: “…some protein features, such as disulfide bonds or ligand binding sites, require the participation of two or more amino acid residues, which could require several mutations. Here we model the evolution of such protein features by what we consider to be the conceptually simplest route—point mutation in duplicated genes.”

They explain that a novel protein requiring a disulfide bond would need two or more coordinated mutations, and they model the case in which the intermediate mutations are harmful rather than neutral. Their model also assumes that gene duplication has already occurred and become fixed in the population, so it covers only the arrival time of the novel mutations in the duplicated gene and their fixation time in the population. A brief analogy illustrates their model:

[Figure: illustration of deleterious intermediates]

To produce a new functional word from “induction” requires two coordinated changes, where the intermediate is a non-functional, meaningless word. Gene duplication allows the duplicated gene to mutate into a non-functional gene without natural selection eliminating it. However, the more coordinated mutations required, the longer the time or the larger the population needed to find the new protein. How much time exactly? And what happens to the time if the number of coordinated changes required increases to 3, 4, 5 and so on?
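
The growth in the number of candidate words can be sketched with a short count. This is only an illustration of the analogy, assuming a 26-letter alphabet and counting substitutions only; the function name `variants_at_distance` is hypothetical and the figures are not taken from Behe and Snoke’s model.

```python
from math import comb

def variants_at_distance(word_length: int, changes: int, alphabet_size: int = 26) -> int:
    """Count the strings that differ from a word at exactly `changes` positions."""
    # Choose which positions change, then pick one of the other letters for each.
    return comb(word_length, changes) * (alphabet_size - 1) ** changes

word = "induction"
for k in range(1, 6):
    count = variants_at_distance(len(word), k)
    print(f"{k} coordinated change(s): {count:,} candidate strings")
```

For a nine-letter word the count rises from 225 single-change strings to 22,500 two-change strings, so each additional required change multiplies the space that a blind search has to cover.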

Stephen Meyer quotes evolutionary biologist, Li: “Acquiring a new function may require many mutational steps, and a point that needs emphasis is that the early steps might have been selectively neutral [non-advantageous] because the new function might not be manifested until a certain number of steps had already occurred.”

Behe and Snoke explain in their paper:

“In order to produce a novel disulfide bond, a duplicated gene coding for a protein lacking unmatched cysteines would require at least two mutations in separate codons, and perhaps as many as six mutations, depending on the starting codons. We call protein characteristics such as disulfide bonds which require the participation of two or more amino acid residues “multiresidue” (MR) features”

Figure 1: Results from Behe and Snoke’s simulation

Their results indicate that large populations and many generations are required for new protein functions that require multiple coordinated mutations. A new protein requiring two mutations would take 10⁶ generations to be fixed in organisms with population sizes of 10¹². For bacteria, with an effective population size of 10⁹ individuals per generation, it would take roughly 10⁸ generations, which is approximately 100,000 years (assuming 1000 bacterial generations per year). However, once 4 mutations are needed, the fixation time climbs to over 10¹² generations (1 billion years), even for bacterial populations, which are the largest.

For humans, with an effective population size of 10⁴, it would require in excess of 10¹² generations. At 20 years per generation that equates to 20,000 billion years for a new protein requiring two mutations. Humans are supposed to have diverged from their common ancestor with apes about 6 million years ago.

Their model shows that a new gene with novel functions requiring two coordinated mutations with the intermediate step being non-functional cannot occur in small population sizes such as those for humans and animals.
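
The conversions from generations to years quoted above are simple to check. The sketch below uses only the generation counts and generation times stated in this section; the helper name `generations_to_years` is an assumption made for illustration.

```python
def generations_to_years(generations: float, years_per_generation: float) -> float:
    """Convert a waiting time in generations into years."""
    return generations * years_per_generation

BACTERIA_YEARS_PER_GEN = 1 / 1000   # 1000 bacterial generations per year
HUMAN_YEARS_PER_GEN = 20            # 20 years per human generation

cases = {
    "bacteria, 2 mutations (1e8 generations)": (1e8, BACTERIA_YEARS_PER_GEN),
    "bacteria, 4 mutations (1e12 generations)": (1e12, BACTERIA_YEARS_PER_GEN),
    "humans, 2 mutations (1e12 generations)": (1e12, HUMAN_YEARS_PER_GEN),
}
for label, (gens, years_per_gen) in cases.items():
    print(f"{label}: {generations_to_years(gens, years_per_gen):.1e} years")
```

The human case comes out at 2 × 10¹³ years, which is the 20,000 billion years quoted above.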

2. Michael Lynch: Simple evolutionary pathways to complex proteins[iii]

Lynch conducted similar calculations in an effort to refute Behe and Snoke’s results. He writes: “In a recent paper in this journal, Behe and Snoke (2004) questioned whether the evolution of protein functions dependent on multiple amino acid residues can be explained in terms of Darwinian processes… The following is a formal evaluation of their assertion that point-mutation processes are incapable of promoting the evolution of complex adaptations associated with protein sequences”.

He states that the aim of his research is to answer the following: “The two central issues to be resolved are then: (1) How frequently will a duplication event lead to neofunctionalization; and (2) How long will this take?”

His model assumes that the intermediate step before a new binding site arises is neutral rather than harmful. It includes both the waiting time for a gene to be duplicated and become fixed in the population and the arrival time of novel mutations with a fitness benefit.

Figure 2: Results from Lynch’s calculation

Lynch calculated significantly smaller waiting times for the fixation of a new binding site requiring two coordinated mutations. For a population of 10⁹ organisms, it will take 10⁶ generations if the mutations must occur at 2 specific sites.

[Figure: calculation block for Lynch’s model]

The calculation block shows how long it would take for a duplicated gene requiring two DNA mutations for a new function to become fixed in the human population: 2000 billion years. According to evolutionary theory, humans diverged from their common ancestor with chimpanzees 6 million years ago.

An example of a duplicated gene within the human population is documented by Zhang:

“One good example is the red- and green-sensitive opsin genes of humans, which were generated by gene duplication in hominoids and Old World monkeys [44]. After duplication, the two opsins have diverged in function, resulting in a 30-nm difference in the maximum absorption wavelength. This confers the sensitivity to a wide range of colors that humans and related primates have…the functional difference between the red and green opsins is largely attributable to two substitutions”

In the human population there are numerous cases of postulated duplicated genes. One of them is the opsin gene, which duplicated into red- and green-sensitive copies that then underwent two substitutions. However, the mathematical calculation by Lynch shows that such an event is not possible in populations similar to humans: it would take 2000 billion years.

3. Lynch and Abegg: The Rate of Establishment of Complex Adaptations

Their model shows that the evolutionary process cannot produce a new gene requiring two coordinated mutations in small population sizes similar to animals and humans in a reasonable time even if the intermediate step is neutral and not harmful.

Lynch and Abegg investigate whether complex adaptations (novel functions or features requiring at least two mutations) can arise in reasonable times across a wide range of population sizes. They look at variables such as population size, the selective fitness benefit, and whether intermediate mutations are neutral or disadvantageous. They conclude that for small population sizes with long generation times, complex adaptations will not arise in reasonable times. However, for large population sizes complex adaptations can occur.

Figure 3: Mean number of generations until establishment of a double mutant as a function of the effective population size, with the intermediate states having a selective disadvantage of s1.

Lynch and Abegg show that for effective population sizes similar to human populations (10⁴), a double mutation with a neutral intermediate step (s1 = 0) will take roughly 10⁷ generations. At 20 years per generation, this amounts to roughly 200 million years for a new feature or function (such as a novel protein fold or enzyme) that requires two specific mutations to have a beneficial effect.

4. Douglas Axe: The Limits of Complex Adaptation: An Analysis Based on a Simple Model of Structured Bacterial Populations[iv]

Axe calculated the waiting time for complex adaptations in a bacterial population, which has a large effective population size. The assumption is that large populations will be able to find new adaptations in a shorter time. Axe looked at how increasing the number of mutations required before a function arises affects the number of generations required.

Figure 4: Number of mutations required for a new complex adaption and the number of generations for that particular adaption

The results indicate that as the number of mutations required (d) increases, the number of generations required to find the adaptation increases exponentially. For example, an adaptation requiring 4 mutations where the intermediate steps are neutral would require roughly 10⁵ generations. At 10³ generations per year that equals 10⁵/10³ = 10² years, so it would take 100 years for an adaptation requiring 4 changes to arise. However, if the intermediates are maladaptive (they cause a loss in fitness) it would take roughly 10¹⁸ generations, which equals 10¹⁵ years (one million billion years) and far exceeds the age of the earth (about 4.5 billion years).

 “As a basis for calculation, we have assumed a bacterial population that maintained an effective size of 10⁹ individuals through 10³ generations each year for billions of years. This amounts to well over a billion trillion opportunities (in the form of individuals whose lines were not destined to expire imminently) for evolutionary experimentation. Yet what these enormous resources are expected to have accomplished, in terms of combined base changes, can be counted on the fingers.”

The limit that Axe calculated is that the most complex adaptation that could arise in bacterial populations (which have the largest population sizes of all organisms) is one requiring 6 mutations, if the intermediate steps are neutral. This would need 10¹³ bacterial generations, equal to 10 billion years. If the intermediate steps are maladaptive, then the maximum number of mutations for a complex adaptation that bacterial populations can achieve is 2. This is a serious problem for evolution to overcome because there are numerous complex novel adaptations, systems, protein machines and enzymes that would require more than 6 mutations.
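
Axe’s figure of “well over a billion trillion opportunities” can be checked with integer arithmetic. The sketch assumes one billion years as a conservative stand-in for “billions of years”; that value and the variable names are assumptions for illustration, not numbers from Axe’s paper.

```python
EFFECTIVE_POPULATION = 10**9      # individuals per generation (from the quote)
GENERATIONS_PER_YEAR = 10**3      # bacterial generations per year (from the quote)
YEARS = 10**9                     # assumed lower bound for "billions of years"

opportunities = EFFECTIVE_POPULATION * GENERATIONS_PER_YEAR * YEARS
billion_trillion = 10**9 * 10**12

print(f"opportunities: {opportunities:.1e}")
print(f"at least a billion trillion: {opportunities >= billion_trillion}")

# Waiting times quoted in this section, converted from generations to years.
quoted_generations = {
    "d = 4, neutral": 10**5,
    "d = 4, maladaptive": 10**18,
    "d = 6, neutral": 10**13,
}
for label, generations in quoted_generations.items():
    years = generations // GENERATIONS_PER_YEAR
    print(f"{label}: {years:.0e} years")
```

With only one billion years the product already reaches 10²¹, so any longer span gives the “well over” in the quotation.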

5. Durrett and Schmidt: Waiting for Two Mutations: With Applications to Regulatory Sequence Evolution and the Limits of Darwinian Evolution[v]

Durrett and Schmidt calculated how long it would take for a new functional regulatory sequence to arise in a human population through random mutations, the first of which inactivates an existing transcription factor binding site and the second of which creates a new binding site conferring a novel function. Their model was specifically for population sizes similar to those of humans and flies. However, their model is also applicable to calculating the arrival time for 2 specific mutations in a specific duplicated gene which has become fixed in the population.

“…we examine the waiting time for a pair of mutations, the first of which inactivates an existing transcription factor binding site and the second of which creates a new one”

Their results indicate that a change requiring two coordinated mutations in a human population with an effective population size of 10,000 would take 216 million years.

“We now show that two coordinated changes that turn off one regulatory sequence and turn on another without either mutant becoming fixed are unlikely to occur in the human population…Multiplying by 25 years per generation gives 216 million years…Multiplying by 0.75 reduces the mean waiting time to 162 million years, still a very long time. Our previous work has shown that, in humans, a new transcription factor binding site can be created by a single mutation in an average of 60,000 years, but, as our new results show, a coordinated pair of mutations that first inactivates a binding site and then creates a new one is very unlikely to occur on a reasonable timescale.”

They conclude that a pre-specified coordinated pair of mutations cannot occur in the human population on a reasonable timescale because it would take 162 million years. Moreover, once a specific coordinated pair of mutations arises in one particular individual, it would still have to become fixed within the whole population, which would increase the total time.
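
The figures in the quotation can be related to one another with a few lines of arithmetic. The sketch below backs out the implied number of generations from the quoted 216 million years at 25 years per generation; the variable names are illustrative assumptions.

```python
YEARS_PER_GENERATION = 25
mean_wait_years = 216_000_000          # quoted mean waiting time
single_mutation_years = 60_000         # quoted time for a single-mutation binding site

implied_generations = mean_wait_years // YEARS_PER_GENERATION
adjusted_wait_years = mean_wait_years * 3 // 4     # the 0.75 factor in the quote
ratio_to_single = adjusted_wait_years / single_mutation_years

print(f"implied generations: {implied_generations:,}")
print(f"adjusted waiting time: {adjusted_wait_years:,} years")
print(f"ratio to single-mutation time: {ratio_to_single:,.0f}x")
```

On these numbers the coordinated pair takes 2,700 times longer than the single mutation that creates a binding site.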

[Figure: calculation block for Durrett and Schmidt’s model]

6. John Sanford et al: The waiting time problem in a model hominin population.

These researchers simulate how long it would take for new proteins requiring between 2 and 8 changes in DNA sequence to arise within the hominin population. They say: “In this paper we address the question, “How long does it take for the simplest biological text strings to arise and be fixed, within a hominin population?”…We use biologically realistic numerical simulations to analyze waiting times for the generation and fixation of specific strings of nucleotides of various lengths, given different mutation rates, given different selection pressures, and given different population sizes.”[vi]

Their model assumes that mutations are neutral until the target string is found. It also assumes that a gene duplication event has occurred and become fixed, so they calculate the waiting time for specific mutations to occur and become fixed. The results they obtained are captured in the figure below.

Figure 5: Waiting time results obtained from Sanford et al.’s simulation. The selective advantage used is 0.1 and the effective population size is 10⁴.

The researchers conclude: “We have used comprehensive numerical simulations to show that in populations of modest size (such as a hominin population), there is a serious waiting time problem that can constrain macroevolution. Our studies show that in such a population there is a significant waiting time problem even in terms of waiting for a specific point mutation to arise and be fixed (minimally, about 1.5 million years). We show that the waiting time problem becomes very severe when more than one mutation is required to establish a new function”

For a new duplicated protein requiring a single mutation, it will take 1.53 million years for it to be fixed in the population. For 2 coordinated changes they calculate 84.1 million years, which is roughly half of what Durrett and Schmidt calculated (162 million years) and much smaller than Lynch’s and Behe and Snoke’s results. Their model is based on a much larger selective advantage than Durrett and Schmidt’s and Lynch’s. Their results confirm how significant a problem two or more coordinated mutations pose to Darwinian processes for small population sizes such as humans.

[Figure: summary, part 1]

Why does this matter?

First, let us start with an example of a duplicated gene within the human population documented by Zhang:

“One good example is the red- and green-sensitive opsin genes of humans, which were generated by gene duplication in hominoids and Old World monkeys [44]. After duplication, the two opsins have diverged in function, resulting in a 30-nm difference in the maximum absorption wavelength. This confers the sensitivity to a wide range of colors that humans and related primates have…the functional difference between the red and green opsins is largely attributable to two substitutions”

  1. In the human population there are numerous cases of postulated duplicated genes. One of them is the opsin gene, which duplicated into red- and green-sensitive opsin genes that then underwent two substitutions.
  2. However, the mathematical calculations by scientists such as Lynch show that such an event (gene duplication followed by 2 mutations) would take 2000 billion years in populations similar to humans. Others, like Sanford et al., show it would take 84 million years, depending on the selective advantage of the gene in question.
  3. Evolution assumes that humans diverged from their common ancestor with chimpanzees 6 million years ago.
  4. Gene duplication, mutations and natural selection cannot explain genetic change because it would take too much time. Evolution’s gene duplication mechanism, which is supposed to be the primary explanation of how genes evolve into new genes with new functions, has been shown to be ineffective and false.
  5. Evolution begins at its most basic level with changes in genes which lead to new traits and eventually new species; however, its primary and principal mechanism for generating new genes has been shown to be ineffective.
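
The comparison running through this list can be laid out as a small table. The sketch below collects the human-scale estimates for two coordinated mutations quoted in this article and divides each by the 6 million years assumed for the human-chimpanzee divergence; the dictionary and its labels are an illustrative arrangement, not output from any of the cited models.

```python
DIVERGENCE_YEARS = 6e6  # assumed time since the human-chimpanzee common ancestor

# Waiting times in years for two coordinated mutations, as quoted in this article.
estimates_years = {
    "Behe and Snoke": 20_000e9,
    "Lynch": 2_000e9,
    "Lynch and Abegg": 200e6,
    "Durrett and Schmidt": 162e6,
    "Sanford et al.": 84.1e6,
}

for source, years in sorted(estimates_years.items(), key=lambda item: item[1]):
    ratio = years / DIVERGENCE_YEARS
    print(f"{source:<20} {years:>10.2e} years  {ratio:>12,.0f}x divergence time")

shortest = min(estimates_years.values())
print(f"shortest estimate exceeds the divergence time: {shortest > DIVERGENCE_YEARS}")
```

Even the shortest figure in the table, from Sanford et al., is about 14 times the assumed divergence time.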

Bibliography


[i] Rachel Kolodny, Leonid Pereyaslavets, Abraham O. Samson, Michael Levitt. On the Universe of Protein Folds. Annu. Rev. Biophys. 2013. 42:559–82. doi:10.1146/annurev-biophys-083012-130432

[ii] Michael Behe, David W. Snoke. Simulating evolution by gene duplication of protein features that require multiple amino acid residues. Protein Science (2004), 13:2651–2664.

[iii] Michael Lynch. Simple evolutionary pathways to complex proteins. Protein Science (2005), 14:2217–2225.

[iv] Axe DD. The limits of complex adaptation: An analysis based on a simple model of structured bacterial populations. BIO-Complexity 2010(4):1-10. doi:10.5048/BIO-C.2010.4

[v] Rick Durrett, Deena Schmidt. Waiting for Two Mutations: With Applications to Regulatory Sequence Evolution and the Limits of Darwinian Evolution. GENETICS November 1, 2008 vol. 180 no. 3, 1501-1509. doi:10.1534/genetics.107.082610

[vi] John Sanford, Wesley Brewer, Franzine Smith and John Baumgardner. The waiting time problem in a model hominin population. Theoretical Biology and Medical Modelling (2015) 12:18.