Entropy drives the pace of molecular evolution

How do epistasis, convergence and adaptation interact to determine rates of molecular evolution? It turns out that the number of sequences that will fold correctly plays a big role, while population size does not. We describe a mathematical theory for why this is so.

The paper in Nature Ecology & Evolution is here: http://go.nature.com/2zBRZr7

Although this paper took years to complete, it has been some of the most exciting work that we've done because it required integrating past work and conceptual thinking on many different levels. Epistasis, coevolution, convergence, adaptation, sequence evolution, protein structure, entropy, statistical mechanics, mathematical theory, simulations, and interpretation of comparative data: it's all there. We also had to think harder than usual about the philosophy of what we were doing. For example, when we started talking about this work as a mechanism of molecular evolution, we had to consider, “What do we mean by 'mechanism'?” Along the way, we learned that calling other models ‘phenomenological’ in contrast to ‘mechanistic’ was considered ‘fighting words’ by some. (We ended up using the word 'empirical' in the article to avoid offense.)

The mathematical foundations of the work have been another big challenge when explaining it to people. People with strong physics backgrounds tend to grasp the meaning of the math more quickly and more intuitively, because most of it is taken directly from statistical mechanics, and analogies to physical phenomena such as the Stokes shift call up memories from student days. On the other hand, the idea that the salient features of the system are not caused directly by the physics, but rather by the biological need for function, tends to be more intuitive to biologists.

Other challenges in talking about this work arise because so much of it involves counterintuitive concepts, or at least ideas that run counter to our long-established habits of thinking about evolution. We tend to think of small parts of things as ‘fitting in’ to the bigger picture, the cogs that play their part in making the machine run. The counterintuitive thing we are saying is that evolution is often the opposite of this: if a small part initially only partly fits into a much bigger machine, the machine will evolve to make the best use it can of that part. In this case, amino acids are the parts, and the rest of the protein is the machine. A good visual analogy for this is a hand pressed into memory foam; the surface of the memory foam adjusts to the hand and relaxes around it, eventually making the hand the perfect shape to fit into the impression that it made.

We also tend to think that the compatibility of amino acids for a position in a protein is fixed, and maybe even usefully categorized as a type based on observed substitution averages. It turns out there is no reason to think that this is generally true. Instead, the contribution that a resident amino acid makes to the stability of a protein wanders, as do the effects on stability if you replace it with another amino acid. At first this seems bad because it has the potential to add a great deal of complexity to an already difficult problem; the end result, though, is that the problem is simplified. A big part of the simplification occurs because the driving force of this process is a feature common to the entire protein, the sequence entropy of folding. This is related to how many fewer sequences will fold to the ‘target’ structure, the one that is functional, at higher versus lower free energies of folding.

Finally, perhaps the most counterintuitive aspect of the theory is that central population genetics quantities such as population size and strength of selection do not determine rates of nearly neutral molecular evolution. Signs of this have been observed before, but now we can say exactly why these terms go away. This caused trouble with more than one reviewer who couldn’t see how this could be so, but it is. Qualitatively, it happens because stably folded proteins are at such an extreme tail of the sequence space distribution that the falloff in sequence numbers with increasing stability is fairly constant. No matter the population size, the system finds the stability value at which the change in entropy matches the change in fitness, producing a nearly neutral evolutionary process with a similar relationship between fitness effects and changes in stability. Other physical features that you might expect to affect the process, such as protein designability, have their effects funneled through the single entropy change parameter.
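
The cancellation of population size can be illustrated with a toy calculation. This is a minimal sketch under simplifying assumptions, not the model from the paper: the number of sequences is assumed to fall off exponentially with stability at a constant rate `gamma`, fitness is assumed to be the folded fraction of a two-state protein, and the stationary weight of a stability value is assumed to be (number of sequences) × (fitness)^`nu`, where `nu` stands in for a population-size scale. All function and parameter names are hypothetical.

```python
import math

def log_fitness(x, beta=1.0):
    """Log of the assumed folded fraction 1 / (1 + exp(-beta * x)); higher x is more stable."""
    return -math.log1p(math.exp(-beta * x))

def equilibrium_stability(nu, gamma, beta=1.0, lo=-20.0, hi=60.0):
    """Stability that maximizes -gamma * x + nu * log_fitness(x).

    At the maximum, the entropy slope gamma equals nu * d(log fitness)/dx.
    Found by bisection; assumes nu * beta > gamma so a root lies in [lo, hi].
    """
    def net_slope(x):
        unfolded = 1.0 / (1.0 + math.exp(beta * x))
        return nu * beta * unfolded - gamma  # decreases monotonically with x

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if net_slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def scaled_selection(x, delta, nu, beta=1.0):
    """Population-scaled log-fitness change for a mutation shifting stability by delta."""
    return nu * (log_fitness(x + delta, beta) - log_fitness(x, beta))

def compare_population_scales(nus=(1e3, 1e5, 1e7), gamma=2.0, delta=-0.1):
    """Return (nu, equilibrium stability, scaled selection) for each population scale."""
    rows = []
    for nu in nus:
        x_star = equilibrium_stability(nu, gamma)
        rows.append((nu, x_star, scaled_selection(x_star, delta, nu)))
    return rows

if __name__ == "__main__":
    for nu, x_star, s in compare_population_scales():
        print(f"nu={nu:.0e}  equilibrium stability={x_star:.2f}  scaled selection={s:.4f}")
```

In this toy setting, larger values of `nu` push the equilibrium to higher stability, but the scaled selection on a mutation with a given stability effect stays nearly the same, because at equilibrium it is set by `gamma` rather than by `nu`.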

If nothing else, we hope this paper will change the dialog and get people thinking in new ways about the mechanism behind the relative fitnesses of mutations, and how it might affect phylogenetics, predicting mutation effects, and ancestral inference. There’s a lot of work still to do on this, and we look forward to seeing where the science goes in the future.

David Pollock

Professor, University of Colorado School of Medicine