A neutral theory of molecular function

In 1968 Motoo Kimura published a short article in Nature in which he argued that “most mutations produced by nucleotide replacement are almost neutral in natural selection”. This fantastic paper is generally viewed as having established the “neutral theory” of molecular evolution, whose central principle was set out by Jack King and Tom Jukes in a Science paper the following year:

Evolutionary change at the morphological, functional, and behavioral levels results from the process of natural selection, operating through adaptive change in DNA. It does not necessarily follow that all, or most, evolutionary change in DNA is due to the action of Darwinian natural selection.

It is hard to overstate the importance of these papers. They offered an immediate challenge to the deeply flawed, but widely held, belief that all changes to DNA must be adaptive – an assumption that was poisoning the way that most biologists were reckoning with the first wave of protein sequence data. And, as their ideas were rapidly accepted in the nascent field of molecular evolution, the neutral theory loomed over virtually all analyses of sequence variation within and between species for decades to come.

What Kimura, King and Jukes really did was to establish a new “null model” against which any putative example of adaptive molecular change must be judged. Indeed, neutrality offered such a good explanation for sequence changes over time that when I entered the field in the early 90s, researchers were still struggling to find a single example of molecular change for which a neutral explanation could be rejected.

While the explosion of sequence data in the past decade ultimately yielded unambiguous evidence for large-scale adaptive molecular evolution, it is hard to overstate just how powerful the neutral null model was in forcing people to think clearly about what adaptive change means, and how one would go about identifying clear examples of it.
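To make "rejecting the neutral null" concrete, here is one classic test from the molecular evolution toolkit. It was chosen here as an illustration and is not a method described in this post. The McDonald-Kreitman test (1991) compares two ratios of replacement (nonsynonymous) to synonymous changes: the ratio among differences fixed between species (Dn/Ds) and the ratio among polymorphisms within a species (Pn/Ps). Under strict neutrality the two ratios should match. An excess of fixed replacements suggests positive selection. The counts below are hypothetical, and the two-sided Fisher exact test is implemented by hand so the sketch has no dependencies.

```python
from math import comb


def neutrality_index(dn: int, ds: int, pn: int, ps: int) -> float:
    """NI = (Pn/Ps) / (Dn/Ds); NI < 1 indicates an excess of fixed replacements."""
    return (pn / ps) / (dn / ds)


def alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Estimated fraction of replacement substitutions fixed by positive selection."""
    return 1.0 - (ds * pn) / (dn * ps)


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    # Hypergeometric probability of a 2x2 table whose top-left cell is `a`,
    # given fixed row and column totals.
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = _table_prob(a, row1, col1, n)
    p_value = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p_x = _table_prob(x, row1, col1, n)
        if p_x <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            p_value += p_x
    return min(p_value, 1.0)


if __name__ == "__main__":
    # Hypothetical counts. Rows: fixed differences vs. polymorphisms.
    # Columns: replacement vs. synonymous changes.
    Dn, Ds, Pn, Ps = 20, 30, 5, 40
    print(f"NI = {neutrality_index(Dn, Ds, Pn, Ps):.4f}")        # NI = 0.1875
    print(f"alpha = {alpha(Dn, Ds, Pn, Ps):.4f}")                # alpha = 0.8125
    print(f"Fisher p = {fisher_exact_two_sided(Dn, Ds, Pn, Ps):.3g}")
```

The logic is the one described above. Neutrality is the default. Adaptation is inferred only when the data are improbable under it.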

I think a lot about Kimura, the neutral theory, and the salutary effects of clear null models every time I get involved in discussions about the function, or lack thereof, of biochemical events observed in genomics experiments, such as those triggered this week by publications from the ENCODE project.

It is easy to see the parallels between the way people talk about transcribed RNAs, protein-DNA interactions, DNase hypersensitive regions and whatnot, and the way people talked about sequence changes PK (pre-Kimura). While many of the people carrying out RNA-seq, ChIP-seq, CLIP-seq, etc. have been indoctrinated with Kimura at some point in their careers, most seem unable to apply his lesson to their own work. The result is a field suffused with implicit or explicit thinking along the following lines:

I observed A bind to B. A would only have evolved to bind to B if it were doing something useful. Therefore the binding of A to B is “functional”.

One can understand the temptation to think this way. In the textbook view of molecular biology, everything is highly regulated. Genes are transcribed with a purpose. Transcription factors bind to DNA when they are regulating something. Kinases phosphorylate targets to alter their activity or sub-cellular location. And so on. Although there have always been lots of reasons to dismiss this way of thinking, until about a decade ago this is what the scientific literature looked like. In the days when papers described single genes and single interactions, who would bother to publish a paper about a non-functional interaction they had observed?

But experimental genomics blew this world of Mayberry molecular biology wide open. For example, when Mark Biggin and I started to do ChIP-chip experiments in Drosophila embryos, we found that factors were binding not just to their dozen or so known targets, but to thousands, and in some cases tens of thousands, of places across the genome. Having studied my Kimura, I just assumed that the vast majority of these interactions had evolved by chance – a natural, essential consequence of the neutral fixation of nucleotide changes that happened to create transcription factor binding sites. And so I was shocked that almost everyone I talked to about this data assumed that every one of these binding events was doing something – we just hadn’t figured out what yet.
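A back-of-the-envelope calculation shows why thousands of binding events are expected by chance alone. The sketch assumes a genome of independent, equally frequent bases. Real genomes have biased composition and repeats. It also assumes an exact, non-palindromic binding site. The ~120 Mb genome size, the 8 bp site length and the GATAAG motif used in the simulation are illustrative assumptions, not values from the experiments described above. Real transcription factors also tolerate mismatches, which would only raise the count.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_chance_sites(genome_length: int, motif_length: int) -> float:
    """Expected exact matches to one non-palindromic motif, counting both
    strands, if every base is independent and A/C/G/T are equally likely."""
    windows = genome_length - motif_length + 1
    return 2 * windows * 0.25 ** motif_length


def count_sites(seq: str, motif: str) -> int:
    """Count windows that match the motif on either strand (overlaps allowed)."""
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    # Illustrative inputs only: a ~120 Mb genome and an exact 8 bp site.
    print(round(expected_chance_sites(120_000_000, 8)))  # 3662

    # Sanity-check the formula against a simulated random sequence.
    rng = random.Random(1)
    random_seq = "".join(rng.choice("ACGT") for _ in range(200_000))
    print(count_sites(random_seq, "GATAAG"), expected_chance_sites(200_000, 6))
```

Under these assumptions, several thousand perfect matches arise with no selection at all. Some fraction of them will be accessible and bound.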

But if you think about this, you will realize that this simply cannot be true. As we and many others have now shown, molecular interactions are not rare. Transcripts, transcription factor binding sites, DNA modifications, chromatin modifications, RNA binding sites, phosphorylation sites, protein-protein interactions, etc. are everywhere. This suggests that these kinds of biochemical events are easy to create – change a nucleotide here – wham, a new transcription factor binds, a splice site is lost, a new promoter is created, a glycosylation site is eliminated.
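The claim that a single nucleotide change can create a binding site can also be quantified. Take a random sequence and a fixed, non-palindromic k-mer. A window one mismatch away from the motif occurs with probability k·(3/4)·(1/4)^(k-1). Exactly one of the three possible substitutions at the mismatched base completes the site. Counting both strands and dividing by three substitutions per base gives a fraction of about 2k·(1/4)^k of all point mutations that create a new site. The sketch below checks this by trying every point mutation in a simulated sequence. The equal base frequencies and the GATAAG motif are illustrative assumptions.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def analytic_gain_fraction(k: int) -> float:
    """Approximate chance that one random point mutation creates a new exact
    match to a fixed, non-palindromic k-mer on either strand."""
    return 2 * k * 0.25 ** k


def _matching_starts(seq: list, i: int, k: int, targets: set) -> set:
    # Start positions of windows that contain position i and match a target.
    lo, hi = max(0, i - k + 1), min(i, len(seq) - k)
    return {s for s in range(lo, hi + 1) if "".join(seq[s:s + k]) in targets}


def simulated_gain_fraction(length: int, motif: str, seed: int = 0) -> float:
    """Try every possible point mutation in a random sequence and return the
    fraction that create at least one new motif match (either strand)."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGT") for _ in range(length)]
    k, targets = len(motif), {motif, reverse_complement(motif)}
    gains = trials = 0
    for i in range(length):
        original = seq[i]
        before = _matching_starts(seq, i, k, targets)
        for base in "ACGT":
            if base == original:
                continue
            seq[i] = base
            trials += 1
            if _matching_starts(seq, i, k, targets) - before:
                gains += 1
        seq[i] = original  # restore before moving to the next position
    return gains / trials


if __name__ == "__main__":
    print(analytic_gain_fraction(8))  # 0.000244140625, i.e. 1 in 4,096
    print(analytic_gain_fraction(6), simulated_gain_fraction(20_000, "GATAAG"))
```

Even for an exact 8 bp site, roughly one random point mutation in 4,096 creates a brand-new match somewhere. With billions of bases and many factors, new sites are gained constantly.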

Does this conflict with the neutral theory? Not at all! Indeed, it is perfectly consistent with it. The neutral theory does not demand that most sequence changes have no measurable effect on the organism. Rather, the only thing you have to assume is that the vast majority of the biochemical events that happen as a consequence of random mutations do not significantly affect organismal fitness. Given that such a large fraction of the genome is biochemically active, the same basic logic Kimura, King and Jukes used to argue for neutrality – that it is simply impossible for such a large number of molecular traits to have been driven to fixation by selection – argues strongly that most biochemical events do not contribute significantly to fitness. Indeed, given the apparent frequency with which new molecular interactions arise, it is all but impossible that we would still exist if every new molecular event had a strong phenotypic effect.
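Kimura's diffusion approximation makes the difference between "has an effect" and "matters to fitness" quantitative. The probability that a single new mutation in a diploid population eventually fixes is u = (1 - e^(-4Nsp)) / (1 - e^(-4Ns)), where p = 1/(2N) is its starting frequency. The sketch assumes effective and census size are equal (N) and additive selection with coefficient s. Textbooks differ by a factor of 2 in how s is defined. The population size N = 10,000 is illustrative. The result is that mutations with |4Ns| well below 1 fix at essentially the neutral rate 1/(2N), even though s is not zero.

```python
import math


def fixation_probability(N: int, s: float) -> float:
    """Kimura's fixation probability for a new mutant at frequency 1/(2N),
    assuming effective size == N and additive selection coefficient s."""
    p = 1.0 / (2 * N)
    if s == 0:
        return p  # strictly neutral: fixation probability equals starting frequency
    a = -4 * N * s * p  # equals -2s
    b = -4 * N * s
    if s > 0:
        # (1 - e^a) / (1 - e^b) written with expm1 for accuracy when |s| is tiny
        return math.expm1(a) / math.expm1(b)
    # For s < 0, rewrite (e^a - 1) / (e^b - 1) as e^(a - b) * (1 - e^-a) / (1 - e^-b)
    # so strongly deleterious mutations underflow to 0 instead of overflowing.
    return math.exp(a - b) * math.expm1(-a) / math.expm1(-b)


if __name__ == "__main__":
    N = 10_000  # illustrative population size
    neutral = fixation_probability(N, 0.0)
    for s in (-1e-3, -1e-6, 0.0, 1e-6, 1e-3):
        u = fixation_probability(N, s)
        print(f"s = {s:+.0e}  4Ns = {4 * N * s:+.2f}  u / u_neutral = {u / neutral:.3g}")
```

This is Ohta's "nearly neutral" point in miniature. A biochemical change can be real and measurable and still be effectively invisible to selection.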

This, of course, does not mean that all these molecular events do nothing – their very existence is a form of function. But we are generally interested in different types of function – things that did arise through natural selection, are maintained by purifying selection, and whose disruption will cause a disease or other significant phenotype. Of course these things exist amidst the rubble. The question is how to find them. And here, I think we should once again take our cue from Kimura.

As I argued above, the field of molecular evolution developed a powerful intellectual core in no small part because researchers had to reckon with the powerful neutral null hypothesis – meaning that adaptive change had to be demonstrated, not assumed. We need to apply the same logic to molecular interactions.

Rather than assuming – as so many of the ENCODE researchers apparently do – that the millions (or is it billions?) of molecular events they observe are a treasure trove of functional elements waiting to be understood, they should approach each and every one of them with Kimurian skepticism. We should never accept the existence of a molecule or the observation that it interacts with something as prima facie evidence that it is important. Rather, we should assume that all such interactions are non-functional until proven otherwise, and develop better, more compelling ways to reject this null hypothesis.
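One way to make "non-functional until proven otherwise" operational is to test a set of observed events against a background expectation. The sketch below is hypothetical and is neither ENCODE's nor the author's method. It asks whether bound sites are conserved in a related species more often than matched unbound control regions, using an exact one-sided binomial test. It then makes a crude estimate of the excess attributable to constraint, assuming that constrained sites are always conserved and that all other sites are conserved at the background rate. All numbers are invented for illustration. Rejecting the null for the set also says nothing about which individual site is functional.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so large n does not overflow.
    """
    def log_pmf(i: int) -> float:
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log1p(-p))

    return min(1.0, sum(exp(log_pmf(i)) for i in range(k, n + 1)))


def constrained_fraction(conserved: int, total: int, background_rate: float) -> float:
    """Crude fraction of sites under constraint, solving
    observed_rate = f + (1 - f) * background_rate for f."""
    observed = conserved / total
    return (observed - background_rate) / (1 - background_rate)


if __name__ == "__main__":
    # Hypothetical: 1,000 bound sites, 230 conserved in a related species,
    # versus 20% conservation in matched unbound control regions.
    n_sites, n_conserved, p0 = 1_000, 230, 0.20
    print("p-value under the non-functional null:",
          binomial_upper_tail(n_conserved, n_sites, p0))
    print("estimated constrained fraction:",
          constrained_fraction(n_conserved, n_sites, p0))  # about 0.0375
```

In this invented example the null is rejected. Still, only a few percent of the bound sites would be inferred to be under constraint. If SciPy is available, `scipy.stats.binomtest(k, n, p, alternative="greater")` gives the same one-sided test.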

To paraphrase King and Jukes:

Life is dependent on the production of and interaction between DNAs, RNAs, proteins and other biomolecules. It does not necessarily follow that all, or most, biomolecules and interactions among them are due to the action of Darwinian natural selection.

I want to end by pointing out that there are lots of people (me and my group included) who have already been wrestling with this issue, with lots of interesting ideas and results already out there. From an intellectual standpoint I’d like to particularly point out the influence the writings of Mike Lynch have had on me – see especially this.


NOTE: There’s a lot more to say about this, and in the interests of time (I have to give a genetics lecture first thing in the morning) I haven’t gone into as much depth as some of these issues deserve. I will update this post as time permits.


This entry was posted in ENCODE, evolution, gene regulation, genetics, NOT junk, science.


  1. Posted September 7, 2012 at 1:51 am

    This reminds me of early phosphoproteomics hype – finding all the “switches” in signaling was the promise, yet so many phosphorylation sites were found it made no sense to assert each one was functional. In this paper with Mike Yaffe, we argue that there isn’t enough specificity in kinase motifs – phosphorylation sites appear frequently and without a paired phosphoprotein binding domain, they are pretty useless as switches. Yes one should assume small transient binding motifs are neutral until proven otherwise.


  2. Nicolas Le Novere
    Posted September 7, 2012 at 7:29 am

    Great blog post, thank-you. Back in my MSc of evolution, Kimura was my hero.
    I worked many years in a hard-core molecular biology lab that spent a lot of energy, time and money trying to explain everything. The most dispiriting observation for many of my colleagues was the expression of the nicotinic receptor subunit beta2. It is expressed in every cell that does not express a certain TF shutting it off. In particular, it is expressed in every neuron. In hundreds of billions of neurons, it is expressed alone, without a subunit able to bind acetylcholine. It is expressed and degraded, having spent a short useless “life”. More than that, it is expressed at a very high level. Even in the neurons where it serves a certain “function”, it is “rescued” during folding by the complementary subunit. But up to 90% of the beta2 polypeptide is degraded directly in the ER, without forming any receptor.

    And what about the wasted effort spent endlessly studying minor differences between species in micromolar Kd while the concentrations of ligand are millimolar. I wrote a bit about that http://www.ebi.ac.uk/~lenov/PUBLIS/Lenov2002b.pdf

  3. NickMatzke
    Posted September 7, 2012 at 7:41 am

    You mean Tom Jukes. Who heavily influenced me even from the grave: http://ncse.com/rncse/26/1-2/design-trial

  4. Meredith Carpenter
    Posted September 7, 2012 at 8:13 am

    This also makes me wonder about the 20% for which ENCODE found no function – isn’t it suspicious that nothing was going on there at all?

  5. Posted September 7, 2012 at 8:30 am

    Nice post. It seems to jump a little between the neutral theory of molecular evolution as a guiding metaphor and its usefulness for understanding function. As such, it is probably worth stating that even if a genetic change was driven by selection that does not mean that the “function” we associate with it is meaningful or the trait that directional selection acted on. This is the more profound challenge that is still faced by evolutionary biology [and always has been] even as we can move beyond the neutral theory.

  6. Posted September 7, 2012 at 8:33 am

    Very well put. I recently heard a talk by Andreas Wagner or someone who argued that there was very little evidence for neutrality. I hope I’m remembering correctly that it was he (the meeting was in Italy in about 2010). Anyway, even this was not about strong Darwinism.

    However, Ohta and others in the Kimura school (and Masatoshi Nei, here at Penn State) have basically backed away from strict neutrality to argue for ‘nearly neutral’ as a safer model. Since _exact_ neutrality is almost a metaphysical concept, all we can say is that drift predominates. So, even if some nucleotide has some function now and then (almost impossible to prove even experimentally!), the majority don’t meaningfully have ‘adaptive’ value.

    Further, I think that even slightly fitness-related variants or genome characteristics might be adaptive (or, at least, not be removed by purifying selection, which is a much less stringent criterion than specific adaptation), but only function over very long time periods. Again, in the ENCODE scale of things, even such ‘function’ (such as you described in your post) or its variation is meaningless for humans per se.

  7. Posted September 7, 2012 at 9:54 am

    Great blog post, thanks Michael! I was reminded of critical phase math when reading your post. E.g. we define temperature under an assumption that all molecular movements are essentially random; we define wind under a different assumption (directional correlation of velocities). In reality we have a wind speed as well as a temperature, and so each molecular interaction in an organism can potentially have both interpretations simultaneously in effect and predictive. So I’m hoping to see a generalization of the earlier papers that shows, e.g., that immune system function works at a critical phase-change boundary in statistical terms, so that we can include the whole body.

  8. Georgi Marinov
    Posted September 7, 2012 at 11:32 am

    The result is a field suffused with implicit or explicit thinking along the following lines:

    I observed A bind to B. A would only have evolved to bind to B if it were doing something useful. Therefore the binding of A to B is “functional”.

    That’s not necessarily true; I think people are generally aware of it. It’s just that explicitly discussing this does not exactly fit the way papers are written – yes, a lot of binding events are probably of little consequence, but for any individual event you have no way of knowing whether it is functional or not without orthogonal data, and orthogonal evidence is often not at all easy to get, as it involves time-consuming and/or technically difficult experiments.

  9. Posted September 7, 2012 at 11:58 am

    Thank-you, thank-you, thank-you.

    Well said.

    Now if we could only convince the 99% of scientists who don’t have a clue what you’re talking about. 🙂

    BTW, I used the lizard picture again on my blog. Hope you don’t mind.

  10. Allen Rodrigo
    Posted September 7, 2012 at 2:10 pm

    A great post — now to apply the null model of stochastic interactions in hypothesis tests of well designed experiments.

  11. Dave
    Posted September 7, 2012 at 2:20 pm

    Great post with plenty of food-for-thought. We have been trying to figure out what proportion of ChIP-Seq peaks are actually “functional” by using a combination of silencing, ChIPing and RNA-Seqing etc. and it is indeed a very big challenge. I don’t agree that those of us doing the work associate a binding event with a function, because anyone who has looked at ChIP data will immediately reject this hypothesis. From our experiments in which we have tried to associate binding with transcription, we are starting to see that <10% of the peaks appear to be "functional". They could always have other functions besides influencing expression, but I'm not sure we know what they are right now.

  12. Posted September 7, 2012 at 3:18 pm

    I got my Ph.D. at what Chuck Langley once jokingly called “U. C. Davis, the gaga pan-selectionist center of the universe.” However, what you’ve said is pretty much what I thought as soon as I started seeing that 80% figure being bandied about. Mike Lynch’s arguments are indeed relevant and, to me, persuasive.

  13. Posted September 7, 2012 at 6:48 pm

    Very well said. Blind chance is the null hypothesis in biology as in everything else, and effects are side effects until we have evidence to the contrary.

  14. Konrad Scheffler
    Posted September 7, 2012 at 10:51 pm

    Great post. May I alert people here that the ENCODE papers have led to the noncoding DNA page on Wikipedia being flagged as in need of updating, with one misleading edit having been added to the preamble already. It would be great if people who understand this issue could help keep the entry sane.


  15. Feng Liu
    Posted September 7, 2012 at 11:51 pm

    Second the suggestion that “adaptive change had to be demonstrated, not assumed”. As Francois Jacob argued before, evolution works like a tinkerer, who makes new things out of whatever is available. A lot of the sequences/mutations in animal genomes may arise in the first place by chance, but later they may happen to be incorporated into a functional element, like an enhancer or a gene. But only the latter event (functionalization of the sequence) is directly subject to selection; the importance of the first event (sequence mutation) is not obvious a priori. So the role a particular DNA sequence plays for the genome/organism has to be experimentally tested before any importance is assigned to it.

    As for the many seemingly unwanted transcription factor binding events across the genome, it is tempting to take them too seriously (e.g. to say they have to be involved in gene regulation), but on the other hand, it may be dangerous to dismiss them too easily. I think what might turn out from the studies of enhancers is that new enhancers or new enhancer functions can evolve more easily than we think right now. As in the tinkering analogy, something has to be there before the tinkerer can start playing, because nothing can be made out of nothing. So the promiscuous TF binding may be meaningless for now, but it may provide the background for later DNA sequence changes nearby to generate new functions. In this case, a retrospective function may be assigned to the promiscuous TF binding. But again, the function of DNA sequences has to be demonstrated at a given time point, not assumed.

  16. Jim Woodgett
    Posted September 8, 2012 at 8:27 am

    A beautifully erudite and exceptional exposition of not only neutral theory but the scientist’s dilemma. Just because we observe something does not mean it has a specific meaning. This is a central (and understandable) conceit in that when we record something different/unexpected, our logic is to assign it some level of significance. Yet we know that biological research is an imperfect science that itself scrutinizes an imperfect subject. Surely, only protagonists of Intelligent Design could assume that biology itself is perfect, is error-free and steps from one stone to the next in a purposeful and preconceived way? Indeed, the incredible robustness of biological processes (development! development! development! to paraphrase Steve Ballmer: http://www.youtube.com/watch?v=8To-6VIJZRE ) in the face of imperfect replication of DNA and unpredictable/uncontrollable micro-environmental changes exemplifies the vacuity of thinking that every minutia has consequence.

  17. Posted September 8, 2012 at 8:20 pm

    I’m not a biologist but it seems to me that this redundancy is most likely essential to evolution: half-built machinery that doesn’t do anything, or if it does do anything only does it by chance as an essentially random side-effect, is machinery that might one day find itself doing something useful. The more of it there is, the more chance there is of useful combinations appearing here and there. The junk isn’t really junk: it’s diversity.

    In which case, you’d expect apparent disorder to be a positive feature: an organism which can produce a wider variety of sequences in its DNA might well be able to evolve faster.

    Has anyone attempted to study whether there’s a relationship between the proportion of “junk” DNA an organism has and its speed of evolution?

  18. Posted September 8, 2012 at 8:24 pm

    Urgh, apologies for sloppy language—by “organism” I mean a species, not an individual . . . Like I said, I’m not a biologist.

  19. Posted September 9, 2012 at 9:48 am

    This is the most sensible post I’ve seen so far on the ENCODE results. I’d like to add one small point. Even for loci or sites where we reject the neutral null hypothesis, the neutral stochastic processes are still running in the background at that site. When we “reject” neutrality at a given site, we are really just adding a term or factor to the neutral prediction. A neutral model provides the backdrop for everything that happens, including selection.

  20. shobs
    Posted September 11, 2012 at 9:03 am

    the problem is that many scientists believe that there has to be sense in everything that is observable today or purpose is the sole reason why things evolve, which is a major flaw as it hugely biases our conclusions and diminishes our capacity of objective judgement. therefore it is so difficult for people to accept neutral evolution. One more thing, since the idea of evolution is itself a determinant one/non-neutral one i.e. it says that things have changed so as to contribute to the phenomenon observed today, it is hard to think about it neutrally.

  21. shobs
    Posted September 11, 2012 at 9:05 am

    @ Tim J
    some papers by michael lynch on genome evolution should answer that… they are an easy and good read even for non-biologists…

6 Trackbacks

  • By ENCODE Coverage Round Up: Press, Blogs, and Tweets on September 7, 2012 at 7:06 am

    […] Michael Eisen’s Neutral Evolution 101 for ENCODE: A neutral theory of molecular function […]

  • By Eisen also gets it: more on ENCODE « The MolBio Hut on September 7, 2012 at 7:46 am

    […] Eisen has discussed some of these issues on his blog, and this is a brief extract from one of his posts: But if you think about this, you will realize that this simply can not be true. As we and many […]

  • […] creeping into discussions in the context of ENCODE, Michael Eisen proposes that we develop a “A neutral theory of molecular function” to interpret the meaning of these reproducible biochemical events that have no known […]

  • […] brought out via ENCODE seems to be very high; see, for example, this critical piece. And this blog explains that activity of the DNA, or binding of the RNAs that come from it, does not necessarily mean […]

  • By ENCODE: Data, Junk and Hype « Education Genetics on October 10, 2012 at 9:58 am

    […] A Neutral Theory of Molecular Function This blog post by Michael Eisen “wrestles” with the idea of junk DNA. I want to end by pointing out that there are lots of people (me and my group included) who have already been wrestling with this issue, with lots of interesting ideas and results already out there. From an intellectual standpoint I’d like to particularly point out the influence the writings of Mike Lynch have had on me – see especially this. […]