In 1968 Motoo Kimura published a short article in Nature in which he argued that “most mutations produced by nucleotide replacement are almost neutral in natural selection”. This fantastic paper is generally viewed as having established the “neutral theory” of molecular evolution, whose central principle was set out by Jack King and Thomas Jukes in a Science paper the following year:
Evolutionary change at the morphological, functional, and behavioral levels results from the process of natural selection, operating through adaptive change in DNA. It does not necessarily follow that all, or most, evolutionary change in DNA is due to the action of Darwinian natural selection.
It is hard to overstate the importance of these papers. They offered an immediate challenge to the deeply flawed, but widely held, belief that all changes to DNA must be adaptive – an assumption that was poisoning the way that most biologists were reckoning with the first wave of protein sequence data. And, as their ideas were rapidly accepted in the nascent field of molecular evolution, the neutral theory loomed over virtually all analyses of sequence variation within and between species for decades to come.
What Kimura, King and Jukes really did was to establish a new “null model” against which any putative example of adaptive molecular change must be judged. Indeed, neutrality offered such a good explanation for sequence changes over time that when I entered the field in the early 90s researchers were still struggling to find a single example of molecular change for which a neutral explanation could be rejected.
While the explosion of sequence data in the past decade ultimately yielded unambiguous evidence for large-scale adaptive molecular evolution, it is hard to overstate just how powerful the neutral null model was in forcing people to think clearly about what adaptive change means, and how one would go about identifying clear examples of it.
I think a lot about Kimura, the neutral theory, and the salutary effects of clear null models every time I get involved in discussions about the function, or lack thereof, of biochemical events observed in genomics experiments, such as those triggered this week by publications from the ENCODE project.
It is easy to see the parallels between the way people talk about transcribed RNAs, protein-DNA interactions, DNase hypersensitive regions and what not, and the way people talked about sequence changes PK (pre Kimura). While many of the people carrying out RNA-seq, ChIP-seq, CLIP-seq, etc… have been indoctrinated with Kimura at some point in their careers, most seem unable to apply his lesson to their own work. The result is a field suffused with implicit or explicit thinking along the following lines:
I observed A bind to B. A would only have evolved to bind to B if it were doing something useful. Therefore the binding of A to B is “functional”.
One can understand the temptation to think this way. In the textbook view of molecular biology, everything is highly regulated. Genes are transcribed with a purpose. Transcription factors bind to DNA when they are regulating something. Kinases phosphorylate targets to alter their activity or sub-cellular location. And so on. Although there have always been lots of reasons to dismiss this way of thinking, until about a decade ago this is what the scientific literature looked like. In the days when papers described single genes and single interactions, who would bother to publish a paper about a non-functional interaction they observed?
But experimental genomics blew this world of Mayberry molecular biology wide open. For example, when Mark Biggin and I started to do ChIP-chip experiments in Drosophila embryos, we found that factors were binding not just to their dozen or so known targets, but to thousands, and in some cases tens of thousands, of places across the genome. Having studied my Kimura, I just assumed that the vast majority of these interactions had evolved by chance – a natural, essential consequence of the neutral fixation of nucleotide changes that happened to create transcription factor binding sites. And so I was shocked that almost everyone I talked to about this data assumed that every one of these binding events was doing something – we just hadn’t figured out what yet.
But if you think about this, you will realize that this simply cannot be true. As we and many others have now shown, molecular interactions are not rare. Transcripts, transcription factor binding sites, DNA modifications, chromatin modifications, RNA binding sites, phosphorylation sites, protein-protein interactions, etc… are everywhere. This suggests that these kinds of biochemical events are easy to create – change a nucleotide here – wham, a new transcription factor binds, a splice site is lost, a new promoter is created, a glycosylation site is eliminated.
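To put a rough number on "easy to create", here is a back-of-the-envelope sketch. It assumes, purely for illustration, a random genome with equal base frequencies and a factor that recognizes one exact motif of a given length; the function name and the round-number genome size are my own placeholders, not measurements from any experiment. Real factors tolerate mismatches, which would only push the counts higher.

```python
def expected_chance_sites(genome_length: int, motif_length: int,
                          both_strands: bool = True) -> float:
    """Expected number of exact matches to one fixed motif in random sequence.

    Assumption (illustrative only): every position is independent and each
    base occurs with probability 0.25. Palindromic motifs would be counted
    twice when both strands are included.
    """
    positions = genome_length - motif_length + 1
    if positions <= 0:
        return 0.0
    per_position = 0.25 ** motif_length  # chance that one window matches
    strands = 2 if both_strands else 1
    return positions * per_position * strands


# Hypothetical round-number genome of 100 million bases.
for k in (6, 8, 10):
    print(f"{k}-base motif: ~{expected_chance_sites(100_000_000, k):,.0f} chance sites")
```

The count falls by a factor of four for each extra base of specificity, but for the short motifs typical of transcription factors it stays large, which is the point: a genome of any real size is expected to contain many matches that no one selected for.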
Does this conflict with the neutral theory? Not at all! Indeed, it is perfectly consistent with it. The neutral theory does not demand that most sequence changes have no measurable effect on the organism. Rather, the only thing you have to assume is that the vast majority of the biochemical events that happen as a consequence of random mutations do not significantly affect organismal fitness. Given that such a large fraction of the genome is biochemically active, the same basic logic Kimura, King and Jukes used to argue for neutrality – that it is simply impossible for such a large number of molecular traits to have been driven to fixation by selection – argues strongly that most biochemical events do not contribute significantly to fitness. Indeed, given the apparent frequency with which new molecular interactions arise, it is all but impossible that we would still exist if every new molecular event had a strong phenotypic effect.
This, of course, does not mean that all these molecular events do nothing – their very existence is a form of function. But we are generally interested in different types of function – things that did arise through natural selection, are maintained by purifying selection, and whose disruption will cause a disease or other significant phenotype. Of course these things exist amidst the rubble. The question is how to find them. And here, I think we should once again take our cue from Kimura.
As I argued above, the field of molecular evolution developed a powerful intellectual core in no small part because researchers had to reckon with the powerful neutral null hypothesis – meaning that adaptive change had to be demonstrated, not assumed. We need to apply the same logic to molecular interactions.
Rather than assuming – as so many of the ENCODE researchers apparently do – that the millions (or is it billions?) of molecular events they observe are a treasure trove of functional elements waiting to be understood, they should approach each and every one of them with Kimurian skepticism. We should never accept the existence of a molecule or the observation that it interacts with something as prima facie evidence that it is important. Rather we should assume that all such interactions are non-functional until proven otherwise, and develop better, more compelling ways to reject this null hypothesis.
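One minimal way to make "non-functional until proven otherwise" concrete is a permutation test. This sketch assumes you have some per-region score that function ought to raise (sequence conservation, say) for a set of candidate regions and for matched control regions. The function name, the score, and the idea of matched controls are all my illustrative assumptions, not a method from ENCODE or anyone else's pipeline.

```python
import random


def permutation_p_value(candidates, controls, n_permutations=10_000, seed=0):
    """One-sided p-value for mean(candidates) > mean(controls).

    Null hypothesis: the candidate/control labels are interchangeable,
    i.e. the candidate regions are no different from the controls.
    """
    rng = random.Random(seed)
    n = len(candidates)
    observed = sum(candidates) / n - sum(controls) / len(controls)
    pooled = list(candidates) + list(controls)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign labels at random
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical scores, invented for illustration only.
bound_scores = [0.62, 0.58, 0.71, 0.66, 0.60]
control_scores = [0.55, 0.61, 0.57, 0.63, 0.59]
print(permutation_p_value(bound_scores, control_scores))
```

A small p-value here says only that the candidate set differs from the controls on this one score; it does not say which individual regions matter, and everything depends on how well the controls are matched. That is the burden of proof working as it should.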
To paraphrase King and Jukes:
Life is dependent on the production of and interaction between DNAs, RNAs, proteins and other biomolecules. It does not necessarily follow that all, or most, biomolecules and interactions among them are due to the action of Darwinian natural selection.
I want to end by pointing out that there are lots of people (me and my group included) who have already been wrestling with this issue, with lots of interesting ideas and results already out there. From an intellectual standpoint I’d like to particularly point out the influence the writings of Mike Lynch have had on me – see especially this.
NOTE: There’s a lot more to say about this, and in the interests of time (I have to give a genetics lecture first thing in the morning) I haven’t gone into as much depth as some of these issues deserve. I will update this post as time permits.