It is oddly fitting that the papers describing the results of the NIH’s massive $200m ENCODE project were published in the midst of political convention season. For this was no typical scientific publication, but a carefully orchestrated spectacle, meant to justify a massive, expensive undertaking, and to convince us that we are better off now than we were five years ago.
I’ll touch more on details of the science, and the way it was carried out, in another, longer, post. But I want to try to explain to people who were asking on Twitter why I found today’s media blitz to promote the ENCODE publications so off-putting. Because, as cynical as I am about this kind of thing, I still found myself incredibly disheartened by the degree to which the ENCODE press release and many of the interviews published today push a narrative about their results that is, at best, misleading.
The issues all stem, ultimately, from the press releases issued by the ENCODE team, one of which begins:
“The hundreds of researchers working on the ENCODE project have revealed that much of what has been called ‘junk DNA’ in the human genome is actually a massive control panel with millions of switches regulating the activity of our genes. Without these switches, genes would not work – and mutations in these regions might lead to human disease. The new information delivered by ENCODE is so comprehensive and complex that it has given rise to a new publishing model in which electronic documents and datasets are interconnected.”
The problems start before the first line ends. As the authors undoubtedly know, nobody actually thinks that non-coding DNA is ‘junk’ any more. It’s an idea that pretty much only appears in the popular press, and then only when someone announces that they have debunked it. Which is fairly often. And has been for at least the past decade. So it is more than just intellectually lazy to start the story of ENCODE this way. It is dishonest – nobody can credibly claim this to be a finding of ENCODE. Indeed it was a clear sense of the importance of non-coding DNA that led to the ENCODE project in the first place. And yet, each of the dozens of news stories I read on this topic parroted this absurd talking point – falsely crediting ENCODE with overturning an idea that didn’t need to be overturned.
But the deeper problem with the PR, and the main paper to some extent, is the way that they slip and slide around the extent and nature of the functions they have “discovered”. The pullquote from the press release is that the human genome is a “massive control panel with millions of switches regulating the activity of our genes”. So let’s untangle this a bit. It is true that the paper describes millions of sequences bound by transcription factors or prone to digestion by DNase. And it is true that many bona fide regulatory sequences will have these properties. But as even the authors admit, only some fraction of these sequences will actually turn out to be involved in gene regulation. So it is simply false to claim that the papers have identified millions of switches.
Ewan Birney, who led the data analysis for the entire ENCODE project, wrote an excellent, measured post on the topic today in which he makes it clear that when they claim that 80% of the genome is “functional”, they are simply referring to its having biochemical activity. And yet even his quotes in the press release play a bit fast and loose with this issue, repeating the “millions of switches” line. Surely it’s a sign of a toxic process when people let themselves be quoted saying something they don’t really believe.