This 100,000 word post on the ENCODE media bonanza will cure cancer

It is oddly fitting that the papers describing the results of the NIH’s massive $200m ENCODE project were published in the midst of political convention season. For this was no typical scientific publication, but a carefully orchestrated spectacle, meant to justify a massive, expensive undertaking, and to convince us that we are better off now than we were five years ago.

I’ll touch more on details of the science, and the way it was carried out, in another, longer, post. But I want to try to explain to people who were asking on Twitter why I found today’s media blitz to promote the ENCODE publications so off-putting. Because, as cynical as I am about this kind of thing, I still found myself incredibly disheartened by the degree to which the ENCODE press release and many of the interviews published today push a narrative about their results that is, at best, misleading.

The issues all stem, ultimately, from the press releases issued by the ENCODE team, one of which begins:

The hundreds of researchers working on the ENCODE project have revealed that much of what has been called ‘junk DNA’ in the human genome is actually a massive control panel with millions of switches regulating the activity of our genes. Without these switches, genes would not work – and mutations in these regions might lead to human disease. The new information delivered by ENCODE is so comprehensive and complex that it has given rise to a new publishing model in which electronic documents and datasets are interconnected.

The problems start before the first line ends. As the authors undoubtedly know, nobody actually thinks that non-coding DNA is ‘junk’ any more. It’s an idea that pretty much only appears in the popular press, and then only when someone announces that they have debunked it. Which is fairly often. And has been for at least the past decade. So it is more than just intellectually lazy to start the story of ENCODE this way. It is dishonest – nobody can credibly claim this to be a finding of ENCODE. Indeed it was a clear sense of the importance of non-coding DNA that led to the ENCODE project in the first place. And yet, each of the dozens of news stories I read on this topic parroted this absurd talking point – falsely crediting ENCODE with overturning an idea that didn’t need to be overturned.

But the deeper problem with the PR, and the main paper to some extent, is the way that they slip and slide around the extent and nature of the functions they have “discovered”. The pullquote from the press release is that the human genome is a “massive control panel with millions of switches regulating the activity of our genes”. So let’s untangle this a bit. It is true that the paper describes millions of sequences bound by transcription factors or prone to digestion by DNase. And it is true that many bona fide regulatory sequences will have these properties. But as even the authors admit, only some fraction of these sequences will actually turn out to be involved in gene regulation. So it is simply false to claim that the papers have identified millions of switches.

Ewan Birney, who led the data analysis for the entire ENCODE project, wrote an excellent, measured post on the topic today in which he makes it clear that when they claim that 80% of the genome is “functional”, they are simply referring to its having biochemical activity. And yet even his quotes in the press release play a bit fast and loose with this issue, repeating the millions of switches line. Surely it’s a sign of a toxic process when people let themselves be quoted saying something they don’t really believe.

The end result is some fairly disastrous coverage of the project in the popular press. Gina Kolata’s story on the topic in the New York Times is, sadly, riddled with mistakes. It’s commonplace amongst scientists to blame this kind of thing on reporters not knowing what they’re talking about. But in this case at least the central problems with her story trace directly back to the misleading way in which the results were presented by the authors.

The NYT piece is titled “Bits of Mystery DNA, Far From ‘Junk,’ Play Crucial Role” (wonder where they got that idea), and goes on to herald the “major medical and scientific breakthrough” that:

the human genome is packed with at least four million gene switches that reside in bits of DNA that once were dismissed as “junk” but that turn out to play critical roles in controlling how cells, organs and other tissues behave

This is complete crap. Yet it’s nothing more than a paraphrasing of the line the ENCODE team were promoting. Same thing with a statement later on that “At least 80 percent of this [junk] DNA is active and needed.” You can blame the reporter if you want for incorrectly mixing in the “needed” part there, which is not something the studies asserted. But this is actually a perfectly logical conclusion to reach from the 80% functional angle the authors were pitching.

I don’t mean to pick too harshly on the ENCODE team here. They didn’t invent the science paper PR machine, nor are they the first to traffic in various levels of misrepresentation to make their story seem sexier to journals and the press. But today’s activities may represent the apotheosis of the form. And it’s too bad – whatever one thinks about the wisdom of the whole endeavor, ENCODE has produced a tremendous amount of data, and both the research community and the interested public would have benefited from a more sober and realistic representation of what the project did and did not accomplish.
