Pachter’s P-value Prize’s Post-Publication Peer-review Paradigm

Several weeks ago my Berkeley colleague Lior Pachter posted a challenge on his blog offering a prize for computing a p-value for a claim made in a 2004 Nature paper. While cheeky in its formulation, the challenge had an important point – Pachter believed that a claim from this paper was based on faulty reasoning, and the p-value prize was a way of highlighting its deficiencies.

Although you might not expect the statistics behind a largely forgotten claim from an 11-year-old paper to attract significant attention, Pachter’s post has set off a remarkable discussion, with some 130 comments as of this writing, making it an incredibly interesting experiment in post-publication peer review. If you have time, you should read the post and the comments. They are many things, but above all they are educational – I learned more about how to analyze this kind of data, and about how people think about this kind of data, here than I have anywhere else.

And, as someone who believes that all peer review should be done post-publication, I think there’s also a lot we can learn from what’s happening on Pachter’s blog.

Pre- vs. Post-Publication Peer Review

I would love to see the original reviews of this paper from Nature (maybe Manolis or Eric can post them), but it’s pretty clear that the 2 or 3 people who reviewed the paper either didn’t scrutinize the claim that is the subject of Pachter’s post, or they failed to recognize its flaws. In either case, the fact that such a claim got published in such a supposedly high-quality journal highlights one of the biggest lies in contemporary science: that pre-publication peer review serves to defend us from the publication of bad data, poor reasoning and incorrect statements.

After all, it’s not like this is an isolated example. One of the reasons that this post generated so much activity was that it touched a raw nerve among people in the evolutionary biology community who see this kind of thing – poor reasoning leading to exaggerated or incorrect claims – routinely in the scientific literature, including (or especially) in the journals that supposedly represent the best of the best in contemporary science (Science, for example, has had a string of high-profile papers in recent years that turned out to be completely bogus – cf. arsenic DNA).

When discussing these failures, it’s common to blame the reviewers and editors. But they are far less the fault of the people involved than they are an intrinsic problem with pre-publication review. Pre-publication review is carried out under severe time pressure by whomever the editors managed to get to agree to review the paper – and these are rarely the people who are most interested in the paper or the most qualified to review it. Furthermore, journals like Nature, while surely interested in the accuracy of the science they publish, also ask reviewers to assess its significance, something that at best distracts from assessing the rigor of a work, and often is in conflict with it. Most reviewers take their job very seriously, but it is simply impossible for 2 or 3 somewhat randomly chosen people who read a paper at a fixed point in time and think about it for a few hours to identify and correct all of its flaws.

However – and this is the crux of the matter for me – despite the fact that pre-publication peer review simply cannot live up to the task it is assigned, we pretend that it does. We not only promulgate the lie to the press and public that “peer reviewed” means “accurate and reliable”, we act like it is true ourselves. Despite the fact that an important claim in this paper is – as the discussion on the blog has pointed out – clearly wrong, there is no effective way to make this known to readers of the paper, who are unlikely to stumble across Pachter’s blog while reading Nature (although I posted a link to the discussion on PubMed Commons, which people will see if they find the paper when searching in PubMed). Worse, even though the analyses presented on the blog call into question one of the headline claims that got the paper into Nature in the first place, the paper will remain a Nature paper forever – its significance on the authors’ CVs unaffected by this reanalysis.

Imagine if there had been a more robust system for and tradition of post-publication peer review at the time this paper was published. Many people (including one of my graduate students) saw the flaws in this analysis immediately, and sent comments to Nature – the only visible form of post-publication review at the time. But they weren’t published, and concerns about this analysis would not resurface for over a decade.

The comments on the blog are not trivial to digest. There are many threads, and the comments range from the thorough and insightful to the jejune and puerile. But if you read even part of the thread, you come away with a far deeper understanding of the paper – what it found, and which aspects of it are right and wrong – than you get from the paper itself. THIS is what peer review should look like – people who have chosen to read a paper spending time not only to record their impressions once, but to discuss it with a collection of equally interested colleagues to try to arrive at a better understanding of the truth.

The system is far from perfect, but from now on, anytime someone asks me what I mean by post-publication peer review, I’ll point them to Lior’s blog.

One important question is why this doesn’t happen more often. A lot of people had clearly formed strong opinions about the Lander and Kellis paper long before Lior’s post went up. But they hadn’t shared them. Does someone have to write a pointed blog post every time they want a paper’s results to be reexamined by the community?

The problem is, obviously, that we simply don’t have a culture of doing this kind of thing. We all read papers all the time, but rarely share our thoughts with anyone outside of our immediate scientific world. Part of this is technological – there really isn’t a simple system tied to the literature on which we can all post comments on papers that we have read with the hope that someone else will see them. PubMed Commons is trying to do this, but not everyone has access. And beyond that, the existing systems are just not that good yet. But this will change. The bigger challenge is getting people to use such a system once good technology for post-publication peer review exists.

Developing a culture of post-publication peer review

The biggest challenge is that this kind of reanalysis of published work just isn’t done – there simply is not a culture of post-publication peer review. We lack any incentives to push people to review papers when they read them and have opinions that they feel are worth sharing. Indeed, we have a variety of counterincentives. A lot of people ask me if Lior is nuts for criticizing other people’s work so publicly. To many scientists this “just isn’t done”. But the question we should be asking is not “Why does Lior do this?” but rather “Why don’t we all?”.

When we read a paper and recognize something bad or good about it, we should see it as our duty to share that judgment with our colleagues. This is what science is all about. Oddly, we feel responsible enough for the integrity of the scientific literature that we are willing to review papers that often do not interest us and which we would not otherwise have read, yet we don’t feel that way about the more important process of thinking about papers after they are published. Somehow we have to transfer this sense of responsibility from pre- to post-publication review.

An important aspect of this is credit. A good review is a creative intellectual work and should be treated as such. If people got some kind of credit for post-publication reviews, more people would be inclined to do them. There are lots of ideas out there for how to create currencies for comment, but I don’t really think this is something that can be easily engineered – it’s going to have to evolve organically as (I hope) more people engage in this kind of commentary. But it is worth noting that Lior has, arguably, achieved more notice for his blog, which is primarily a series of post-publication reviews, than he has for his science. Obviously this is not immediately convertible into classical academic credit, but establishing a widespread reputation for the specific kind of intellectualism manifested on his blog cannot but help Lior’s academic standing. I hope that his blog inspires people to do the same.

Of course not everybody is a fan of Lior’s blog. Several people who I deeply respect have complained that his posts are too personal, and that they inspire a kind of mob mentality in comments in which the scientists whose work he writes about become targets. I don’t agree with the first concern, but do think there’s something to the second.

So long as we personalize our scientific achievements, attacks on them are going to feel personal. I know that every time I receive a negative review of a paper or grant, I feel like it is a personal attack. Of course I know that this generally isn’t true, and I subscribe to the belief that the greatest respect you can show another scientist is to tell them when you think they’ve made a mistake or done something stupid. But, nonetheless, negative feedback still feels personal. And it inspires in most of us an instinctive desire to defend our work – and therefore ourselves – from these “attacks”. I think the reason people feel like Lior’s posts are attacks is that they put themselves into the shoes of the authors he is criticizing and feel attacked. But I think this is something we have to get over as scientists. If the critique is wrong, then by all means we should defend ourselves, but conversely we should be able to admit when we were wrong, to have a good discussion about what to do next, and to move on, all the wiser for it.

However, as much as I would like us all to be thick-skinned scholars, able to take it and dish it out, the reality is that this is not the case. Even when the comments are civil, I can see how having a few dozen people shredding your work publicly could make even the most thick-skinned scientist feel like shit. And if the authors of the paper had not been famous, tenured scientists at MIT, the fear of negative ramifications from such a discussion could be terrifying. I don’t think this concern should lead to people feeling reluctant to jump into scientific discussions – even when they are critical of a particular work – but I do think we should exercise extreme care in how we say things. And rule #1 has to be to restrict comments to the science and not the authors. In this regard, I was probably one of the worst offenders in this case – jumping from a criticism of the analysis to a criticism of the authors’ response to the critique. I know them both personally and felt they would know my comments were in the spirit of advancing the conversation, but that’s not a good excuse. I will be very careful not to do that in the future under any circumstances.


12 Comments

  1. Posted June 8, 2015 at 1:51 pm

    Mike, great post – and I think the bit at the end is in some ways the most important. There are other ways to go about this, too; I have made the decision to openly sign my paper peer reviews and this has led me to be both more careful and more polite in my reviews, to the point where I feel comfortable posting them publicly once the paper is out.

    Two additional thoughts —

    I absolutely don’t want to see a centralized commenting system come into being, for all sorts of reasons; I think we need something sensibly federated. To that end, you might be interested in Chris Lee’s “selected papers” network idea (http://journal.frontiersin.org/article/10.3389/fncom.2012.00001/abstract) as a way to actually do pre-“pub” peer review in a minimally sensible way.

    Second, there are annotation platforms like hypothes.is that I’d love to see applied to this general question of how to (technically) do post-pub peer review. See https://hypothes.is/. Any thoughts as to suitability?

  2. Manolis Dermitzakis
    Posted June 8, 2015 at 9:57 pm

    Mike this is a great post! Thanks for also discussing the spirit in which we should make comments, which I think is fundamental to making PPR part of our culture.

  3. @Darioumma
    Posted June 9, 2015 at 5:27 am

    Very nice post, thanks. You say that “there really isn’t a simple system tied to the literature on which we can all post comments on papers that we have read with the hope that someone else will see them”. Something like this exists: Pubpeer https://pubpeer.com/

  4. Posted June 9, 2015 at 5:37 am

    I agree that publication should be separated from the peer-review assessment and that the peer-review assessment should be much more transparent and part of an ongoing scientific discussion. I wish I could read the peer-reviews for any published paper. I see no good reason to hide the contents of the peer-reviews.

    I think one can easily find many much more compelling examples for the failures of the current system of pre-publication peer-review than the discussed paper. The main challenge in moving toward an effective post-publication peer-review is attracting attention to published preprints. Very few papers, if any, published by PeerJ, OpenScience or any other progressive platform allowing post-publication peer-review have enjoyed the attention that the Nature paper in question did. How do we increase the visibility of preprints to give them a chance to get post-publication peer-reviews?

  5. Posted June 9, 2015 at 6:36 am

    I totally agree that post-publication peer review is critical for reforming the way we publish and evaluate science, and scientists. Learning how to critique one another productively is a general problem, relevant for pre-publication review of papers and grants, and in daily life in the lab. My group wrote up an article on how we conduct yearly planning meetings, where we outlined our guidelines for giving and receiving feedback. Though written for a different context, I think they’re relevant here. http://www.cell.com/action/showFullTextImages?pii=S1097-2765%2815%2900307-X

  6. K. VijayRaghavan
    Posted June 9, 2015 at 10:20 am

    Great post. Generous, correct, firm and polite.
    I learnt much from the discussions on the KBL paper. Thanks.

  7. Ian Holmes
    Posted June 9, 2015 at 10:55 am

    I am curious how many computational biology papers have ever been retracted. I found one: the following PLoS CompBio paper, which claimed that Bayesian phylogenetics did not work, a result that turned out to be due to a bug in a Perl script:

    http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030158

    I have not seen any papers that have been retracted due to fraud or deception, but I would be astonished if that sort of trickery hasn’t happened a lot. In particular, I don’t think you can trust any computational papers where the source code & data are not available free of charge (I actually find this far more problematic than closed access: it’s one thing to charge someone for a paper, it’s a far worse thing — IMO — to charge them EXTRA if they want to verify it).

  8. Posted June 9, 2015 at 12:08 pm

    This is a fine post, and I agree with most of it. However, I’m going to pick a nit.

    “I know that every time I receive a negative review of a paper or grant, I feel like it is a personal attack. Of course I know that this generally isn’t true, and I subscribe to the belief that the greatest respect you can show another scientist is to tell them when you think they’ve made a mistake or done something stupid.”

    That’s a lovely attitude. I hope you recognize, however, that it’s something of a luxury. It’s a luxury you can afford because you’re a tenured professor. For grad students, postdocs, and assistant profs, negative reviews do amount to personal attacks. Whether the reviewers intend them as such or not, they threaten the professional survival of the reviewees. This isn’t something I dwelt on as a grad student or postdoc, and as I’ve left academia and make my living from software, it’s no longer relevant to me personally. But I doubt any system of public peer review will succeed without taking it into account.

    Of course, for junior academics, it’s no more blessed to give than to receive negative reviews, if reviewees are senior academics and reviewers are identifiable. This too had better be accounted for in any system of public peer review.

    More generally, it’s my judgment that many of the maladies of academic publishing are greatly aggravated by the hypercompetition for professional survival that prevails across most of academia at present.

  9. Posted June 9, 2015 at 4:39 pm

    True peer reviews always used to be ‘post-publication’.

    “The Original Purpose of Peer Review”

    http://www.homolog.us/blogs/blog/2015/06/09/the-original-purpose-of-peer-review/

  10. Posted June 10, 2015 at 1:12 am

    It’s interesting that the software world has had to go through this kind of evolution too, but, being a much newer field than experimental science, its advanced practitioners realised early on that putting code into production without review, and allowing programmers to keep their code to themselves, was very damaging to quality. So now we have techniques like code reviews, pair programming and source code control, which tend to improve quality.

    With recent revelations about non-reproducibility, experimental scientists as a group have egg on their faces, and this can only be removed by opening up both data and code to proper review and discussion. Pre-publication peer review is clearly not giving the reliability it should, and science is seen to have been staggering about blindly for years.

    This is a pity, given that science has a lot to offer us, including possibly the only way to ensure our long-term survival.

  11. Claudiu Bandea
    Posted June 16, 2015 at 3:15 pm

    Hi Michael,

    You made a very strong case for the need to implement a stronger and open peer-review system (PR), as it is clear that the outstanding effort and contributions of thousands of reviewers cannot save the current outdated PR.

    It is time that the research and publications, which are funded by tens of billions of taxpayer dollars, are openly, promptly and fully reviewed, and that the reviewers get credit for their work and contribution.

    I recently suggested instituting an open, timely, and comprehensive PR funded by a small percentage (e.g. 1%) of the research funds
    (https://liorpachter.wordpress.com/2015/06/09/i-was-wrong/#comment-4549).

    I would like to ask you and your readers if the proposal has merit, and if it does, what would be the best way to proceed. Thanks.

  12. Ian Holmes
    Posted June 19, 2015 at 10:56 am

    [cross-posted from Lior’s blog]
    I would like to clarify a comment I made on Lior’s “I was wrong” post about compbio retractions, to make it clear that this comment was not directed at Kellis et al, but was a general remark in response to Lior’s post about computational biologists admitting when they are wrong. My observation is that compbio as a field seems to have relatively few retractions, and my curiosity is whether this impression is supported by data. I did not mean to imply that the KBL paper should be retracted.

    I followed this comment with a remark about releasing code. Again this is a belief that applies to all compbio work and not just KBL. In general I think compbio papers should post their code, for reasons of reproducibility. Verbal descriptions of code are almost always incomplete, and without a way to run the code itself (and ideally scrutinize it), I consider that a methods section is incomplete. I favor the Titus Brown approach of releasing the entire workflow.

    The KBL yeast paper in particular is one where I would like to see the code released, because that work described major advances in gene-finding sensitivity & specificity by using the indel patterns as a signature (indels within ORFs are a multiple of 3 bases long). I think it is important to verify and reproduce this work; it is an area I care about (having been modeling indels and alignments for some time), and so I would like to see the KBL code.
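
    To make the signature concrete, here is a minimal, hypothetical Python sketch of my own (a toy illustration, emphatically not the KBL code, which is exactly what I would like to see released) that scores what fraction of indels in a set of aligned ORF pairs are frame-preserving, i.e. have lengths that are a multiple of 3. Gap runs in a pairwise alignment stand in for indels, a deliberate simplification:

        # Hypothetical toy, not the KBL method: fraction of indels in aligned
        # ORF pairs whose lengths are multiples of 3 (frame-preserving).
        # Gap runs ('-') in either aligned sequence stand in for indels.

        def gap_lengths(aligned_seq):
            """Return lengths of contiguous '-' runs in an aligned sequence."""
            lengths, run = [], 0
            for ch in aligned_seq:
                if ch == '-':
                    run += 1
                elif run:
                    lengths.append(run)
                    run = 0
            if run:
                lengths.append(run)
            return lengths

        def frame_preserving_fraction(aligned_orf_pairs):
            """Fraction of indels with length % 3 == 0 across aligned ORF pairs."""
            indels = [n for a, b in aligned_orf_pairs
                        for n in gap_lengths(a) + gap_lengths(b)]
            return sum(n % 3 == 0 for n in indels) / len(indels) if indels else float('nan')

        # Toy example: one frame-preserving (3 bp) and one frame-breaking (2 bp) indel.
        pairs = [("ATGGCC---AAATAA", "ATGGCCGGGAAATAA"),
                 ("ATGAA--GCTTAA",   "ATGAACCGCTTAA")]
        print(frame_preserving_fraction(pairs))  # prints 0.5

    If the KBL signature holds, real ortholog alignments should give a fraction much closer to 1 within ORFs than in intergenic regions, and that is exactly the sort of check that released code would make trivial to reproduce.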

    Speaking more generally to Lior Pachter’s criticisms of Manolis Kellis, in his “Network Nonsense” post Lior says “In academia the word fraudulent is usually reserved for outright forgery” and goes on to argue for a broader use of the word “given what appears to be deliberate hiding, twisting and torturing of the facts”. I believe this is unhelpful. The word “fraud” is reserved to mean forgery because forgery, properly, has very severe consequences. In other places Pachter has indicated that he wishes those consequences upon Kellis (saying that Kellis should lose his job, for example). But “hiding, twisting, and torturing the facts” is a subjective description of events, as (in fact) is “deception deliberately practiced in order to secure unfair or unlawful gain” (a quote Lior does not source, but which Google attributes to the American Heritage Dictionary). The fact is that compbio is a hype-rich field, and few (including myself) would escape the charge of using at least some exaggeration or hype to describe their work, a practice which I agree is deplorable and could easily be characterized by someone more rigorous as “deception deliberately practiced in order to secure unfair gain”. Because, after all, we all deceive ourselves first, to some extent, don’t we?

    The most serious accusation Lior makes is that Kellis et al replaced the text of a figure to subtly but significantly alter its meaning. I personally think that publicly inviting them to publicize this change as an erratum would be a more helpful approach than accusing them of fraud. Clearly Lior believed differently: he says that he thought long and hard before making this accusation, so one must assume he considered less draconian options than calling for Manolis to be fired as a fraud.

    I completely support Lior’s right to criticize specifics of Manolis’ work, using whatever theatrical stunts he chooses to draw attention to these criticisms, including prizes and hyperbole. I would indeed praise his work as a post-publication peer reviewer: I believe the criticisms are, to a greater or lesser extent, valid. Network deconvolution probably has more hidden data-dependent parameters than Feizi et al admitted at first (so do a lot of methods: compbio code is ridden with hidden parameters). The choice of models for homologous protein rates in the yeast genomes paper could have been broader. But these are not out of line with the sorts of distortions or vagaries that (unfortunately) occur very often in compbio.

    I would dearly like to see compbio become more self-critical. I think Lior’s blog is an important step in this direction, and a valuable experiment in post-publication peer review. I think that one take-home message of this experiment is that one should be careful of accusations like “fraud”, which (as Lior acknowledges) have more than one meaning: a dictionary meaning in common usage, and a far more precise meaning that is specific to scientific ethics. Conflating the two risks devaluing the latter, and muddling up valuable scientific discussion with ad-hominem criticism. Let us reserve the word “fraud” for outright forgery. There are other terms (bluster, hype, subjectivity, bias, lack of rigor) that better characterize what Lior is getting at.

    Lastly, I am trying to tread a fine line here. Unlike others, I am not attacking Lior. He has broken new ground with this blog. Rhetoric and showmanship are an important part of what he is doing. Knowing him, I expect he will not back down from his accusations of fraud nor his demands for Manolis’ job. That’s up to him. I’m simply saying where I stand. Deliberate, outright, result-faking fraud is a very serious issue that rightly needs to be a line in the sand for all sciences. Hype, self-serving bias, and irreproducibility are major problems that confound bioinformatics; they need to be solved, but not by conflating them with fraud. Prizes, critiques, fierce hyperbole: all are fair game. I find Lior’s style entertaining, and his work is excellent, but I also need to mention that I am a great admirer of work that’s come from Manolis’ lab too. The phylogenomics methods with Matt Rasmussen spring to mind. Manolis has also participated in some amazing biological discoveries, such as those involving the Piwi-interacting RNAs. There are many, many more. So I would be very disappointed if his career were significantly negatively affected as the result of one figure change which could easily be published as an erratum, or because he defended a choice of null model.

2 Trackbacks

  • By I was wrong | Bits of DNA on June 9, 2015 at 1:10 am

    […] conversation topic that emerged as a result of the blog (mostly on other forums) is the role of style in online discussion of science. Specifically, the question of whether […]

  • […] talking about the value of ‘post-publication peer review’ (see comments here or at the Eisen’s blog). I find those comments misguided, because peer review was never meant to be similar to a professor […]