Thoughts on Ron Vale’s ‘Accelerating Scientific Publication in Biology’

Ron Vale has posted a really interesting piece on bioRxiv arguing for changes in scientific publishing. The piece is part data analysis, examining differences in publishing in several journals and among UCSF graduate students from 1980 to today, and part perspective, calling for the adoption of a culture of “pre-prints” in biology and the expanded use of short-format research articles.

He starts with three observations:

  • Growth in the number of scientists has increased competition for spots in high-profile journals over time, and has led these journals to demand more and more “mature” stories from authors.
  • The increased importance of these journals in shaping careers leads authors to try to meet these demands.
  • The desire of authors to produce more mature stories has increased the time spent in graduate and postdoctoral training, and has diminished the efficacy of this training, while slowing the spread of new ideas and data.

He offers up some data to support these observations:

  • Biology papers published in Cell, Nature and JCB in 2014 had considerably more data (measured by counting the number of figure panels they have) than in 1984.
  • Over the same period, the average time to first publication for UCSF graduate students has increased from 4.7 years to 6.0 years, the number of first author papers they have has decreased, and the total time they spend in graduate school has increased.

And he concludes by offering some solutions:

  • Encourage scientists to publish all of their papers in pre-print servers.
  • Create a “key findings” form of publication that would allow for the publication of single pieces of data.

Vale has put his finger on an important problem. The process of publication has far too great an influence on the way we do science, not just the way we communicate it. And it would be great if we all used preprint servers and strived to publish work faster and in a less mature form than we currently do. I am very, very supportive of Vale’s quest (indeed it has been mine for the past twenty years) – if it is successful, the benefits to science and society would be immense.

However, in the spirit of the free and open discussion of ideas that Vale hopes to rekindle, I should say that I didn’t completely buy the specific arguments and conclusions of this paper.

My first issue is that the essay misdiagnoses the problem. Yes, it is bad that we require too much data in papers, and that this slows down the communication of science and the progress of people’s careers. But this is a symptom of something more fundamental – the wildly disproportionate value we place on the title of the journal in which papers are published rather than on the quality of the data or its ultimate impact.

If you fixed this deeper problem by eliminating journals entirely and moving to a system of post-publication review, it would remove the perverse incentives that produce the effects Vale describes. However, Vale proposes a far more modest solution – the use of pre-print servers. The odd thing about this proposal, as Vale admits, is that pre-print servers don’t actually solve the problem of needing a lot of data to get something published. It would be great for all sorts of reasons if every paper were made freely available online as early as possible – and I strongly support the push for the use of pre-print servers. But Vale’s proposal seems to assume that the existing journal hierarchy would remain in place, and that most papers would ultimately be published in a journal. And this wouldn’t fundamentally alter the set of incentives for journals and authors that has led to the problems Vale writes about. To do that you have to strip journals of the power to judge who is doing well in science – not just have them render that decision after articles are posted on a pre-print server. Unless the rules of the game are changed, with hiring, funding and promotion committees looking at the quality of the work instead of where it was published, universal adoption of pre-print servers will be both harder to achieve and of limited effect on the culture of publishing.

Indeed, I would argue that we don’t need “pre-print” servers. What we need is to treat the act of posting your paper online in some kind of centralized server as the primary act of publication. Then it can be reviewed for technical merit, interest and importance starting at the moment it is “published” and continuing for as long as people find the paper worth reading.

Giving people credit for the impact their work has over the long-term would encourage them to publish important data quickly, and to fill in the story over time, rather than wait for a single “mature” paper. Similarly, rather than somewhat artificially create a new type of paper to publish “key findings” I think people will naturally write the kind of paper Vale wants if we change the incentives around publication by destroying the whole notion of “high-impact publications” and the toxic glamour culture that surrounds it.

Another concern I have about Vale’s essay is that he bases his argument for pre-print servers on a set of data analyses that, while interesting, I didn’t find compelling. I think I get what Vale is doing. He wants to promote the use of pre-print servers, and realizes that there is a lot of resistance. So he is trying to provide data that will convince people that there are real problems in science publishing so that they will endorse his proposals. But by basing calls for change on data, there is a real risk that other people will also find the data less than compelling and will dismiss Vale’s proposed solutions as unnecessary as a result, when in fact the things Vale proposes would be just as valuable even if all the data trends he cites weren’t true.

So let’s delve into the data a bit. First, in an effort to test the widely held sentiment that the amount of data required for a paper has increased over time, he attempted to compare the amount of data contained in papers published in Cell, Nature and JCB during the first six months of 1984 and of 2014 (it’s not clear why he chose these three journals).

The first interesting observation is that the number of biology papers published in Nature has dropped slightly over thirty years, and the number of papers published in JCB has dropped by half (presumably as the result of increased competition from other journals). To quantify the amount of data a paper contained, Vale analyzed figures in each of the papers. The total number of figures per paper was largely unchanged (a product, he argues, of journal policies), but the number of subpanels in each figure went up dramatically – two to four-fold.

I am inclined to agree with him, but it is worth noting that there are several alternative explanations for these observations.

As Vale acknowledges, practices in data presentation could have changed, with things that used to be listed as “data not shown” now being presented in figures. I would add that the increase in figure complexity may also reflect the fact that it is far easier to make complex figures now than it was in 1984. For example, when I did my graduate work in the early 1990s it was very difficult to make figures showing aspects of protein structure. Now it is simple. Authors may simply be more inclined to make relatively minor points in a figure panel now because it’s easier.

A glance at any of these journals will also tell you that the complexity of figures varies a lot from field to field. Developmental biologists, for example, seem to love figures with ten or twenty subpanels. Maybe Cell, Nature and JCB are simply publishing more papers from fields where authors are inclined to use more complex figures.

Finally, the real issue Vale is addressing is not exactly the amount of data included in a paper, but rather the amount of data that had to be collected to get to the point of publishing a paper. It’s possible that authors don’t actually spend more time collecting data now, but that they used to leave more of it “in the drawer”.

The real point is that it’s really hard to answer the question of whether papers now contain more data than they used to. And it’s even harder to determine whether the amount of data required to get a paper published is more or less of an obstacle now than it was thirty years ago.

I understand why Vale did this analysis. His push to reform science publishing is based on a hypothesis – that the amount of data required to publish a paper has increased over time – and, as a good scientist, he didn’t want to leave this hypothesis untested. However, I would argue that differences between 1984 and today are irrelevant. Making it easier to publish work, and giving people incentives to publish their ideas and data earlier, is simply a good idea – and would be equally good even if papers published in 1984 required more data than they do today.

Vale goes on to speculate about why papers today require more data, chalking it up primarily to the increased size of the biomedical research community, which has increased both competition for coveted slots in high-ranking journals and the desire for such publications, allowing journals to be even more selective and to put more demands on authors. (It’s really quite interesting that the number of papers in Cell, Nature and (I assume) Science has not increased in 30 years even as the community has grown.)

This certainly seems plausible, but I wonder if it’s really true. I wonder if, instead, the increase in expectations of “mature” work has to do with the maturation of the fields in question. Nature has pretty broad coverage in biology (although its coverage is by no means uniform), but Cell and JCB both represent fields (molecular biology and cell biology) that were kind of in their infancies, or at least early adolescences, 30 years ago. And as fields mature, it seems quite natural for papers to include more data, and for journals to have higher expectations for what constitutes an important advance. You can see this happening over much shorter timeframes. Papers on the microbiome, for example, used to contain very little experimental data – often a few observations about the microbial diversity of some niche – but within just a few years, expectations for papers in the field have changed, with the papers getting far more data-dense. It would be interesting to repeat the kind of analysis Vale did, but to try and identify “new” fields (whatever that means), and see whether fields that were “new” in 2014 have papers of similar complexity to “new” fields in 1984.

The second bit of data Vale produced is on the relationship between publications and the amount of time spent in graduate school. Using data from UCSF’s graduate program, he found that current graduate students “published fewer first/second author papers and published much less frequently in the three most prestigious journals.” The average time to a first-author paper for UCSF students in the ’80s was 4.7 years, and now it is 6.0. And the number of students with Science, Nature or Cell papers has fallen by half.

Again, one could pick this analysis apart a bit. Even if you accept the bogus notion that SNC publications are some kind of measure of quality, there are more graduate students both in the US and elsewhere, but the number of slots in those journals has remained steady. Even if criteria for publication were unchanged over time, one would have expected the number of SNC papers for UCSF graduate students to have gone down simply because of increased competition. If SNC papers are what these students aspire to (which is probably sadly largely true) then it makes sense that they would spend more time trying to make better papers that will get into these journals. It’s not clear to me that this requires that papers have more data, but rather that they have better data. But either way, one could look at this and argue that the problem isn’t that we need new ways of publishing, but rather that we need to stop encouraging students to put their papers into SNC. I suspect that all of the trends Vale measures here would be reversed if UCSF faculty encouraged all of their graduate students to publish all of their papers in PLOS ONE.

One could also argue that the trends reflect not a shift in publishing, but rather a degradation in the way we train graduate students. In my experience most graduate student papers reflect data that was collected in the year preceding publication. Maybe UCSF faculty, distracted perhaps by grant writing, aren’t getting students to the point where they do the important, incisive experiments that lead to publication until their fifth year, instead of their fourth.

And again, while the time to first publication has increased dramatically in the last 30 years, it’s hard to point to 1984 as some kind of Golden Age. That typical students back then weren’t publishing at all until the end of their fifth year in graduate school is still bad.

So, in conclusion, I think there is a lot to like in this essay. Without explicitly making this point, the observations, data and discussion Vale presents make a compelling case that publishing is having a negative impact on the way we do science and the way we train the next generation. I have some issues with the way he has framed the argument and with how conservative his solutions are. But I think Vale has made an important contribution to the now decades-old fight to reform science publishing, and we would all be better off if we heeded his advice.

 

This entry was posted in open access, publishing, science.

7 Comments

  1. Roman
    Posted July 16, 2015 at 1:37 am | Permalink

    I know that these points have been belaboured for a very long time in different venues. Yet, I cannot help but point out that as long as job and grant application filtering is based on the publication list, encouraging students to send all their papers to PLOS ONE is actually bad advice. This is a symptom rather than the root cause. We should encourage grant reviewers and employers to pay more attention to the research itself rather than the journals where it was published. But will they listen? It is much easier to evaluate publications by journal names than by merit.

    By the way, there is also a natural tendency to milk your data to death before releasing it into the wild. Otherwise, others may scoop you. That might be another reason why papers are getting bigger and bigger. People want to take full credit without the risk.

  2. Qaz
    Posted July 16, 2015 at 11:35 pm | Permalink

    When measuring time to first publication, I would really like to see a measure of time to first submission. In my observation, even over the last twenty five years, the time between submission and publication has increased tremendously. In my observation, this has two components. One is that reviewers now demand more in each paper, whether it be more analyses, more data, or more figures. The other is that because there is more prestige in non-field-specific journals than there used to be (i.e. Science and Nature over JNeurophys), and publishing in those non-field-specific journals is more important to career than it used to be, and it is harder to get into those journals than it used to be, people spend years “working their way down” the journal hierarchy.

    How much of the excess time to publication is pre-submission and how much time is post-submission?

  3. Posted July 28, 2015 at 2:13 am | Permalink

    Although I agree with your point that researchers should be encouraged to publish important results quickly and then write a detailed paper, most funders and institutions consider the number of publications and the journals they are published in to determine a researcher’s worth. Researchers are, thus, forced to spend more time on a single study and its publication. Apart from this, the reason behind researchers taking more time to get published and the presence of more data in papers now than before could possibly be because reviewers demand more experiments/data to be included in papers (http://www.editage.com/insights/is-reviewers-demand-for-more-experiments-justified). You’ve rightly pointed out that “The process of publication has far too great an influence on the way we do science.” Most researchers would agree that the publication process is demanding enough to hinder their involvement in other academic endeavors.

  4. binay panda
    Posted August 3, 2015 at 4:02 pm | Permalink

    i am surprised that this post has not attracted much discussion, which i am a bit sad about. i have responded to ron’s original article in biorxiv but will reproduce my comments below, which also appear on my blogpost (http://ow.ly/Qp852).

    Absolutely a great topic and timely discussion. Posts from senior investigators and established scientists like Ron will help change the system, and I thank Ron for initiating this, especially with the hope that things will change in India. I can’t agree with Michael Eisen more that the treatment of the current system is symptomatic and will not yield a lasting solution. What we essentially need is a durable solution. Why even care about publishing in any journal? Really, guys, we live in the era of the Internet. Why not put all our results in an open domain and allow them to be reviewed by as many people as possible? I understand that this is probably impossible for a large number of biologists to digest, but this is what’s needed. Why be held hostage by only 2 or 3 reviewers, selected by journal editors (and at best using opaque means)? The scientific practice of yesteryear must stop, and stop immediately. We must really embrace a level playing field. In a country like India, where >550 million people are under the age of 25, why should we restrict young people, or for that matter anyone, to thinking and/or practicing what was there 50 years back? Why should we teach them that publishing in any journal, least of all Science, Nature or Cell, is important? Why not leap forward with means that are already available to us? Why re-invent the wheel that folks in England built nearly four centuries back with the beginning of the “Philosophical Transactions of the Royal Society”, or in the USA about half a century back, by doing rigorous science and sending the results of their best science to “so called” top journals? Why can’t we just put everything out there on the Internet and let anyone and everyone judge? What’s wrong with that? The claim that “crap science will creep in without pre-publication peer-review” is a bogus argument at best.
    Take the example of cancer biology, where only 11% of published scientific findings could be confirmed anyway (http://ow.ly/Qp4na). Even more shocking is the scale of the economic loss due to this irreproducible science. A finding published in a recent study in PLoS Biology (http://ow.ly/Qp4J1) confirms this, paving the way for reproducibility projects (http://ow.ly/Qp634). Lack of reproducibility in scientific findings costs taxpayers a whopping sum. A quote from the PLoS Biology article’s abstract proves the point: “An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States”. If this is not enough to stay away from the bad practice of the current pre-publication peer review system, let me give another example. As Ron points out, it’s a great idea to start a pre-print server with an open post-publication review system, as in F1000. The best example I have seen recently, one that gives readers a platform to discuss data post-publication, concerns the mouse ENCODE project, where it was originally suggested (http://ow.ly/Qp4ZQ) that gene expression data cluster more by species than by tissue. The mouse ENCODE project data were reanalyzed by Yoav Gilad and Orna Mizrahi-Man and published in an F1000 article (http://ow.ly/Qp4QO). Whether the original study or the re-analysis is convincing is for readers to figure out after going over the data and evidence presented, but how can one argue against such a lively and productive post-publication review system?

    Before giving out a research grant, you need to read the proposal anyway. Therefore, before judging the quality of the work, we can read what’s out there before taking a call. A lot of chaff can be separated this way, essentially arguing in favor of getting rid of the current system of pre-publication review linked to tiered journals for dissemination of scientific information to the wider audience.

    For this to happen, the publication system in practice today needs to take an exit. The way the current system gives out scientific credit is not just unjustified; I would argue it is feudal. Better we get rid of it completely.

    Binay Panda

    p.s.: this may be a duplicate comment as my earlier comment didn’t go through

  5. binay panda
    Posted August 3, 2015 at 4:07 pm | Permalink

    michael, i tried posting my comment two times but it didn’t go through? is there a word limit to the comment? binay panda


  7. binay panda
    Posted August 4, 2015 at 7:08 pm | Permalink

    i posted a response to ron vale’s article in arxiv. my response to it is http://ow.ly/Qv7wd

One Trackback

  • By Pre-prints: just do it? | Reciprocal Space on July 17, 2015 at 5:04 am

    […] Eisen has written a sympathetic critique of Vale’s paper. He takes some issue with the particulars of the arguments about increased data requirements but […]