Thoughts on Ron Vale’s ‘Accelerating Scientific Publication in Biology’

Ron Vale has posted a really interesting piece on BioRxiv arguing for changes in scientific publishing. The piece is part data analysis, examining differences in publishing in several journals and among UCSF graduate students from 1980 to today, and part perspective, calling for the adoption of a culture of “pre-prints” in biology, and the expanded use of short-format research articles.

He starts with three observations:

Growth in the number of scientists has increased competition for spots in high-profile journals over time, and has led these journals to demand more and more “mature” stories from authors.
The increased importance of these journals in shaping careers leads authors to try to meet these demands.
The desire of authors to produce more mature stories has increased the time spent in graduate and postdoctoral training, and has diminished the efficacy of this training, while slowing the spread of new ideas and data.

He offers up some data to support these observations:

Biology papers published in Cell, Nature and JCB in 2014 had considerably more data (measured by counting the number of figure panels they have) than in 1984.
Over the same period, the average time to first publication for UCSF graduate students has increased from 4.7 years to 6.0 years, the number of first author papers they have has decreased, and the total time they spend in graduate school has increased.

And he concludes by offering some solutions:

Encourage scientists to publish all of their papers in pre-print servers.
Create a “key findings” form of publication that would allow for the publication of single pieces of data.

Vale has put his finger on an important problem. The process of publication has far too great an influence on the way we do science, not just on the way we communicate it. And it would be great if we all used preprint servers and strove to publish work faster and in a less mature form than we currently do. I am very, very supportive of Vale’s quest (indeed it has been mine for the past twenty years) – if it is successful, the benefits to science and society would be immense.

However, in the spirit of the free and open discussion of ideas that Vale hopes to rekindle, I should say that I didn’t completely buy the specific arguments and conclusions of this paper.

My first issue is that the essay misdiagnoses the problem. Yes, it is bad that we require too much data in papers, and that this slows down the communication of science and the progress of people’s careers. But this is a symptom of something more fundamental – the wildly disproportionate value we place on the title of the journal in which papers are published rather than on the quality of the data or its ultimate impact.

If you fixed this deeper problem by eliminating journals entirely and moving to a system of post-publication review, it would remove the perverse incentives that produce the effects Vale describes. However, Vale proposes a far more modest solution – the use of pre-print servers. The odd thing with this proposal, as Vale admits, is that pre-print servers don’t actually solve the problem of needing a lot of data to get something published. It would be great for all sorts of reasons if every paper were made freely available online as early as possible – and I strongly support the push for the use of pre-print servers. But Vale’s proposal seems to assume that the existing journal hierarchy would remain in place, and that most papers would ultimately be published in a journal. And this wouldn’t fundamentally alter the set of incentives to journals and authors that has led to the problems Vale writes about. To do that you have to strip journals of the power to judge who is doing well in science – not just have them render that decision after articles are posted in a pre-print server. Unless the rules of the game are changed, with hiring, funding and promotion committees looking at quality instead of citation, universal adoption of pre-print servers will be both harder to achieve and limited in its effect on the culture of publishing.

Indeed, I would argue that we don’t need “pre-print” servers. What we need is to treat the act of posting your paper online in some kind of centralized server as the primary act of publication. Then it can be reviewed for technical merit, interest and importance starting at the moment it is “published” and continuing for as long as people find the paper worth reading.

Giving people credit for the impact their work has over the long-term would encourage them to publish important data quickly, and to fill in the story over time, rather than wait for a single “mature” paper. Similarly, rather than somewhat artificially create a new type of paper to publish “key findings”, I think people will naturally write the kind of paper Vale wants if we change the incentives around publication by destroying the whole notion of “high-impact publications” and the toxic glamour culture that surrounds them.

Another concern I have about Vale’s essay is that he bases his argument for pre-print servers on a set of data analyses that, while interesting, I didn’t find compelling. I think I get what Vale’s doing. He wants to promote the use of pre-print servers, and realizes that there is a lot of resistance. So he is trying to provide data that will convince people that there are real problems in science publishing so that they will endorse his proposals. But by basing calls for change on data, there is the real risk that other people will also find the data less than compelling and will dismiss Vale’s proposed solutions as unnecessary as a result, when in fact the things Vale proposes would be just as valuable even if all the data trends he cites weren’t true.

So let’s delve into the data a bit. First, in an effort to test the widely held sentiment that the amount of data required for a paper has increased over time, he attempted to compare the amount of data contained in papers published in Cell, Nature and JCB during the first six months of 1984 and of 2014 (it’s not clear why he chose these three journals).

The first interesting observation is that the number of biology papers published in Nature has dropped slightly over thirty years, and the number of papers published in JCB has dropped by half (presumably as the result of increased competition from other journals). To quantify the amount of data a paper contained, Vale analyzed figures in each of the papers. The total number of figures per paper was largely unchanged (a product, he argues, of journal policies), but the number of subpanels in each figure went up dramatically – two- to four-fold.

I am inclined to agree with him that papers now contain more data, but it is worth noting that there are several alternative explanations for these observations.

As Vale acknowledges, practices in data presentation could have changed, with things that used to be listed as “data not shown” now presented in figures. I would add that maybe the increase in figure complexity reflects the fact that it is far easier to make complex figures now than it was in 1984. For example, when I did my graduate work in the early 1990s it was very difficult to make figures showing aspects of protein structure. Now it is simple. Authors may simply be more inclined to make relatively minor points in a figure panel now because it’s easier.

A glance at any of these journals will also tell you that the complexity of figures varies a lot from field to field. Developmental biologists, for example, seem to love figures with ten or twenty subpanels. Maybe Cell, Nature and JCB are simply publishing more papers from fields where authors are inclined to use more complex figures.

Finally, the real issue Vale is addressing is not exactly the amount of data included in a paper, but rather the amount of data that had to be collected to get to the point of publishing a paper. It’s possible that authors don’t actually spend more time collecting data, but that they used to leave more data “in the drawer”.

The real point is that it’s really hard to answer the question of whether papers now contain more data than they used to. And it’s even harder to determine whether the amount of data required to get a paper published is more or less of an obstacle now than it was thirty years ago.

I understand why Vale did this analysis. His push to reform science publishing is based on a hypothesis – that the amount of data required to publish a paper has increased over time – and, as a good scientist, he didn’t want to leave this hypothesis untested. However, I would argue that differences between 1984 and today are irrelevant. Making it easier to publish work, and giving people incentives to publish their ideas and data earlier, is simply a good idea – and would be equally good even if papers published in 1984 required more data than they do today.

Vale goes on to speculate about why papers today require more data, and chalks it up primarily to the increased size of the biomedical research community. This growth has increased both the competition for coveted slots in high-ranking journals and the desire for such publications, which has allowed journals to be even more selective and to put more demands on authors. (It’s really quite interesting that the number of papers in Cell, Nature and (I assume) Science has not increased in 30 years even as the community has grown.)

This certainly seems plausible, but I wonder if it’s really true. I wonder if, instead, the increase in expectations of “mature” work has to do with the maturation of the fields in question. Nature has pretty broad coverage in biology (although its coverage is by no means uniform), but Cell and JCB both represent fields (molecular biology and cell biology) that were kind of in their infancies, or at least early adolescences, 30 years ago. And as fields mature, it seems quite natural for papers to include more data, and for journals to have higher expectations for what constitutes an important advance. You can see this happening over much shorter timeframes. Papers on the microbiome, for example, used to contain very little experimental data – often a few observations about the microbial diversity of some niche – but within just a few years, expectations for papers in the field have changed, with the papers getting far more data-dense. It would be interesting to repeat the kind of analysis Vale did, but to try and identify “new” fields (whatever that means), and see whether fields that were “new” in 2014 have papers of similar complexity to “new” fields in 1984.

The second bit of data Vale produced is on the relationship between publications and the amount of time spent in graduate school. Using data from UCSF’s graduate program, he found that current graduate students “published fewer first/second author papers and published much less frequently in the three most prestigious journals.” The average time to a first-author paper for UCSF students in the 1980s was 4.7 years, and now it is 6.0. And the number of students with Science, Nature or Cell papers has fallen by half.

Again, one could pick this analysis apart a bit. Even if you accept the bogus notion that SNC publications are some kind of measure of quality, there are more graduate students both in the US and elsewhere, but the number of slots in those journals has remained steady. Even if criteria for publication were unchanged over time, one would have expected the number of SNC papers for UCSF graduate students to have gone down simply because of increased competition. If SNC papers are what these students aspire to (which is probably sadly largely true) then it makes sense that they would spend more time trying to make better papers that will get into these journals. It’s not clear to me that this requires that papers have more data, but rather that they have better data. But either way, one could look at this and argue that the problem isn’t that we need new ways of publishing, but rather that we need to stop encouraging students to put their papers into SNC. I suspect that all of the trends Vale measures here would be reversed if UCSF faculty encouraged all of their graduate students to publish all of their papers in PLOS ONE.

One could also argue that the trends reflect not a shift in publishing, but rather a degradation in the way we train graduate students. In my experience most graduate student papers reflect data that was collected in the year preceding publication. Maybe UCSF faculty, distracted perhaps by grant writing, aren’t getting students to the point where they do the important, incisive experiments that lead to publication until their fifth year, instead of their fourth.

And again, while the time to first publication has increased dramatically in the last 30 years, it’s hard to point to 1984 as some kind of Golden Age. That typical students back then weren’t publishing at all until the end of their fifth year in graduate school is still bad.

So, in conclusion, I think there is a lot to like in this essay. Without explicitly making this point, the observations, data and discussion Vale presents make a compelling case that publishing is having a negative impact on the way we do science and the way we train the next generation. I have some issues with the way he has framed the argument and the degree of conservativeness in his solutions. But I think Vale has made an important contribution to the now decades-old fight to reform science publishing, and we would all be better off if we heeded his advice.
