Announcing The Batavia Open Genomic Data Licence

Prepublication release of genomic and other large-scale biological datasets is incredibly valuable to the research community. For the last decade big genome sequencing centers – backed by the NIH and other funders – have followed a set of principles outlined at a January 2003 meeting in Ft. Lauderdale sponsored by The Wellcome Trust. This so-called “Ft. Lauderdale Agreement” outlined a set of “responsibilities” for funders, data producers and data users. Significantly, it reserves for data producers the right to first publication of analysis of their data, discouraging precisely the kind of prepublication use of data it is supposed to be encouraging. In practice it has also given data producers the power to create enormous consortia to analyze data they produce, effectively giving them disproportionate credit for the work of large communities. It’s a horrible policy that has significantly squelched the development of a robust genome analysis community that is independent of the big sequencing centers.

As my lab has started to produce more genome sequence data, I have pondered how best to release these data to the public. I don’t want to do anything like Ft. Lauderdale. I want to encourage people to use my data – without feeling any obligation to include me in their plans or publications, and with no restrictions on when they can publish.

However, I feel strongly that people taking advantage of the free and open release of scientific data should reciprocate by ensuring that publications arising from the use of these data are also free and open. Therefore, I have created a new agreement, which I will start using to release sequencing data from my lab. To reinforce the open sharing principles behind this release, the license requires users to publish any work arising from the use of the data in open access journals – that is, it requires that publications arising from the use of openly shared data be themselves openly shared.

I decided to name the license after the Batavia Seamount, a subsurface feature in the Indian Ocean that is antipodal to Ft. Lauderdale, FL. The name reflects the failures of the “Ft. Lauderdale Agreement” for sharing DNA sequence data, which restricts users’ rights to publish public sequence data and fails to lead to the open sharing of publications arising from these data.

So here it is.

Batavia Open Genomic Data Licence

Written by Michael Eisen, June 2011

This license governs the sharing of research data. It facilitates the pre-publication release of data in a way that ensures data producers receive proper credit for their efforts, encourages the creative use of these data by third parties, and guarantees that publications arising from these data will be freely and openly available.

Specifically, the license grants the right to use the data for any legal purpose provided:

  • the source of the data is properly attributed in any publication or other form of dissemination (ATTRIBUTION)
  • any publication arising from the use of the work is distributed under the terms of the Creative Commons Attribution License (OPEN ACCESS PUBLICATION)
  • if the data are redistributed, they are covered by this license (PROPAGATION)

The license does not:

  • impose any embargo on when users can publish their results
  • require that users inform data producers of their plans for analyzing the data

