Announcing The Batavia Open Genomic Data Licence

Prepublication release of genomic and other large-scale biological datasets is incredibly valuable to the research community. For the last decade big genome sequencing centers – backed by the NIH and other funders – have followed a set of principles outlined at a January 2003 meeting in Ft. Lauderdale sponsored by The Wellcome Trust. This so-called “Ft. Lauderdale Agreement” outlined a set of “responsibilities” for funders, data producers and data users. Significantly, it reserves for data producers the right to first publication of analysis of their data, discouraging precisely the kind of prepublication use of data it is supposed to be encouraging. In practice it has also given data producers the power to create enormous consortia to analyze data they produce, effectively giving them disproportionate credit for the work of large communities. It’s a horrible policy that has significantly squelched the development of a robust genome analysis community that is independent of the big sequencing centers.

As my lab has started to produce more genome sequence data, I have pondered how best to release these data to the public. On the one hand, I don’t want to do anything like Ft. Lauderdale. I want to encourage people to use my data – without feeling any obligation to include me in their plans or publications, and placing no restrictions on when they can publish.

However, I feel strongly that people taking advantage of the free and open release of scientific data should reciprocate by ensuring that publications arising from the use of these data also be free and open. Therefore, I have created a new agreement – which I will start using to release sequencing data from my lab. To reinforce the open sharing principles that this release represents, the license requires users to publish any work arising from the use of the data in open access journals – that is, it requires that publications arising from the use of openly shared data be themselves openly shared.

I decided to name the license after the Batavia Seamount, a subsurface feature in the Indian Ocean that is antipodal to Ft. Lauderdale, FL, to reflect the failures of the “Ft. Lauderdale Agreement” for sharing of DNA sequence data, which restricts users’ rights to publish public sequence data and fails to lead to the open sharing of publications arising from these data.

So here it is.

Batavia Open Genomic Data Licence

Written by Michael Eisen, June 2011

This license governs the sharing of research data. It facilitates the pre-publication release of data in a way that ensures data producers receive proper credit for their efforts, encourages the creative use of these data by third parties, and guarantees that publications arising from these data will be freely and openly available.

Specifically, the license grants the right to use the data for any legal purpose provided:

  • the source of the data is properly attributed in any publication or other form of dissemination (ATTRIBUTION)
  • any publication arising from the use of the work is distributed under the terms of the Creative Commons Attribution License (OPEN ACCESS PUBLICATION)
  • if the data are redistributed, they are covered by this license (PROPAGATION)

The license does not:

  • impose any embargo on when users can publish their results
  • require that users inform data producers of their plans for analyzing the data



  1. Posted June 8, 2011 at 9:02 am | Permalink

    This is an inspired idea, similar to RMS’ original GPL. Which is why I would like to see critics denied opportunity to change the subject (e.g., from freedom to writing).

    I hope you will not take offense, but if you put this on a wiki, the little typographic errors that crept in (this cannot possibly top the list of your priorities as an HHMI investigator) could be fixed by other interested parties. I’m sure the WTC meant well in Ft. Lauderdale, but your effort shows that improvements are always possible.

    Thank you for publishing this to a wide audience. It’s what’s best for humanity.

  2. Shaun
    Posted June 8, 2011 at 11:30 am | Permalink


    I completely agree with all your critiques of the Ft Lauderdale Agreement, and as someone whose research has greatly benefitted from access to published genomic data, I believe that data should always be free and accessible.

    However, you’re proposing to replace a restrictive license with one that is just as restrictive. Even if it’s in support of a good cause (Open Access publishing), I find it disturbing that you would try to push your ideological viewpoint through a data license. To me, you might as well be saying “anyone who uses my data has to buy a Red Sox season ticket”. Don’t get me wrong, I’m in favor of Open Access (and I like the Red Sox), and I want to use your data, but I don’t want you telling me how to publish my research, nor do I recognize that you have any right to tell me what to do with data once you have publicly disseminated it.

    Genomic datasets should be released unconditionally, especially if any public funding was involved in their generation. It should be the same principle as the publicly-funded publications that have to be sent to PubMed Central. While publicly-funded research publications should be freely available to all, we can’t enforce that everyone citing our papers has to publish their work in an open access journal. Neither can you enforce the same condition on the use of your data.

    Couldn’t we all just agree on a core data license, where the only restriction is that of attribution?

  3. Posted June 8, 2011 at 12:42 pm | Permalink

    Dear Mike –

    I like the spirit of this very much and may use it in the future if you can clarify the following point. Does the Batavia license apply only for those wishing to access the data on a pre-publication basis? Say you release a data set under Batavia, then you publish on it, and then a year or so later someone comes along and wants to use the data. Would the Batavia license expire at the time of your publication or persist for the lifetime of the dataset? I can’t see how the latter could be enforced since making data available post-publication without restrictions is within the normal code of scientific ethics. It would be helpful if you could clarify how Batavia applies after publication of the dataset by the producer. Maybe I missed this above???

    Best regards,

  4. Michael Eisen
    Posted June 9, 2011 at 4:20 pm | Permalink


    I’m confused. You believe that “all genomic datasets should be released unconditionally” and support open access, but don’t see the obvious connection between these two acts?

    Believing that genomic data should be free and believing that all papers should be open access are corollaries of a belief that science in all its manifestations should be open. This license is an effort to promote both of these facets of openness simultaneously – not some random juxtaposition of two unconnected ideas. The point of this license is to say that “I am embracing scientific openness by sharing information with you, and asking nothing in return except that you embrace scientific openness by publishing your papers in an open access journal.” The approach is modeled on the “share alike” spirit of the GNU General Public License, which requires people who build on GPL-licensed code to make their code available under the GPL. The only difference here is that we are crossing domains from sequence data – where openness demands the ability to use data without an embargo – to publication – where openness demands open access publication.

    I also don’t see how this is even comparably restrictive to Ft. Lauderdale, which in effect prohibited people from doing and publishing science, and distorted credit given to people for their work. The restrictions in this license simply encourage people to release their data because they can be confident that their behaving like honorable scientists will be rewarded by others doing the same.

  5. Michael Eisen
    Posted June 9, 2011 at 4:22 pm | Permalink


    My intention was for this to apply to prepublication data release. Once work is published it should be effectively in the public domain and therefore beyond the reach of any kind of reach-through provision – no matter how good.

    That said, I of course believe that all work should be published under a CC-BY or equivalent license, and would welcome any way to leverage my data to make that happen.

  6. Alan Ruttenberg
    Posted June 9, 2011 at 9:17 pm | Permalink

    Given that you offer this as a license and not a contract, doesn’t it presume that you have some rights related to data? What would those be, in your opinion?

  7. Michael Eisen
    Posted June 9, 2011 at 9:27 pm | Permalink


    You’re right. This should probably be called a contract. Although the real point is not to create a legally enforceable document, but to create norms and expectations for the use of the data – which is how science is mostly governed.

  8. Alan Ruttenberg
    Posted June 10, 2011 at 10:39 am | Permalink

    I agree on the goal, but am concerned with the method. There’s an awful lot of confusion about the legal status of data and so I’m inclined to be precise so as to not further muddy the waters. I also worry about losing credibility either by being seen as unknowledgeable about the law, or as knowledgeable but trying to trick people.

    Why not label this as a norm, and work on developing strategies to encourage adoption of it? Some ideas that come to mind:

    – getting funders to mandate following the policy
    – getting journals to discourage publication that doesn’t abide by the policy
    – making the notice prominent across your web site, and as commentary in any of your papers

    We need to get better at getting such processes to work, at least because anyone who investigates will find that this license doesn’t have any force, and because we don’t want to get into drafting, interpreting, or enforcing contracts.
