Let’s make 2013 the year of legislative action on open access

Yesterday a bipartisan group of legislators – Rep. Doyle (D-PA), Rep. Lofgren (D-CA), Rep. Yoder (R-KS), Sen. Wyden (D-OR) and Sen. Cornyn (R-TX) – introduced legislation that would require federal agencies that fund scientific and medical research to make works they fund available to the public. This bill – known as the Fair Access to Science and Technology Research Act of 2013, or FASTR – is a better version of legislation introduced in previous Congresses.

FASTR shortens the acceptable delay from 12 months to 6 months (still 6 months longer than it should be, but headed in the right direction), and, very importantly, adds a requirement that the works be available for text mining and other forms of reuse. It’s not perfect, but it’s very good, and passage of this bill would be a significant milestone in the push for public access to the results of federally funded research.

Previous versions of this bill have gone nowhere, but this is the time. Supporters of open access in the US should contact their representatives in Washington and urge them to sign on as cosponsors of this bill and push for it to reach the House and Senate floor. And every month we should renew this pressure – I hereby declare the first Friday of every month #FASTRFriday (which we will celebrate today for February). Let’s keep the pressure on Congress and see this one through.

Public access legislation is also being introduced in Illinois, New York and California, and I will post updates when these bills are introduced.

Posted in Uncategorized | 3 Responses

My father, Aaron Swartz, and assigning blame for suicide

Twenty-six years ago, on February 7th, 1987, my father killed himself, and this day is always a complicated one for me.

Me and my father

It is something I have never talked or written about in public. But I am moved to say something this year because of the suicide of Aaron Swartz. My brother had the same reaction, and wrote eloquently about it (although, being a family that never talks about “things”, we didn’t talk about this with each other).

In the years since my father died, I have had friends, colleagues and mentors kill themselves. But none evoked memories of my father like Swartz, a person I knew only as a public figure. There was just something so hauntingly similar about their deaths. 

My father was a scientist at the National Institutes of Health in Bethesda – one of the “yellow berets” who had joined the Public Health Service to fulfill his national service obligations during the Vietnam War. He worked there for my entire childhood, and always seemed to love his work. In the summer of 1986, after my freshman year in college, I worked in an NIH lab, and saw my dad during lunch breaks, and everything seemed fine.

When I came home for Thanksgiving, he was preoccupied – doing a lot of scribbling on yellow legal pads. At one point I asked him what he was doing, and he told me someone in his lab had been committing fraud, and he had finally “caught him”. I was too naive to realize just how big a deal this was, and I didn’t think much more about it. Christmas came and went, and I went back to school.

In the meantime, my father had reported the fraud, and a hearing was held on January 28th at which the scientist in question was supposed to appear, but did not. I don’t know what happened at this meeting, but somehow my father left feeling that he was under suspicion – something everyone involved knew he was not. But whatever happened, it set something off.

On February 3rd, I called home and my father answered, but didn’t seem interested in talking to me (which was very unusual) and handed the phone off to my sister. Then, on the morning of February 7th, I went out for a bike ride on a cold Boston winter day – which for me was the last thing I did as a child. When I got back my uncle was waiting in my dorm room to tell me.

The second I read about what had happened with Aaron Swartz, the parallels made me lurch. They both snapped under accusatory pressure. They both hanged themselves when they were left alone. But it was more than that. They just seemed like such similar people to me. It’s hard for me to put my finger on exactly why I felt this way – one person I knew only as a child, the other I did not know at all. But they both seemed to possess a “too good for this world” innocence. Everyone describes Swartz exactly the way I remember my father – as a sweet person who was nice to everyone around him and just seemed to want to do good in the world.

And their deaths are also connected by anger. My father’s death broke me, and it took me a long time to recover. But when I did, I was angry. Angry at what the people at the NIH had done to him. Exactly the same way people are angry now at the prosecutors who hounded Swartz. I felt, for a long time, that the faceless people on that NIH committee had literally killed my father, just like so many people seem to think Carmen Ortiz killed Swartz. 

But, you know, it just isn’t true. My father and Swartz were wonderful people. They just turned out to be too fragile. Most people have ways of dealing with adversity – not all are healthy, not all are smooth, but we make it through. And for some reason, these two did not. I will never stop trying to figure out why my father responded to this particular stress in the way he did – and I know I will never actually understand it. But the NIH did not kill him, and the prosecutors did not kill Swartz. They killed themselves.

I don’t say this to let anyone off the hook – precisely the opposite. There was no excuse for the way the NIH treated my father – they treat any hint of fraud like a virus, and assume that anyone who came in contact with the person involved must be contaminated. And the way Swartz was prosecuted was nothing short of malignant.

But so many people writing about Swartz’s death imply that the actions of MIT and Carmen Ortiz were bad because Swartz killed himself – that somehow they crossed a line defined by the point at which they drive someone to suicide. But this is madness. What the NIH and the prosecutors did was wrong, and we have to learn how to correct these abuses even when their victims can take it. Nothing will ever change if we measure other people’s actions in units of suicides. 

Posted in Uncategorized | 18 Responses

Restructuring the NIH and its grant programs to ensure stable careers in science

It is an amazing time to do science, but an incredibly difficult time to be a scientist.

There is so much cool stuff going on. Everywhere I go – my lab, seminar visits, meetings, Twitter – there are biologists young and old bursting with ideas, eager to take advantage of powerful new ways to observe, manipulate and understand the natural world.

But as palpable as the creative energy is, it is accompanied by an equally palpable sense of dread. We are in one of the worst periods of scientific funding I – and my more senior colleagues – can remember. People aren’t just worried about whether their next grant will get funded, they’re worried about whether a career in academic or public science is even viable (see Kate Clancy’s excellent post on the subject).

There seems to be a broad consensus among the leaders of our community, such as they are, that the solution is for Congress to give us (them) more money. I get emails or calls every few days urging me to contact my senators and representatives to urge them to increase the NIH budget. While I am, in the abstract, in favor of more money for research, if I were in Congress and Francis Collins came to me asking for more money, I’d say “I’m happy to bolster our support for scientific research, but we’re not giving you another single dime until you get your s**t together and stop using the taxpayer’s money to patch over bad decisions and bad policies.”

There are so many things wrong with the NIH today, I could write a book. It’s become an immense, bloated bureaucracy that’s lost sight of its central missions. If it were up to me I’d break it up – turning NIH intramural research into a stand-alone entity and creating a separate Institute for Basic Biomedical Research charged with allocating funds currently under NIH control to support outstanding and innovative research and to ensure that stable training and career paths exist for American scientists.

This latter issue is the one I want to focus on here. Despite all the challenges of the moment, a lot of outstanding work is getting funded. The problem is, outstanding science needs outstanding scientists. And a lot of outstanding scientists, especially young ones, are leaving academia, unwilling to spend their lives chasing – and in all likelihood not getting – grants.

If I were put in charge of this new institute (or the existing one for that matter) I would devote a large fraction of my budget (I think $10b a year would be a good start) to a “career” award program (not to be confused with the NSF’s CAREER awards).

I would put ~$1b into a pool for young investigator awards. These would be somewhat like current K-99s, in that they would primarily be awarded to senior postdocs. These would provide modest startup funds and research support of ~$150k/year for six years – allowing researchers to establish their independent research programs without having to worry about grants. There would be a lot of these – on the order of 1,000 per year. These grants would be allocated on the basis of a “people not projects” review, and in all likelihood universities would compete to recruit soon-to-be-independent scientists with these awards.

Recipients of these awards would be evaluated after five years in much the same way people go through tenure reviews today. The purpose of the review would be to assess the researcher’s contributions to the field and potential for further success. Some would fail to advance; others would be placed into one of five tiers, representing annual support of between $100,000 (tier 1) and $500,000 (tier 5) – most would be in tier 2 or 3. Every three years researchers in the career system would be evaluated, with the assessment of their work leading to them either staying in the same tier or moving up or down at most one tier. The total number of people in each tier would be fixed.

I will confess this idea was heavily influenced by the way European soccer leagues operate. At the end of every year, the top teams in each league are promoted to the next higher league, and the bottom teams are relegated to a lower division. The system provides a clear opportunity for advancement, but buffers declines – people would only lose their funding after a prolonged period of poor performance, rather than precipitously as happens in the current system if grants do not get renewed.
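For anyone who wants to play with the promotion/relegation idea, here is a rough sketch of a single review cycle. Everything in it is an assumption for illustration rather than part of the proposal: the `Researcher` record, the numeric review `score`, and the rule that the best `swaps` people in one tier trade places with the weakest `swaps` people in the tier above. It only tries to capture the two constraints described above, namely that nobody moves more than one tier per review and that the number of people in each tier stays fixed.

```python
from dataclasses import dataclass


@dataclass
class Researcher:
    name: str
    tier: int      # 1 (lowest support) through 5 (highest support)
    score: float   # hypothetical review score; higher is better


def review_cycle(researchers, swaps=1, lowest=1, highest=5):
    """Return {name: new_tier} after one review (illustrative rule only)."""
    # Rank each tier from best to worst, breaking ties by name.
    ranked = {t: [] for t in range(lowest, highest + 1)}
    for r in researchers:
        ranked[r.tier].append(r)
    for group in ranked.values():
        group.sort(key=lambda r: (-r.score, r.name))

    new_tier = {r.name: r.tier for r in researchers}
    for t in range(lowest, highest):
        lower, upper = ranked[t], ranked[t + 1]
        # Cap swaps at half of each tier so that nobody can be both
        # promoted and relegated in the same review.
        k = min(swaps, len(lower) // 2, len(upper) // 2)
        if k == 0:
            continue
        for r in lower[:k]:    # best of the lower tier move up one
            new_tier[r.name] = t + 1
        for r in upper[-k:]:   # weakest of the upper tier move down one
            new_tier[r.name] = t
    return new_tier


cohort = [
    Researcher("A", tier=2, score=9.0),
    Researcher("B", tier=2, score=4.0),
    Researcher("C", tier=3, score=8.0),
    Researcher("D", tier=3, score=3.0),
]
result = review_cycle(cohort)
print(result)
```

Because moves are computed from the tiers people held before the review and always happen as one-for-one swaps, tier sizes are preserved automatically. Deciding who drops out of the bottom tier and how new awardees enter are left out of this sketch.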

I estimate that this would cost around $7b/year including overhead. The remaining $3b would support a pool of ~4,500 postdoctoral fellowships and ~12,000 graduate fellowships for trainees to work in career scientists’ labs. These numbers were meant to provide a pool of 1,500 rising faculty candidates and 2,000 new Ph.D.’s every year, my estimate of what it would take to continually replenish the system.

The $2b left would support a robust equipment grant program for career scientists, including core facilities at institutions with appropriate numbers of career researchers. If the powers that be decide we need more (or fewer) scientists, you scale the whole system by adding or subtracting slots in proportion to available funds.

The $10b was specifically meant not to take the entire NIH extramural budget, but to leave room to fund specific projects, especially high-risk/high reward ones from either career or other labs.

The main goals here are to separate the two crucial functions of our granting systems: 1) to fund cutting edge science, and 2) to support a robust scientific infrastructure by providing stable careers to our successful scientists. As I’ve said before, (1) requires (2), but one of the most significant pathologies of our current system is that we mix the two together. In order to support their ongoing research operations, scientists are compelled to dream up “innovative” new projects that can sell in study sections, but often don’t make sense in the real world, while at the same time avoiding truly innovative projects for fear they will be penalized. If labs have a separate mechanism to ensure their financial stability, they will both have more bandwidth to dream up and implement new projects, and the freedom to aim for the stars without worrying they will end up on the street.

I’m sure there are a lot of things I haven’t thought about here, and countless details that need to be dealt with. And I’m equally sure that a lot of people will hate this proposal. But I wanted to put this on the table and open it up for discussion, because the one thing we cannot do is nothing. We are dangerously close to losing a generation – or many generations – of scientists. Let’s figure out how not to let this happen.

====

Addendum: Commenter Jonathan below misunderstood the number of people who would be supported under this system. This was not meant to be an exclusive program. I based my numbers on ~1,000 PIs entering the system per year, with a steady state number probably around 15-20,000. This was a back of the envelope calculation taken from the current size of the NIH grantee and trainee pools. The idea was to stably support a pool of scientists roughly the same size as the current NIH grantee pool, with the PIs trading a more stable funding situation in exchange for lower average levels of support.

Posted in science | 39 Responses

How academia betrayed and continues to betray Aaron Swartz

As news spread last week that digital rights activist Aaron Swartz had killed himself ahead of a federal trial on charges that he illegally downloaded a large database of scholarly articles with the intent to freely disseminate its contents, thousands of academics began posting free copies of their work online, coalescing around the Twitter hashtag #pdftribute.

This was a touching tribute: a collective effort to complete the task Swartz had tried – and many people felt died trying – to accomplish himself. But it is a tragic irony that the only reason Swartz had to break the law to fulfill his quest to liberate human knowledge was that the same academic community that rose up to support his cause after he died had routinely betrayed it while he was alive.

The most obvious culprit was MIT, whose computer system Swartz used for his downloads. Their decision to make sharing journal articles a criminal matter is inexcusable. But their real betrayal was allowing these articles to fall into private hands in the first place.

Although most academic research is funded by the public, universities all but force their scholars to publish their results in journals that take ownership of the work and place it behind expensive pay walls.

Centuries ago, when printing and mailing paper journals was the most efficient way to disseminate new knowledge, a symbiotic relationship developed between scholars, who had ideas they wanted to share, and publishers, who had printing presses and the means to convey printed works to a wide audience. Transferring copyright to publishers, which protected their ability to recover costs and profit from their investment, was a reasonable price for authors to pay to further their disseminating mission.

But with the birth of the internet, scholars no longer needed publishers to distribute their work. As NYU’s Clay Shirky has noted, publishing went from being an industry to being a button.

Had the leaders of major research universities reacted to this technological transformation with any kind of vision, Swartz’s dream of universal free access to the scholarly literature would now be a reality. But they did not. Rather than seize this opportunity to greatly facilitate research and education, both within and outside the academy, they chose instead to reify the status quo.

Instead of encouraging their faculty to make their work widely available, virtually all universities send the unmistakable message to current and aspiring faculty that success in their career depends on publishing in the most high profile place you can. Since the most prestigious journals are generally old, this edict has the effect of stifling innovation in scientific communication. While countless alternatives to the traditional model have arisen, academics in most fields are reluctant to embrace them, fearing that doing so would harm their career prospects.

It is hard to account for this abdication of a university’s basic mission to produce and disseminate knowledge as anything but institutional laziness, as universities essentially farm out responsibility for screening job and promotion candidates to journals.

Absurdly, as soon as the scholarly output of our universities is in the hands of publishers, the universities immediately buy it back, spending billions of scarce institutional dollars every year in subscription and licensing fees to provide access to students and faculty, but leaving everybody else out in the cold.

Posting our PDFs is all fine and good, but the real way to honor Aaron Swartz is to combat this pervasive institutional fecklessness and do everything in our power to make sure no papers ever end up behind pay walls again. We have to demand that our universities alter their policies to reward, rather than punish, free scholarly publishing, and that they stop cutting the checks that keep this immoral system afloat.

Above all else we need to enshrine the principle that the knowledge produced in the academy is a public good whose value is greatly diminished by turning it into private property. And maybe the next time someone shows up at a university wanting only to spread knowledge, instead of calling the cops, they’ll say “Great, how can we help?”

====

[Update: I modified the title to reflect the ongoing nature of the betrayal]

====

My related writing on science publishing:

What the UC “open access” policy should say

20 years of cowardice: the pathetic response of American universities to the crisis in scholarly publishing

The widely held notion that high-impact publications determine who gets academic jobs, grants and tenure is wrong. Stop using it as an excuse.

You are Elsevier: time to overcome our fears and kill subscription journals

Plagiarist or Puppet? US Rep. Carolyn Maloney’s reprehensible defense of Elsevier’s Research Works Act

Research Bought, Then Paid For

Peer review is f***ed up – let’s fix it

Posted in open access, PLoS, science | 24 Responses

Darwin’s Tangled Bank in verse

My daughter has to memorize a poem for a school performance, and asked me if I knew a good poem about nature. There are, of course, many good ones, but I really wanted her to have the most poetic thing ever written about nature – the last paragraph of Darwin’s Origin of Species – rendered in verse. So I gave it a try.

 

The Tangled Bank

Contemplate a tangled bank
Clothed with many kinds of plant
Insects and birds flitting about
Worms crawling through the damp

Reflect that these elaborate
And differently constructed forms
Have been produced by such a simple set
Of ever acting norms

Growth, reproduction and inheritance
Variation to transmit
Natural selection then leading to
Extinction of the less fit

From the war of nature
From famine and from death
Follow the most exalted species
To have ever drawn a breath

There is grandeur in this view of life
And its powers not yet gone
Having been originally breathed
Into a few forms or just one

From as simple a beginning
As could ever be resolved
Endless forms most beautiful
Are continuously evolved.

 

Here’s the original:

It is interesting to contemplate an entangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent on each other in so complex a manner, have all been produced by laws acting around us. These laws, taken in the largest sense, being Growth with Reproduction; inheritance which is almost implied by reproduction; Variability from the indirect and direct action of the external conditions of life, and from use and disuse; a Ratio of Increase so high as to lead to a Struggle for Life, and as a consequence to Natural Selection, entailing Divergence of Character and the Extinction of less-improved forms. Thus, from the war of nature, from famine and death, the most exalted object which we are capable of conceiving, namely, the production of the higher animals, directly follows. There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.

 

Posted in Darwin, evolution | 9 Responses

Is the NIH a cult?

As many of you know, I spent a fair amount of time last month engaged in debates about the wisdom of California’s Proposition 37, which would have mandated the labeling of genetically modified foods. While many of these discussions were civil, one particularly energetic fellow accused me of having been brainwashed by the “cult of the NIH” into believing that anything science does must be good.

At the time I just giggled. But his tweet stuck in my head. After the election I looked back on my twenty years as a scientist in the “NIH system”, and I began to see the signs. So I read about cults – about what differentiates them from normal, run-of-the-mill organizations. And I started keeping score (on a 1-9 scale, of course, with 1 being the most cultish).

Charismatic Leader

Every cult has a charismatic Svengali-like leader at its helm, obsessed with their own self-image, who ingratiates themselves with powerful leaders to further the cult’s agenda, demands everyone call him “The Director” and publishes books and other materials espousing a personal philosophy and inviting comparison to deities.

 

Score: 1

Isolated Compound

Cults always have a compound, with high levels of security, where The Director and his minions sit ensconced in a grandiose “Building 1” complete with easily recognized signs of Roman imperial dominance.

Score: 1

Membership

Cults commonly have an elaborate process for selecting new members to ensure that they are appropriate and will not cause undue trouble. These often involve lengthy screening periods during which potential members undergo different types of hazing.

Aspirants for membership in the NIH are forced to go through a grueling initiation ritual in which they are forced to recite a set of personal “Aims” and explain how they will further the cult’s organization’s “mission”. These applications are reviewed by existing members who are locked in windowless rooms without food or drink for extended periods of time where the merits of aspiring members are dissected by their potential “peers” flown in expressly for this purpose from across the country. Many are rejected out of hand for a wide range of undisclosed failings. Those who survive this “triage” process are subjected to further scrutiny and, after extensive wrangling and wheeling and dealing under the supervision of a “program manager” assigned to prevent violence, every aspirant is given a score. The solemn receiving of the score and assessment of the “peers”, printed on ceremonial pink paper, is one of the most stressful moments in the life of an NIH aspirant. They then are forced to endure months of additional waiting while receiving little or no information or encouragement, until their application is reviewed again by a mysterious organization of elders known as Councils who select new members according to the needs of their cell – also known as an Institute.

Newly selected members – known to insiders as grantees – are immediately given an obscure code containing a unique identifier along with a signifier of their rank decipherable only by other members of the organization. I have deciphered some of them. “K99” designates novices. Routine workers are known by “R01”. A select group of “pioneers” bear the mark “DP2”. Many aspire to be a local leader, known as a “P01”. And, in recent years, a new group of members known as “U01”s have emerged to carry out special missions at the express behest of “The Director”.

Score: 2

Recruitment

Cults often have an active process for recruiting new members, often by indoctrinating children and naive young adults who have the misfortune of finding themselves under the influence of existing members.

The NIH has several national indoctrination programs, but the most dangerous and effective is something known as the “Training Grant”. These NIH cells, found on most university campuses across the country and always led by an established “grantee”, prey on impressionable youths just out of college and eager to shed the structure of their parents’ worlds. The NIH takes them under its wing and gives them a generous personal stipend and a structured program of research and experimentation. They dangle the carrot of one day becoming a “grantee”, but they do not tell them about the lonely, grueling years to come, or that only a handful of them will actually make it to the point where they are even allowed to submit their first application for membership. By the time they are done with this program, most have drunk the NIH Kool-Aid, and can think of nothing they want more than to become a grantee. And those who have not feel they have sunk too much of their time and energy into these first steps along the grantee path to give up.

Score: 3

Bait-and-switch

People who have broken free of cults often complain that they were initially given easy rewards within the organization – special quarters, access to leaders, choice of the best jobs – but that once they were in for a few years, these perks were no longer so easy to obtain.

The NIH has been known to give special status to first-time applicants – making it far easier for them to get selected than those already in the system, who are forced to endure a “renewal” process every 3-5 years during which time they have to justify their continued status in the organization (there are few things as tragic as an ex-grantee).

Score: 4

Separating members from family

Cults almost always work to separate members from friends and family who are not members of the organization – cutting them off from support and from the outside world.

The highly competitive review process engineered by the NIH forces aspiring and existing members to work all hours of the day, night and weekends, eschewing family and friends in the interest of furthering the NIH agenda and their own place within it. This often leads members to marry within the organization creating additional challenges, including something called a “two body problem”.

Score: 2

Pet projects

Many cult members end up subjugating their own aspirations to the pet projects of the cult leader, which are often insanely grandiose and often lead to the financial ruin first of the members then of the organization itself.

Increasing amounts of money siphoned from public coffers by the NIH are going to pet projects of The Director.

Score: 1

Giving away Possessions

Cults are renowned for forcing their members to give away all of their possessions.

The NIH has a “Public Access Policy” which forces members to give away the single most valuable thing they produce while members of the system. And now they threaten to expel grantees who do not obey.

Score: 2

 

Posted in NIH | 7 Responses

Prop 37 and the Right to Know Nothing

As we approach election day, my neighborhood in Berkeley has sprouted dozens of blue and orange yard signs supporting Proposition 37, which would require the labeling of genetically modified foods.

The “Right to Know” has become the rallying cry of the initiative’s backers, who meet any criticism of the initiative, its motivation or of the “science” used to back it with the same refrain: “We have the right to know what’s in our food!!”.

It is, of course, hard to argue that people should not have this right. I am a very strong supporter of consumer rights and of providing information, even if people use it stupidly. But, I have closely followed the debate over Prop 37, reading and listening to and occasionally arguing with its proponents. And I have been struck throughout by just how little backers of the initiative actually want to know anything.

The law would require the application of a catchall “Contains GMOs” label to any product containing any ingredient from a genetically modified plant, animal or microbe. This language reflects the belief of its backers that GMOs are intrinsically bad and deserve to be labeled – and avoided – en masse, no matter what modification they contain or towards what end they were produced. This is not a quest for knowledge – it is an attempt to reify ignorance.

Sure, if you think, as some people do, that moving genes from one species to another is some kind of crime against nature that risks destroying life on Earth, a blanket prohibition against GMOs makes sense. But the bulk of Prop 37 supporters I have heard or spoken to express more rational concerns, primarily:

  1. The specific modifications in common GM crops – the production of insecticidal proteins or of genes for herbicide tolerance – make them unsafe for human consumption.
  2. Whether safe or unsafe for humans, GM crops encourage an industrialized monoculture approach to farming that is unsustainable and bad for the planet.
  3. GM technology is wielded by multinational conglomerates like Monsanto who have little regard for the public interest and produce GM crops solely to make more money, and who use intellectual property in their creations to squeeze farmers and increase their control over global agriculture.

Whether one agrees with these points or not – I disagree with 1, but agree with 2 and 3 to varying degrees – none of them apply uniformly to all GMOs.

If you’re worried that the GMOs you’re eating might kill you, then you should want to know what specific modification your food contains. I don’t think there is any harm in eating food containing the insecticidal “Bt” protein, but even if it were dangerous this would have no bearing on the safety of golden rice.

Similarly, if you are concerned that the transgenic production of plants resistant to certain herbicides encourages the excessive use of herbicides and triggers an herbicide treadmill, then you can boycott crops containing these modifications. But it doesn’t make sense to oppose the use of crops engineered to resist diseases, or to produce essential vitamins. Indeed, there are many, like UC Davis’s Pam Ronald, who believe that advanced development of GMOs is the best way to advance organic and sustainable agriculture. You may disagree with her, but it should be clear that the effect on agricultural practices varies depending on the specific plant and type of modification being considered.

And, while I share much of the disdain anti-GMO advocates feel for the business practices of companies like Monsanto, not every seed company uses the same practices, and there are plenty of academic researchers, non-profits and companies laboring to use GMOs to solve major challenges in global food production, distribution and nutrition. To hamper what they are doing in the name of sticking it to Monsanto – whose questionable business practices extend far beyond GMOs – makes no sense.

Thus the very reasons supporters of GMO labeling cite for labeling GMOs demand more information than “This product contains genetically modified ingredients”. And it’s the central irony of Prop 37 that in backing the bill they are, in tangible ways, working to ensure they do not get information that will be actually useful to them.

Some backers of Prop 37 say that it is the first step towards more comprehensive food labeling. If, in the push to pass the initiative, I saw a thirst for real knowledge and understanding of where crops come from and how food is produced, then I’d share their optimism.

But everything I’ve seen from proponents of Prop 37 suggests something else – a lazy and self-satisfied acceptance of an internally incoherent piece of legislation that, rather than giving consumers the “right to know”, will actually protect their desire to know nothing.

Posted in GMO | 70 Responses

Science is healthy for children and other living things

Posted in GMO | 2 Responses

Retraction action, what’s your faction: the dangers of citation worship

If you ask scientists to list words they are most afraid to hear associated with their work, I suspect “retraction” would rank high on the list. Retraction is a kind of death sentence, applied only when papers contain serious methodological errors or were tainted by fraud.

So the recent retraction of a PLoS Pathogens paper linking the virus XMRV to prostate cancer, following a new PLoS ONE paper that demonstrated that the original results were due to contamination, caught many (including the authors of the original paper, many of whom were involved in the followup study) off guard. Martin Enserink at ScienceNOW and Retraction Watch have excellent posts with details on the story.

Before offering my thoughts on this, I want to state at the outset that I have more than a passing interest in the story. I was one of the co-founders of PLoS, am a member of its Board of Directors, and continue to play an active role in its activities. I also worked closely with the senior author on the original paper – Joe DeRisi – for three years while we were in Pat Brown’s lab at Stanford, and he remains a good friend. He is not only one of the most creative people I know, he is one of the best, and most careful, experimentalists I have ever met.

Putting aside the question of retraction for a moment, this is exactly how science is supposed to work. Several very good scientists found an intriguing and potentially important result and published a paper on it. Subsequent efforts failed to confirm their initial result. Rather than digging in their heels and defending their initial study – as many scientists do – the original authors accepted the newer results, and went to great lengths to figure out what had gone wrong. Their new paper is a model of detective work, and a cautionary tale about the challenges of working with clinical samples and viruses that everyone should read.

So it is now pretty clear that the major conclusion of the original paper – the association between XMRV and prostate cancer – is wrong. Obviously, people working in the field, and anyone interested in prostate cancer and chronic fatigue syndrome (the subject of a subsequent paper) who comes upon the 2006 PLoS Pathogens paper, need to know that subsequent studies have shown that the samples were contaminated and that the conclusions are no longer accepted by the authors. The question is how to do this.

Unfortunately, in the current world of scientific publishing, there aren’t a lot of ways to do this, and the editors at PLoS Pathogens chose to retract the paper. This retraction was accompanied by an editorial from PLoS Pathogens editor Kasturi Haldar and PLoS Medicine editor Ginny Barbour on the role of retractions in correcting the literature. I don’t agree with the decision to retract this paper, but it is worth understanding their logic:

There is much misunderstanding about retractions. Authors and editors have been notoriously unwilling to use them, for the perceived shame that they bring upon authors, editors, and journals. Journalists regularly note the fact that retractions are increasing and ask whether the scientific literature is thus becoming less reliable. Websites such as Retraction Watch list and dissect retractions – an extra exposure at what is already a difficult time for authors and editors. In addition there is much confusion about how to effect retractions practically. In an effort to bring some clarity to this issue in 2009 the Committee on Publication Ethics of which PLOS Pathogens is a member and one of us (VB) is currently Chair, issued guidelines on retractions, which explicitly state that retractions are appropriate when findings are unreliable, either as a result of misconduct (e.g. data fabrication) or honest error.

In essence, they are trying to expand the definition of retraction away from its common usage as a way to indicate misconduct to include all cases in which the findings of a paper should now be judged unreliable. They go on to explain how they will wield this redefined tool in the future:

We firmly believe that acceleration also requires being open about correcting the literature as needed so that research can be built on a solid foundation. Hence as editors and as a publisher we encourage the publication of studies that replicate or refute work we have previously published. We work with authors (through communication with the corresponding author) to publish corrections if we find parts of articles to be inaccurate. If a paper’s major conclusions are shown to be wrong we will retract the paper. By doing so, and by being open about our motives, we hope to clarify once and for all that there is no shame in correcting the literature. Despite the best of efforts, errors occur and their timely and effective remedy should be considered the mark of responsible authors, editors and publishers.

No matter what Haldar and Barbour want, they cannot erase the stigma of retraction by fiat. When a word means something in the community, it doesn’t matter what a dictionary or some unknown committee says. Retractions are viewed by scientists and the public as marks of shame. Imagine how the students and postdocs who carried out the work described in the 2006 paper must feel. They did nothing wrong. Indeed several participated in the effort to figure out what went wrong – going above and beyond what most people would have done. And the reward for their effort is to have “RETRACTED” show up every time someone searches for them on PubMed? This is not the right solution.

I understand the instinct to want a way to correct the literature, especially in cases like this that have attracted a lot of public attention. But isn’t science ultimately all about correcting the literature? It’s not a singular act to look back at previous work and find things that could have been done better, and even things that are outright wrong. This is a large part of what we do. If you look back at the literature from five years, ten years or longer ago, you will find myriad papers that, given what we know now, have findings that are unreliable and conclusions that are now clearly wrong. Are we going to go back and retract all of these papers? Of course not. It’s insane.

As easy as it might be to dismiss this incident as an isolated example of editorial overreach, this is really just the latest manifestation of a broader problem that plagues scientific publication and poisons the scientific process: the reification of the citation. Going back and correcting published papers only makes sense if you view the scientific literature as an isolated collection of discrete, singular events – publications – commemorated with a sacred mark – the citation. If papers are supposed to stand forever as vessels of truth, then of course you have to purge those that are shown to be wrong – both to protect people from untruths, and to defend the sanctity of the citation.

Researchers dread retractions for the same reason they will sell their souls to publish in high impact journals: because the currency of academic success is not achievement – it is citations. Sure, they are not unlinked. But where they come into conflict, citations almost always win. A Nature paper is a Nature paper forever – even if the results turn out to be insignificant, or, as is often the case, outright wrong. The only thing that can change that is a retraction.

Thus, in some ways, the proposal by Haldar and Barbour is not reactionary, as many have suggested – it is deeply subversive. By exposing all citations – not just those achieved dishonestly – to the threat of retraction, it strips the citation of one of its most valuable properties – permanence. But despite my love for all things subversive, I do not think this is the right solution, as it ultimately reinforces the idea of the scientific literature as a collection of discrete events.

An obvious solution to all of these problems follows from thinking about the literature as what it really is: a historical record of ideas, discoveries and, yes, mistakes – whose value comes not from static individual pieces, but from ways in which they are connected and change over time. It is often said that science is “self-correcting”, recognizing that our views of the value and validity of previously published work inevitably change over time as we use, build on and expand upon the work of our colleagues – something perfectly demonstrated by the XMRV story. What we need to do is not to isolate and protect ourselves from the dynamic nature of science, but to embrace it.

It’s disheartening that in this day of electronic publications and databases the editors felt that the only way they could ensure that people reading the 2006 XMRV paper would look at it in the context of newer findings was to retract the paper. If we had a way of capturing how new methods, data and ideas were changing our view of earlier work, they would not have needed to even consider something as dire or as clumsy as a retraction. And there is no reason we can’t do this – we have the technical means to switch from one-time assessments of a paper to a system of ongoing evaluation and reevaluation whose output changes as our understanding grows. The only thing stopping us is the continued reification of the citation in science, and our unwillingness to discard it.

UPDATE: I want to emphasize that my goal here was not to take the editors to task. I don’t completely support what they did, but they were trying to deal with a real, immediate problem – people acting on conclusions from a paper whose results nobody now believes to be true. What I was primarily lamenting was the fact that our system does not provide them with any other tool than retraction.

Posted in publishing, science | Tagged | 13 Responses

Blinded by Big Science: The lesson I learned from ENCODE is that projects like ENCODE are not a good idea

When the draft sequence of the human genome was finished in 2001, the accomplishment was heralded as marking the dawn of the age of “big biology”. The high-throughput techniques and automation developed to sequence DNA on a massive scale would be wielded to generate not just genomes, but reference data sets in all areas of biomedicine.

The NHGRI moved quickly to expand the universe of sequenced genomes, and to catalog variation within the human population with HapMap, HapMap 2 and 1000 Genomes. But they also began to dip their toe into the murkier waters of “functional genomics”, launching ENCODE, a grand effort to build an encyclopedia of functional elements in the human genome. The idea was to simultaneously annotate the human genome and provide basic and applied scientists working on human disease with reference data sets that they would otherwise have had to generate themselves. Instead of having to invest in expensive equipment and learn complex protocols, they would often be able to just download the results, thereby making everything they did faster and better.

Now, a decade and several hundred million dollars later, the winding down of ENCODE and the publication of dozens of papers describing its results offer us a vital opportunity to take stock of what we learned, whether it was worth it, and, most importantly, whether this kind of project makes sense moving forward. This is more than just an idle intellectual question. NHGRI is investing $130m in continuing the project, and NHGRI, and the NIH as a whole, have signalled their intention to do more projects like ENCODE in the future.

I feel I have a useful perspective on these issues. I served as a member of the National Advisory Committee for the ENCODE and related modENCODE projects throughout their lifespans. As a postdoc with Pat Brown and David Botstein in the late ’90s I was involved in the development of DNA microarrays and had seen firsthand the transformative potential of genome sequences and the experimental genomic techniques they enabled. I believed then, and still believe now, that looking at biology on a big scale is often very helpful, and that it can make sense to let people who are good at doing big projects, and who can take advantage of economies of scale, generate data for the community.

But the lesson I learned from ENCODE is that projects like ENCODE are not a good idea.

American biology research achieved greatness because we encouraged individual scientists to pursue the questions that intrigued them and the NIH, NSF and other agencies gave them the resources to do so. And ENCODE and projects like it are, ostensibly at least, meant to continue this tradition, empowering individual scientists by producing datasets of “higher quality and greater comprehensiveness than would otherwise emerge from the combined output of individual research projects”.

But I think it is now clear that big biology is not a boon for individual discovery-driven science. Ironically, and tragically, it is emerging as the greatest threat to its continued existence.

The most obvious conflict between little science and big science is money. In an era when grant funding is getting scarcer, it’s impossible not to view the $200m spent on ENCODE in terms of the ~125 R01s it could have funded. It is impossible to score the value lost from these hundred or so unfunded small projects against the benefits of one big one. But an awful lot of amazing science comes out of R01s, and it’s hard not to believe that at least one of these projects would have been transformative.

But, as bad as the loss of individual research grants is, I am far more concerned about the model of independent research upon which big science projects are based.

For a project like ENCODE to make sense, one has to assume that when a problem in my lab requires high-throughput data, someone – or really a committee of someones – who has no idea about my work will have predicted, years in advance, precisely the data that I would need and generated it for me. This made sense with genome sequences, which everyone already knew they needed to have. But for functional genomics this is nothing short of lunacy.

There are literally trillions of cells in the human body. Multiply that by life stage, genotype, environment and disease state, and the number of possible conditions to look at is effectively infinite. Is there any rational way to predict which ones are going to be essential for the community as a whole, let alone individual researchers? I can’t see how the answer is possibly yes. What’s more, many of the data generated by ENCODE were obsolete by the time they were collected. For example, if you were starting to map transcription factor binding sites today, you would almost certainly use some flavor of exonuclease ChIP, rather than the ChIP-seq techniques that dominate the ENCODE data.

I offer up an example from my own lab. We study Drosophila development. Several years ago a postdoc in my lab got interested in sex chromosome dosage compensation in the early fly embryo, and planned to use genome-wide mRNA abundance measurements in male and female embryos to study it. It just so happened that the modENCODE project was generating genome-wide mRNA abundance measurements in Drosophila embryos. Seems like a perfect match. But these data were all but useless to us, not because the data weren’t good – the experiment was beautifully executed – but because their data could not answer the question we were pursuing. We needed sex-specific expression; they pooled males and females. We needed extremely precise time resolution (to within a few minutes); they looked at two-hour windows. There was no way they could have anticipated this – or any of the hundreds of other questions about developmental gene expression that came up in other labs.

We were fortunate. I have money from HHMI and was able to generate the data we needed. But a lot of people would not have been in my position, and in many ways would have been worse off because the existence of ENCODE/modENCODE makes it more difficult to get related genomics projects funded. At this point the evidence for such an effect is anecdotal – I have heard from many people that reviewers explicitly cited an ENCODE project as a reason not to fund their genomics proposal – but it’s naive to think that these big science projects will not affect the way that grants are allocated.

Think about it this way. If you’re an NIH institute looking to justify your massive investment in big science projects, you are inevitably going to look more favorably on proposals that use data that have already been, or are about to be, generated by expensive projects that feature in the institute’s portfolio. And the result will be a concentration of research effort on datasets of high technical quality, but little intrinsic value, with scientists wanting to pursue their own questions left out in the cold, and the most interesting and important questions at risk of never being answered, or even asked.

You can already see this mentality at play in discussions of the value of ENCODE. As I and many others have discussed, the media campaign around the recent ENCODE publications was, at best, unseemly. The empty and often misleading press releases and quotes from scientists were clearly masking the fact that, despite publishing 30 papers, they actually had very little of grand import to say, today, about what they found. The most pensive of them realized this, and went out of their way to emphasize that other people were already using the data, and that the true test was how much the data would be used over the coming years.

But this is the wrong measure. These data will be used. It is inevitable. And I’m sure this usage will be cited often to justify other big science projects ad infinitum. And we will soon have a generation of scientists for whom an experiment is figuring out what kinds of things they can do with data selected three years earlier by a committee sitting in a windowless Rockville hotel room. I don’t think this is the model of science anyone wants – but it is precisely where we are headed if the metastasis of big science is not halted.

I want to be clear that I am not criticizing the people who have carried out these projects. The staff at the NIH who ran ENCODE, and the scientists who carried it out, worked tirelessly to achieve its goals, and the organizational and technical feat they achieved is impressive. But that does not mean it is ultimately good for science.

When I have raised these concerns privately with my colleagues, the most common retort I get is that, in today’s political climate, Congress is more willing to fund big, ambitious-sounding projects like ENCODE than it is to simply fund the NIH extramural budget. I can see how this might be true. Maybe the NIH leadership is simply feeding Congress what it wants in order to preserve the NIH budget. And maybe this is why there’s been so little pushback from the general research community against the expansion of big biology.

But it will be a disaster if, in the name of protecting the NIH budget and our labs’ funding, we pursue big projects that destroy investigator-driven science as we know it in the process.

Posted in ENCODE, NOT junk, science, science and politics | Tagged | 41 Responses