Blinded by Big Science: The lesson I learned from ENCODE is that projects like ENCODE are not a good idea

When the draft sequence of the human genome was finished in 2001, the accomplishment was heralded as marking the dawn of the age of “big biology”. The high-throughput techniques and automation developed to sequence DNA on a massive scale would be wielded to generate not just genomes, but reference data sets in all areas of biomedicine.

The NHGRI moved quickly to expand the universe of sequenced genomes, and to catalog variation within the human population with HapMap, HapMap 2 and 1000 Genomes. But it also began to dip its toe into the murkier waters of “functional genomics”, launching ENCODE, a grand effort to build an encyclopedia of functional elements in the human genome. The idea was to simultaneously annotate the human genome and provide basic and applied scientists working on human disease with reference data sets that they would otherwise have had to generate themselves. Instead of having to invest in expensive equipment and learn complex protocols, they would often be able to just download the results, thereby making everything they did faster and better.

Now, a decade and several hundred million dollars later, the winding down of ENCODE and the publication of dozens of papers describing its results offer us a vital opportunity to take stock of what we learned, whether it was worth it, and, most importantly, whether this kind of project makes sense moving forward. This is more than just an idle intellectual question. NHGRI is investing $130m in continuing the project, and NHGRI, and the NIH as a whole, have signaled their intention to do more projects like ENCODE in the future.

I feel I have a useful perspective on these issues. I served as a member of the National Advisory Committee for the ENCODE and related modENCODE projects throughout their lifespans. As a postdoc with Pat Brown and David Botstein in the late 1990s, I was involved in the development of DNA microarrays and saw firsthand the transformative potential of genome sequences and the experimental genomic techniques they enabled. I believed then, and still believe now, that looking at biology on a big scale is often very helpful, and that it can make sense to let people who are good at doing big projects, and who can take advantage of economies of scale, generate data for the community.

But the lesson I learned from ENCODE is that projects like ENCODE are not a good idea.

American biology research achieved greatness because we encouraged individual scientists to pursue the questions that intrigued them, and the NIH, NSF and other agencies gave them the resources to do so. And ENCODE and projects like it are, ostensibly at least, meant to continue this tradition, empowering individual scientists by producing datasets of “higher quality and greater comprehensiveness than would otherwise emerge from the combined output of individual research projects”.

But I think it is now clear that big biology is not a boon for individual discovery-driven science. Ironically, and tragically, it is emerging as the greatest threat to its continued existence.

The most obvious conflict between little science and big science is money. In an era when grant funding is getting scarcer, it’s impossible not to view the $200m spent on ENCODE in terms of the ~125 R01s it could have funded. It is impossible to score the value lost from these hundred or so unfunded small projects against the benefits of one big one. But an awful lot of amazing science comes out of R01s, and it’s hard not to believe that at least one of these projects would have been transformative.

But, as bad as the loss of individual research grants is, I am far more concerned about the model of independent research upon which big science projects are based.

For a project like ENCODE to make sense, one has to assume that when a problem in my lab requires high-throughput data, someone – or really a committee of someones – who had no idea about my work predicted, years in advance, precisely the data that I would need and generated it for me. This made sense with genome sequences, which everyone already knew they needed to have. But for functional genomics this is nothing short of lunacy.

There are literally trillions of cells in the human body. Multiply that by life stage, genotype, environment and disease state, and the number of possible conditions to look at is effectively infinite. Is there any rational way to predict which ones are going to be essential for the community as a whole, let alone for individual researchers? I can’t see how the answer could possibly be yes. What’s more, many of the data sets generated by ENCODE were obsolete by the time they were collected. For example, if one were starting to map transcription factor binding sites today, one would almost certainly use some flavor of exonuclease ChIP, rather than the ChIP-seq techniques that dominate the ENCODE data.

I offer up an example from my own lab. We study Drosophila development. Several years ago a postdoc in my lab got interested in sex chromosome dosage compensation in the early fly embryo, and planned to use genome-wide mRNA abundance measurements in male and female embryos to study it. It just so happened that the modENCODE project was generating genome-wide mRNA abundance measurements in Drosophila embryos. Seems like a perfect match. But these data were all but useless to us – not because the data weren’t good (the experiment was beautifully executed), but because they could not answer the question we were pursuing. We needed sex-specific expression; they pooled males and females. We needed extremely precise time resolution (to within a few minutes); they looked at two-hour windows. There was no way they could have anticipated this – or any of the hundreds of other questions about developmental gene expression that came up in other labs.

We were fortunate. I have money from HHMI and was able to generate the data we needed. But a lot of people would not have been in my position, and in many ways would have been worse off because the existence of ENCODE/modENCODE makes it more difficult to get related genomics projects funded. At this point the evidence for such an effect is anecdotal – I have heard from many people that reviewers explicitly cited an ENCODE project as a reason not to fund their genomics proposal – but it’s naive to think that these big science projects will not affect the way that grants are allocated.

Think about it this way. If you’re an NIH agency looking to justify your massive investment in big science projects, you are inevitably going to look more favorably on proposals that use data that have already been, or are about to be, generated by expensive projects that feature in the institute’s portfolio. And the result will be a concentration of research effort on datasets of high technical quality but little intrinsic value, with scientists wanting to pursue their own questions left out in the cold, and the most interesting and important questions at risk of never being answered, or even asked.

You can already see this mentality at play in discussions of the value of ENCODE. As I and many others have discussed, the media campaign around the recent ENCODE publications was, at best, unseemly. The empty and often misleading press releases and quotes from scientists were clearly masking the fact that, despite publishing 30 papers, they actually had very little of grand import to say, today, about what they found. The most pensive of them realized this, and went out of their way to emphasize that other people were already using the data, and that the true test was how much the data would be used over the coming years.

But this is the wrong measure. These data will be used. It is inevitable. And I’m sure this usage will be cited often to justify other big science projects ad infinitum. And we will soon have a generation of scientists for whom an experiment is figuring out what kinds of things they can do with data selected three years earlier by a committee sitting in a windowless Rockville hotel room. I don’t think this is the model of science anyone wants – but it is precisely where we are headed if the metastasis of big science is not checked.

I want to be clear that I am not criticizing the people who have carried out these projects. The staff at the NIH who ran ENCODE, and the scientists who carried it out worked tirelessly to achieve its goals, and the organizational and technical feat they achieved is impressive. But that does not mean it is ultimately good for science.

When I have raised these concerns privately with my colleagues, the most common retort I get is that, in today’s political climate, Congress is more willing to fund big, ambitious-sounding projects like ENCODE than to simply fund the NIH extramural budget. I can see how this might be true. Maybe the NIH leadership is simply feeding Congress what it wants in order to preserve the NIH budget. And maybe this is why there has been so little pushback from the general research community against the expansion of big biology.

But it will be a disaster if, in the name of protecting the NIH budget and our labs’ funding, we pursue big projects that destroy investigator-driven science as we know it in the process.

This entry was posted in ENCODE, NOT junk, science, and science and politics.


  1. serg
    Posted September 10, 2012 at 2:38 am | Permalink

    You use $200m for the ENCODE budget. Actually, if you add up the numbers in the NHGRI press release, or copy the figure from Science (“To date, NHGRI has put $288 million toward ENCODE…”), it’s a bit higher. Perhaps there was additional funding from outside the US.

  2. Posted September 10, 2012 at 3:37 am | Permalink
  3. DrugMonkey
    Posted September 10, 2012 at 4:54 am | Permalink

    Agreed. Very well put.

  4. Nick Eriksson
    Posted September 10, 2012 at 5:11 am | Permalink

    1. Depending on your feelings about GWAS, you may find it ironic that the utility of ENCODE for interpreting GWAS was oversold. A little explanation for 10% of the loci is nice, but it hasn’t added much to the last handful of GWAS papers I’ve submitted.

    2. Do you really think that 1% of all R01-funded scientists do something transformative every 5 years, as you imply? The proper null for this question is somewhat different from “R01-funded scientists at Berkeley” or “those who read this blog”…

    3. The big advantage of projects like ENCODE is that they share the data… the “open science” benefit is worth a pretty big premium to me over 125 R01s whose holders by and large couldn’t care less about reusability… Think about replacing 1000 Genomes with 50 R01s each sequencing 50 genomes. What’s the chance you could download any of those now?

  5. Pete Laberge
    Posted September 10, 2012 at 5:22 am | Permalink

    Good points….
    Well, I think it is a matter of what, exactly, the project is.
    Some projects /studies /experiments /etc. … CAN be better done with small science, and some others with big science. And… Sometimes you even need some “medium science”.
    Consider the Manhattan Project. Consider ENIAC. Consider NASA and the various space programs. Consider the building of the transcontinental railroads. These all needed “Big Science / Big Project” work and thinking.
    But now, well, computing tech has changed, we don’t need many ENIACS! (Indeed companies like Apple and Microsoft may be too big!)
    Consider NASA. NASA is still needed. But there is also a lot of probably useful, and definitely innovative work being done on a smaller scale.
    The trouble is matching the funding, resources, right people, and right goals and desires to the project at hand.
    Sometimes “Big Science” provides data, facts, knowledge, experience, and so forth that later on, “Small Science” can use and run with.
    And sometimes a lot of “Small Science”, finally gets all pulled together into a “Big Project.”
    But if on any one project you do not have the right blend… you will have waste, or maybe even failure….
    But how do we figure out what this “blend” should be? How do we figure out what scale to use? That’s the conundrum!
    Have you any ideas on how this could be done?

  6. Titus Brown
    Posted September 10, 2012 at 5:42 am | Permalink

    Agree almost entirely. My only caution is that many people had the same complaints about the human genome project, which has been a fantastic success in the long term, both because of the data and because of the technology development. What I hate most about ENCODE, myself, is the idea that the data generation and analysis components are effectively closed to outsiders… Think about the cool stuff that could have happened with that data if it had been made maximally open from the start!

  7. Antonia Monteiro
    Posted September 10, 2012 at 5:59 am | Permalink

    I completely agree. In addition, I think everyone would agree that a lot of money is spent managing large grants where multiple labs are involved. Coordination, alone, costs time and money. Individual lab-run grants usually produce much more bang for the buck.
    Antonia Monteiro

  8. Posted September 10, 2012 at 6:57 am | Permalink

    Very good points, Mike. I hope this train (ENCODE) can be stopped. We did a video podcast last week and made some of the same points:
    Simply Statistics podcast on ENCODE

  9. Posted September 10, 2012 at 7:16 am | Permalink

    Maybe I’m a bit naive here … but from a data analysis point of view, having 10s of 1000s of datasets at a common location, uniformly annotated, generated through a consensus set of protocols, uniformly processed (not necessarily in the best possible way, but at least in a reasonably uniform way) and handed over to the community is a tremendous hypothesis-generating resource. Take the ton of R01s funded over the years. Many of them did phenomenal science. But how easy is it for a third party to find related experiments, collect them in one place, annotate them, understand the nuances of the differences in the way they were generated etc., and then use them in new and interesting ways to understand biology?

    If one of the main arguments against projects like ENCODE is that grant reviewers keep citing it unreasonably to deny funding to small labs, then maybe we need to have a very serious discussion about how grant reviews should be done, and not blame ENCODE for it?

    @TitusBrown: The UCSC ENCODE DCC has been releasing data freely and openly at least twice a year since about 2010. This was in no way “hidden” from the community. So if you wanted to analyze the data, you could have from very early stages. Yes, there was a 9-month embargo (which personally I wasn’t a huge fan of), but that would not prevent you from analyzing the data. Also, the data is now all available. What’s stopping everyone from doing the same cool analyses now? ENCODE could have done a lot of things to release data faster and more effectively, but that is all hindsight. Things will definitely be faster and better in the next phase. Internally we were constantly critical of inefficiencies, and we certainly improved as the project progressed. The whole consortium (especially the leadership) put a lot of effort into trying to make this a highly accessible resource. It’s not perfect, but we’ve had innumerable cases of people outside the consortium making great use of the data, and many will continue to do so.

  10. Paul Orwin
    Posted September 10, 2012 at 8:24 am | Permalink

    This reminds me of Ferric Fang’s editorial in Infection and Immunity about “hypothesis driven research”. The cult of Popper is alive and well in medical research. ENCODE and lots of other genomics/transcriptomics work falls under the rubric of “discovery science”. It is hard to do, and hard to interpret, and even harder to get funded if you are a small fry (I might know something about this last one). If it is treated appropriately (i.e. as a great way to generate lots of new testable hypotheses) it can be tremendously valuable, like the human genome project. But the price is high, and I worry about the crowding-out effect too, especially in this era of budget constraint. You also allude to an important and relatively undiscussed effect, that of creating a very bright lamppost for all of us drunks to look for our keys under. This may keep lots of other interesting things poorly illuminated.
    The effect extends when you consider that not only is it expensive to redo these kinds of experiments in subtly different systems, but you will get knocked with “why do these expts in a non-model tissue/organism/whatever when they’ve already been done?”, which will possibly keep us from learning about interesting global effects on gene regulation in other systems.
    I don’t really think this is an indictment of “big science” but it should be thought about, especially by the big shots at NIH/NSF/DOE/USDA when deciding to embark on these things.

  11. Jim Woodgett
    Posted September 10, 2012 at 9:37 am | Permalink

    The truth lies somewhere in-between in that there is a role for big projects if they can offer clear coordination, avoid the duplication of many smaller efforts and/or provide platforms from which many others can benefit in a real (not imaginary) sense. Administrators like to be able to point to big projects (like big infrastructure projects). They are big and expensive so must be good (too big to fail?). This logical fallacy is employed over and over again by many large organizations such as the armed forces, the secret service, etc. In some cases, it absolutely makes sense to pool resources and enumerate a grand plan. High energy physics is a great example. But most often in biology, the optimistic projections of the protagonists are rarely realized. Biology isn’t like engineering and our level of ignorance (compared to well understood architectural and materials principles) is often revealed when we embark on large-scale, bridge-too-far projects. But we keep quiet and plod on, knowing that criticising any form of science is fraught with the possibility that our governments might sense discontent and simply shift resources to something more “tangible”. So we need scientists to honestly evaluate where funds are going and to constantly weigh the balance between projects, large and small.

    As an aside, I do worry about the career plans of the trainees who are part of the machines. How may their individual contributions be evaluated when they are constrained by the objectives of the organizing committee, tasked with 2000 identical experiments? They may be technically skilled and highly efficient, but how is their intellectual curiosity being challenged? How much time and leniency are they given to explore? The best labs must surely try to balance this, but there must also be data generating sweatshops.

  12. Posted September 10, 2012 at 10:58 am | Permalink

    Aside from the very real institutional problems with big science that you have outlined, there is the problem with monopolies. At least the other big science (particle physics) has found a dynamic equilibrium where the two large consortia are in competition.

    One thing you didn’t mention, perhaps because you were (rightly) concentrating on consequences rather than causes, is the inherent preference of an institution like NIH for big projects. A big project has a much lower cost of review per funded grant dollar than a smaller project, is much easier to manage from NIH’s point of view, and is almost guaranteed to end in a declaration of success (as we have seen with ENCODE).

    We need to counteract these tendencies & I’m glad that you’ve spoken up about this. Big science was great to get us the human genome. Let’s go back to proper science, now.

  13. Posted September 10, 2012 at 11:32 am | Permalink

    It’s actually worse. People with big projects know that once funded they become too big, or too old, to kill. When a database and web site are put up, that almost guarantees long-term funding to maintain and update them. Sounds reasonable, except that it co-opts funds for other things indefinitely.

    Some things of this sort (e.g., GenBank, UCSC browser) are great, but the profession now proliferates them, knowing the long-term benefits very well.

    One can identify many such perpetual projects, going back to the Radiation Effects Research Foundation in Hiroshima, Japan, and Framingham, and others like them that were fine at one time but have long since reached diminishing returns.

    This of course underscores your post’s main points about spreading opportunity around, as well as the problem of setting the framework with which others have to deal (using existing resources, even if they’re not wholly appropriate).

  14. Georgi Marinov
    Posted September 10, 2012 at 12:32 pm | Permalink

    Nick Eriksson
    Posted September 10, 2012 at 5:11 am | Permalink
    3. The big advantage of projects like ENCODE is that they share the data… the “open science” benefit is worth a pretty big premium to me over 125 R01s who by and large couldn’t care about reusability… Think about replacing 1000 genomes with 50 R01s each sequencing 50 genomes. What’s the chance you could download any of those now?

    As Anshul also said above – the value of standardization is overlooked here. While the argument in the original post that the fly modENCODE data were useless for what the Eisen lab wanted to do may have some truth to it, ENCODE has been working on human cell lines with which many people work (and yes, there is a lot of cell line/assay space to be explored, but that’s an argument for more of it, not for less). That data IS directly usable, and it is in general of much higher quality than what is published by individual labs – between a quarter and a half of the ChIP-seq datasets in published papers would never have been used by ENCODE according to the QC criteria it now applies, and a good portion of those papers should never have been published, because what is studied there is the artifacts that are left when your ChIP fails, not the actual biology of the factor.

  15. Posted September 10, 2012 at 1:18 pm | Permalink

    We could call “big science” a manufacturing model of science.

  16. Posted September 10, 2012 at 6:08 pm | Permalink

    When I have raised these concerns privately with my colleagues, the most common retort I get is that, in today’s political climate, Congress is more willing to fund big, ambitious sounding projects like ENCODE than they are to simply fund the NIH extramural budget.

    Your colleagues are fucken delusional. Congress views the NIH budget as a way to send federal money to universities in their home states/districts. The more evenly this money is spread–and not just to megaprojects at a few sites–the easier it is to obtain appropriations. Remember the superconducting megacollider that got shut down before it ever megacollided jacke fucken dicke? That thing was doomed, because the entire motherfucker was in one state. If it had been possible to distribute that project across many states (of course, it couldn’t have been), it may have survived.

  17. Posted September 11, 2012 at 11:35 am | Permalink

    To see if your opinion is supported by the data, one would have to do a little bit of data mining, but that seems doable. NIH has a database of R01 grants and the publications that come out of them. Using this database and Scopus or WoK, one should be able to estimate how many citations a pool of 125 randomly selected R01s attracts. This could be contrasted with the citations that the ENCODE papers got.

    • Posted September 11, 2012 at 11:50 am | Permalink

      Citations are not the right measure. I suggest Nobel Prizes….

  18. Posted September 11, 2012 at 2:31 pm | Permalink

    The same thing happens in physics. There is a huge quantity of brain power and money absorbed by Big Science projects. Sure, discovering the Higgs boson is a pretty serious achievement, I appreciate it, and all colleagues did a tremendous job on it, no question. But I think as a society it is legitimate to ask the same question: if you account for the salaries, materials, etc. of all the physicists who worked on this, in times when money is short, maybe we would have been better off funding myriads of smaller physics projects, one of which could lead to another scientific revolution. It is certainly not specific to physics: many exciting fields (e.g. the physics of complex systems, generally speaking) cannot fully develop or be funded because of historical traditions draining money, training students, and shaping curricula. For instance, I personally find it surprising that the standard physics curriculum includes so much quantum mechanics but virtually no chaos theory, and I see this as a historical anomaly due to this kind of big research funded over decades (do you really want your prediction to come true only after 50 years, like Higgs’?)

  19. Lee Henderson
    Posted September 14, 2012 at 7:05 am | Permalink

    I agree with your analysis, with a caveat. Big science yields coarse-grained results, as illustrated by the Drosophila example, but advancing science and answering fundamental questions requires the fine detail that is not typical of big science project results. However, I would also argue that these big science projects can stimulate and motivate the individual scientist to develop experimental plans based on those coarse results, which may provide some direction – as long as there is money left to support R01s, which is, I believe, your most important point.

  20. Anon
    Posted September 14, 2012 at 7:45 am | Permalink

    I might be alone in this, but I worry about the costs to scientists’ souls as well. Watching the marketing campaign (to the press, the public, and even other scientists) surrounding the ENCODE results, I get the impression that the questions you ask here weigh heavily on many ENCODE participants. Or if not the questions, then at least the avoiding of the questions. No one wants to be part of a $200 million disappointment.

    When Ewan Birney says “We use the bigger number because it brings home the impact of this work to a much wider audience”, I feel like he might as well paraphrase Churchill and declare, “In Big Science, truth is so precious that she should always be attended by a bodyguard of lies.” One would hope a scientist would use the number that gave the most accurate impression and let the impact be whatever it would be. Perhaps the much wider audience doesn’t find “20% of the human genome is functional” to be all that impressive. If so, such is life.

    Perhaps when you are riding a $200 million juggernaut, you can’t afford such ideals but that’s no way for a scientist to live.

  21. anon
    Posted September 15, 2012 at 5:04 pm | Permalink

    I am glad someone is finally talking about this! I think it can be beneficial to have big projects with multi-lab collaborations at times. But the focus of these projects should be answering “big questions” rather than generating “big datasets”.

  22. Posted September 16, 2012 at 8:09 pm | Permalink

    Very interesting! I had the pleasure of interviewing a scientist participating in the ENCODE project while serving as editor of BioInform. His intelligence and dedication amazed me.
    Years later, no longer covering bioinformatics, I strolled through Jackson Square in New Orleans. Some busybody from North Carolina was all in my business. When I mentioned that I knew such and such because I’d interviewed several scientists, this fellow spat, “Ah, I don’t trust scientists. They just lie to get grant money.”
    Of course, I politely sped away. His comments stuck with me, though, as I wrestled with the question: just how can an honest scientist get his life’s work done in view of the economics? And which scientists should I be more leery of?

    Well, blogs such as yours are very useful in helping raise everyone’s awareness. Let’s hope more money (not less) is channelled toward all scientific pursuits and not just the sexy big biology research studies.

  23. Posted September 19, 2012 at 11:48 am | Permalink

    Any thoughts on how the hypothesis of an 80% functional genome fits with the occurrence of whole genome duplications, such as the two rounds (2R) of whole genome duplication (WGD) that occurred at the base of the vertebrates? The signature of the 2R-WGD can clearly be seen in the human genome at the protein-coding gene level!

  24. Claudiu Bandea
    Posted September 20, 2012 at 5:19 am | Permalink

    Five reasons why my theory on the function of ‘junk DNA’ is better than theirs

    I intend to submit the paper below for publication in a peer-reviewed journal. Before submitting it, and have it reviewed by a handful (if that) of peers, I decided to post it here on the Blogosphere Preprint Server.

    The ENCODE project has produced high-quality and valuable data. There is no question about that. Also, the micro-interpretation of the data was fair. The problem was with the macro-interpretation of the results, which some consider to be the most important part of the scientific process. Apparently, the leaders of the ENCODE project agreed with this criterion, as they came out with one of the most startling biological paradigms since, well, since the Human Genome Project showed that the DNA sequences coding for proteins and functional RNA, including those having well defined regulatory functions (e.g. promoters, enhancers), comprise less than 2% of the human genome.

    According to ENCODE’s ‘big science’ conclusion, at least 80% of the human genome is functional. This includes much of the DNA that has previously been classified as ‘junk DNA’ (jDNA). As metaphorically presented in both scientific and lay media, ENCODE’s results mean the death of jDNA.

    However, the eulogy of jDNA (all of it) was written more than two decades ago, when I proposed (and conceptually proved) that jDNA functions as a sink for the integration of proviruses, transposons and other inserting elements, thereby protecting functional DNA (fDNA) from inactivation or alteration of its expression (see a copy of my paper posted here:; also, see a recent comment in Science that I posted at Sandwalk: ).

    So, how does the ENCODE theory stack up ‘mano a mano’ against my theory? Here are five reasons why mine is better:

    #5. In order to label 80% of the human genome functional, ENCODE changed the definition of ‘functional’; apparently, 80% of the human genome is ‘biochemically’ functional, which from a biological perspective might be meaningless. My model on the function of jDNA is founded on the fact that DNA can serve not only as an information molecule, a function that is based on its sequence, but also as a ‘structural’ molecule, a function that is not (necessarily) based on its sequence, but on its bare or bulk presence in the genome.

    #4. Surprisingly, the ENCODE theory is not explicitly immersed in one of the fundamental tenets of modern biology: nothing in biology makes sense except in the light of evolution. Indeed, there is no talk about how jDNA (which consists of approximately 50% transposon and viral sequences) originated and survived evolutionarily. On the contrary, my model is totally embedded in and built on evolutionary principles.

    #3. One of the major objectives of the ENCODE project was to help connect the human genome with health and diseases. Labeling 80% of these sequences ‘biochemically functional’ might create the aura that these sequences contain genetic elements that have not yet been mapped out by the myriad of disease-associated genome-wide studies; well, that remains to be seen. In the context of my model, the protective function of jDNA, particularly in somatic cells, is vital for preventing neoplastic transformations, or cancer; therefore, a better understanding of this function might have significant biomedical applications. Interestingly, this major tenet of my model can be experimentally addressed: e.g. transgenic mice carrying DNA sequences homologous to infectious retroviruses, such as murine leukemia viruses (MuLV), might be more resistant to cancer induced by experimental MuLV infections as compared to controls.

    #2. The ENCODE theory is the culmination of a $250 million (US) project. Mine cost zilch; well, that’s not true: my model is built on decades of remarkable scientific work by thousands and thousands of scientists who paved the road for it.

    #1. The ENCODE theory has not yet passed the famous Onion Test, which asks: why do onions have a genome much larger than ours? Do we live in an undercover onion world? The Onion Test is so formidable and inconvenient that, to my knowledge, it has yet to make it through peer review into the conventional scientific literature or textbooks. So, does my model pass the Onion Test? I think it does, but for a while I’m going to let you try to figure out how! And maybe, when I submit my paper for publication, I’ll use your ideas, if the reviewers ever ask me for an answer. Isn’t that smart?
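    For rough scale on what the Onion Test is pointing at, here is an illustrative comparison. The genome sizes below are assumed, widely cited approximate estimates, not figures from the comment itself:

```python
# Approximate haploid genome sizes (assumed, widely cited estimates).
onion_bp = 16.0e9   # Allium cepa (onion), roughly 16 Gb
human_bp = 3.1e9    # Homo sapiens, roughly 3.1 Gb

# An onion carries about five times as much DNA per cell as a human,
# which is the puzzle any theory of genome-wide "function" must explain.
print(round(onion_bp / human_bp, 1))  # ~5.2
```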

  25. Alon Goren
    Posted September 25, 2012 at 4:03 pm | Permalink

    First, I should mention that I am very proud to be among the 442 people involved in the ENCODE project. I think several have already responded in accordance with some of my thoughts (e.g. @AnshulKundaje; @GeorgiMarinov; @Nick Eriksson).

    Nonetheless, I wanted to put forward these two key points:

    A. Let’s consider the Human Genome Project (HGP) as a reference. The yield from the HGP (from here):
    A new report by the research firm Battelle Technology Partnership Practice estimates that between 1988 and 2010, federal investment in genomic research generated an economic impact of $796 billion, which is impressive considering that Human Genome Project (HGP) spending between 1990 and 2003 amounted to $3.8 billion. This figure equates to a return on investment (ROI) of 141:1 (that is, every $1 invested by the U.S. government generated $141 in economic activity).
    Here is a link to the entire report.
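    A quick sanity check on the quoted numbers (an illustrative calculation, not from the report itself): $796B divided by the $3.8B of HGP-only spending gives roughly 209:1, so the 141:1 ratio evidently uses a larger denominator, implying total federal genomics investment of about $5.6B over 1988–2010:

```python
# Illustrative sanity check of the ROI figures quoted above.
# All dollar amounts are the approximate figures from the comment.
impact = 796e9       # estimated economic impact, 1988-2010
hgp_only = 3.8e9     # direct HGP spending, 1990-2003

print(round(impact / hgp_only))            # ~209, not 141

# The quoted 141:1 ratio implies a denominator of roughly $5.6B,
# i.e. total federal genomics investment rather than HGP spending alone.
implied_investment = impact / 141
print(round(implied_investment / 1e9, 1))  # ~5.6 (billion USD)
```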

    The point is that big projects such as the HGP and ENCODE not only appear to be a major way of transforming biology; they also have a tremendous economic impact.

    B. While I do see your point that the data generated by big consortia are not always the best fit for individual experiments, there is a key advantage to these efforts that seems to be overlooked. The reason your lab COULD have generated the datasets for the study you mentioned is BECAUSE the cost of such genomic methods was driven down by their extensive use in efforts like ENCODE and the 1000 Genomes Project. I doubt that, even with HHMI funding, experiments like the ones your lab carried out here could have been done otherwise; they would have been too costly. Hence, on top of generating valuable datasets (and I assume you would agree the ENCODE and 1000 Genomes data are valuable), these efforts enable individual labs to carry out experiments at an unprecedented scale.

  26. Posted October 5, 2012 at 8:42 am | Permalink

    Although I understand the goals of the larger projects, your article certainly provides some food for thought. There are a number of talented researchers working in smaller laboratories that receive little or no funding. Would it be possible to redirect some of the funding going to projects like ENCODE to make more funding available to smaller labs? One thing I generally see is that smaller labs find it harder to secure funding when they don’t already have any, and it seems especially difficult for younger researchers to get started.

  27. Claudiu Bandea
    Posted October 15, 2012 at 8:42 am | Permalink

    In my parodic comment above, “Five reasons why my theory on the function of ‘junk DNA’ is better than theirs”, I brought forward an old model (1) of genome evolution and of the origin and function of the genomic sequences labeled ‘junk DNA’ (jDNA), which in some species represent up to 99% of the genome.

    Since then, I have posted in Science five mini-essays outlining some of the key tenets of this model, which might solve the C-value and jDNA enigmas.

    As discussed in the original paper (1) and these mini-essays, the so-called jDNA serves as a defense mechanism against insertional mutagenesis, which in humans and many other multicellular species can lead to cancer.

    As expected for an adaptive defense mechanism, the amount of protective DNA varies from one species to another based on insertional mutagenesis activity and the evolutionary constraints on genome size.

    1. Bandea CI. A protective function for noncoding, or secondary, DNA. Med Hypotheses 31:33–34, 1990.

  28. Mark S
    Posted November 10, 2012 at 11:50 am | Permalink

    Great blog. I agree. Having been at the NIH, it appears that even their internal projects are being steered toward BIG science rather than hypothesis-driven science. Unfortunately for young scientists training there, this makes it more difficult to publish, as it takes many years to test and explain those data… or to realize it was a huge bust. It seems true that “the bigger they are, the harder they fall.” By the way, when my postdoc position (an attempt at BIG science) ended last year, I ended up volunteering there for a few months until landing a job at a place where I am no longer at the bench. I try to maintain my optimism, though.

  29. Posted May 6, 2013 at 12:00 pm | Permalink

    I am really glad someone is at last talking about this. I believe it is good to have big projects with multi-lab collaborations; in my opinion that’s very important.
