Another paper ready for open review: comparative ChIP-seq and RNA-seq in Drosophila embryos

As I wrote about for our last paper, I hate the way scientific publishing works today, especially the insane delays (average is about 9 months) between when a lab is ready to share its work and when the work is actually available. So, from now on we are going to post all of our papers online when we feel they’re ready to share – before they go to a journal. We’ll then solicit comments from our colleagues and use them to improve the work prior to formal publication. Physicists and mathematicians have been doing this for decades, as have an increasing number of biologists. It’s time for this to become standard practice.

Ground rules: I will not filter comments except to remove obvious spam. You are welcome to post comments under your name or under a pseudonym – I will not reveal anyone’s identity – but I urge you to use your real name as I think we should have fully open peer review in science. The original paper and comments will remain available here as a record of the review process.

Paris M et al. (2013). Gene expression in early Drosophila embryos is highly conserved despite extensive divergence of transcription factor binding. Full manuscript. Text only. Figures only.

The paper is now available at arXiv. Please use the arXiv version for formal citations.

This paper is the result of several years of work from Mathilde Paris, a very talented postdoctoral fellow in my lab. Mathilde was interested in looking at the evolution of transcription factor binding in highly diverged Drosophila species and the effect of changes in transcription factor binding on gene expression. So she carried out a series of chromatin immunoprecipitation experiments using antibodies raised against four D. melanogaster proteins involved in early anterior-posterior (head -> tail) patterning. She carried out ChIP-seq experiments in D. melanogaster as well as D. pseudoobscura (diverged ~30mya) and D. virilis (diverged ~40mya). There were a lot of technical challenges in getting these experiments to work to our satisfaction (described in the methods section of the paper), but eventually Mathilde had a dataset in which we had sufficient confidence to analyze in detail.

The most striking observation about the ChIP data is just how different the binding patterns of these factors are in these different species, which, for all intents and purposes, undergo identical early developmental processes. We can identify two clear factors driving this divergence: the gain and loss of binding sites for the factors themselves (for background on binding site turnover see this 2008 paper from our lab), and the gain and loss of binding sites for the early embryonic master regulator Zelda (see this 2011 paper from our lab for more information about Zelda). However, these two effects did not completely explain the observed divergence, which may also be influenced by environmental factors (the species do not all develop at the same temperatures or at the same rates) and by developmental, biochemical and experimental noise.

In contrast to the divergence of transcription factor binding, gene expression in stage-matched embryos is highly conserved. And one of the central issues discussed in the paper is why there is this discordance between transcription factor binding and gene expression divergence.
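To make the comparison concrete, here is a minimal sketch of one way to put binding and expression on the same footing: a rank correlation between two species, computed once for binding strength and once for expression level. This is not the analysis from the paper. The numbers are invented toy values, and names such as `bind_mel` and `expr_vir` are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy values only (NOT data from the paper): one entry per orthologous
# region or gene, in the same order for both species.
bind_mel = [10, 2, 7, 1, 5]    # hypothetical ChIP signal, D. melanogaster
bind_vir = [3, 8, 1, 4, 9]     # hypothetical ChIP signal, D. virilis
expr_mel = [100, 20, 55, 5, 80]  # hypothetical expression, D. melanogaster
expr_vir = [90, 25, 60, 8, 70]   # hypothetical expression, D. virilis

print("binding rho:   ", round(spearman(bind_mel, bind_vir), 3))
print("expression rho:", round(spearman(expr_mel, expr_vir), 3))
```

In this toy setup the expression ranks agree between species while the binding ranks do not, which is the kind of discordance described above. Real data would first need ortholog mapping and normalization, which this sketch skips.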

As always, we await your comments, and will respond as quickly as we can.

This entry was posted in EisenLab, open access.

9 Comments

  1. Posted March 1, 2013 at 6:31 pm | Permalink

    This is quite a fascinating finding, and reminds me of the robustness of neuronal network input-output transfer functions in the face of dramatic variations in the intrinsic ionic conductances and weights of synaptic connectivity within the network. This is a principle that has best been worked out by Eve Marder’s lab in the context of the crustacean stomatogastric ganglion central pattern generator. And what her lab has found is that there are homeostatic compensatory mechanisms that maintain consistent network output properties even when network element properties and connection properties are highly variable.

    If you had more species to look at, you could possibly correlate differences in one particular transcription factor’s binding pattern with those of other transcription factors across the species to see if this kind of compensatory homeostasis is what underlies the consistency of ultimate transcriptional output.

  2. Posted March 1, 2013 at 6:34 pm | Permalink

    I think that creating the ABs against the orthologs from D. melanogaster and going on to test against those in the other species is an issue. There is no guarantee that they recognize these with the same affinity. In the methods section an argument is presented to the effect that they should work (‘excellent cross-reactivity’). I might have missed the data backing this statement in my diagonal read, but it is necessary to back this statement with data, as it is one of the most fundamental aspects of the methodology used to obtain the data.

    • Posted March 1, 2013 at 7:32 pm | Permalink

      This was a major concern of ours going into the experiment.

      For a typical D. melanogaster ChIP experiment for transcription factor X, we raise antisera against the D. melanogaster version of protein X and then affinity purify the antisera against recombinant protein X in order to minimize cross-reactivity. To control for using antisera raised against a D. melanogaster protein in D. pseudoobscura and D. virilis, we affinity purified the D. melanogaster antisera against recombinant versions of the D. virilis protein. Since antibodies will only have been generated if they recognize a D. melanogaster epitope, and would only be selected if they recognize a D. virilis one, we should end up with serum that only recognizes conserved epitopes. I realize now that we didn’t explain this very well in the paper.

      As another control, we also did several ChIPs in which the input chromatin was pooled from the three species prior to IP. The idea here was to avoid any variation owing to experimental handling.

      • Benoit Bruneau
        Posted March 2, 2013 at 7:27 pm | Permalink

        I’m not that concerned about this. Seems like it was well taken care of.

  3. Posted March 2, 2013 at 11:02 am | Permalink

    Very nice initiative! I wish more people would upload their papers before publication.

    This is an interesting study. It illustrates that we are missing a piece of the puzzle in understanding how TFs find their binding sites, and how genes are regulated. Below are a couple of questions, suggestions and observations.

    I am not an expert, but is the binding pattern of AP factors homogeneous across the embryo? Otherwise it is difficult to compare ChIP profiles between species because they would consist of different mosaics. Is any data available about this?

    Given the importance you give to Zelda, how come you do not ChIP it? Bad luck with the antibodies? To plug in my own work (reference 49), the G+C content correlates with chromatin ‘accessibility’, which means that it also correlates with the first PC of almost every group of TF binding. I think you should see how much it correlates with the first PC in your case because the binding site for Zelda is slightly G+C rich.

    You seem to presuppose (understandably given the current knowledge) that AP factors will be the major determinant of the abundance of transcripts. Yet, there might be post-transcriptional buffering mechanisms – typically miRNAs – responsible for the conservation of the expression level. Perhaps analyzing the conservation of 3′UTR would reveal something interesting.

    Minor things:
    1. I did not find the number of unique reads you obtained in the ChIP-seq experiment, which allows the reader to form an opinion about their quality and robustness.
    2. Typo in the caption of figure 7D
    3. “We found a small, although significant, correlation between substitution rate and binding divergence (Figure 4A and S4).” The p-value in figure 4A is claimed to be 0.89.

    • Posted March 2, 2013 at 11:15 am | Permalink

      Thanks for the comments. Will get to details a bit later – we did ChIP Zelda in D. melanogaster http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002266 and have a very good antibody. The problem is that ZLD’s binding is most interesting at around mitotic cycle 8 and we need to hand sort embryos to get clean samples. This took months for D. melanogaster, and we’re not in a position to do it for every species. We also were able to predict ZLD binding very accurately in D. melanogaster just from its binding sequence, so we’re pretty confident that the presence of sites correlates well with its binding and activity.
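      To illustrate what predicting binding from sequence alone can look like in its simplest form, here is a minimal sketch that counts matches to a single motif on both strands. It assumes the CAGGTAG heptamer as the Zelda site and exact matching; the function names are hypothetical, and the prediction described above may well rest on a richer model than this.

```python
# Complement table for reverse-complementing DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_sites(seq, motif="CAGGTAG"):
    """Return sorted (position, strand) pairs for exact motif matches.

    Positions are 0-based starts on the forward strand. The default
    motif is an ASSUMPTION made for illustration, not a fitted model.
    """
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    sites = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        if window == motif:
            sites.append((start, "+"))
        elif window == rc_motif:
            sites.append((start, "-"))
    return sites


def count_motif_sites(seq, motif="CAGGTAG"):
    """Count exact matches to the motif on either strand."""
    return len(find_motif_sites(seq, motif))


# Made-up example sequence with one site on each strand.
example = "AACAGGTAGTTCTACCTGAA"
print(find_motif_sites(example))
```

      Comparing such counts across orthologous regions gives a rough, sequence-only picture of site gain and loss between species.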

  4. Mathilde Paris
    Posted March 2, 2013 at 2:49 pm | Permalink

    Salut Guillaume ! How’s Spain treating you?
    Thanks for the detailed comments. To add to Mike’s answer:

    - Expression patterns of the AP factors are very well conserved (http://www.ncbi.nlm.nih.gov/pubmed/22046143). So (minor) differences in expression patterns are unlikely to account for all the binding difference we see.

    - Regarding Zelda ChIPs: The protein is fairly well conserved and the antibody is excellent in D. melanogaster, although we have not tested it on the other species yet. I agree, ChIPping Zelda in the other species would be interesting, even at late stages. But doing this experiment would have required quite a bit of additional work (we would need to test the antibody and to collect more chromatin, which is particularly time-consuming for some species). Overall as Zelda was not the main focus of this paper and, as Mike said, sequence-based Zelda binding prediction is excellent, we felt the benefit over cost of doing this experiment was not favorable.

    - We have not looked at G+C content, I’ll check.

    - I completely agree that AP factors are only one type of gene expression regulators, and that other parameters should be taken into account, such as miRNAs. That was partly the point of the article conclusion. Maybe we should elaborate more on this.

    - Minor thing 1 : I can add the info.
    - Minor thing 2 : oops… thanks.
    - Minor thing 3 : p-values are significant for the other factors in figure S4 (but I agree that BCD – shown in the main figure – is not).

    • Posted March 4, 2013 at 4:16 pm | Permalink

      Hi Mathilde! I don’t know about Spain, but Catalonia is terrific!

      This all sounds good. To make a more general comment, I think that you are asking the right questions. With the current avalanche of data, we tend to forget that we still do not understand how TFs regulate transcription in vivo. One interpretation of your data is that there is quite some redundancy among TFs, or even that some binding events are ‘non functional’. Now that I think about it, perhaps you could cross-reference your data with the modENCODE data to see whether the sites with high AP factor binding turnover correspond to HOTs (Highly Occupied Targets). If zillions of other TFs are bound, we can imagine that the AP factors would be less indispensable for gene expression. I am not sure whether HOTs are available (or exist) in the early embryo though.

  5. Posted March 4, 2013 at 3:41 am | Permalink

    This is a very interesting study. I am not an expert, please find below my comments. Hope it will be of some help to stimulate the discussion.

    - do you have any idea if the lengths of the spacers that separate adjacent binding sites are conserved between the species? (if not, quenching of activators by nearby repressors may be very different)

    - BCD and KR overlap in their binding specificities. Do the variations in regulatory sequences affect the binding of both?

    - Are BCD, KR, GT and HB highly conserved (protein sequence) in the four species, especially in their DNA binding domains?

    - It is interesting to see that you have mapped 2 ‘activators’ (BCD and HB) and 2 ‘repressors’ (GT and KR)

    - In terms of chromatin accessibility between the four species (genome-wide nucleosome mapping), is it conserved?

    - you said that roughly similar numbers of peaks were found per ChIP (table s2). The data for pseudoobscura appear quite different compared to the others, especially for KR. Any explanation?

    Again that’s a great and stimulating paper. And Michael, posting your papers online before they go to a journal is a fantastic idea, or should I say, revolutionary.
