Statistical BS from autism geneticist in New York Times

[UPDATE: There is a followup to this post here.]

Last week Nature published the results of three studies (1,2,3) looking at the sequences of protein-coding genes from hundreds of individuals with autism and their parents. The main results are that there is a higher rate of de novo mutations in affected individuals, that these primarily come from fathers, and that the affected genes are enriched for those involved in brain development and activity.

I think a bit too much is being made of these studies – they’re generally technically sound, but there remains no definitive link between any single mutation or groups of mutations and the disease. However, the authors of one of the papers have made a big deal about having found a mutation in the same gene in two unrelated individuals. This is described in a piece last week by Benedict Carey in the New York Times:

In one of the new studies, Dr. Matthew W. State, a professor of genetics and child psychiatry at Yale, led a team that looked for de novo mutations in 200 people who had been given an autism diagnosis, as well as in parents and siblings who showed no signs of the disorder. The team found that two unrelated children with autism in the study had de novo mutations in the same gene — and nothing similar in those without a diagnosis.

“That is like throwing a dart at a dart board with 21,000 spots and hitting the same one twice,” Dr. State said. “The chances that this gene is related to autism risk is something like 99.9999 percent.”

Wow. 99.9999 percent. That’s impressive. But I have no idea where it came from.

If the study had looked at exactly two families, and they had found a single de novo mutation in the affected individual in each family, and these had been in the same gene, then yes, it would have been like throwing a dart at a dart board with 21,000 spots (roughly the number of genes examined) and hitting the same one twice – or roughly 1 in 21,000. But this is not what they did.

The study actually examined 200 families with affected and unaffected siblings, and identified 125 variants with the potential to alter protein function. So the question is not how likely it is to hit the same spot if you throw two darts, but rather how likely it is that some spot gets hit twice if you throw 125 darts at a dart board with 21,000 spots. The answer is that you would expect to have two darts hit the same spot 30.9% of the time. That is roughly one in three times. In fact, the 30.9% number is a conservative estimate that assumes that the odds of hitting any given gene are the same – this is undoubtedly not the case, as some genes are bigger than others – so the real odds that the authors would have found the same gene twice purely by chance are even greater. Either way, it’s a very far cry from 99.9999 percent odds against.
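
To get a feel for how much unequal gene sizes matter, here is a rough Monte Carlo sketch in Python. The lognormal spread of gene "sizes" is a made-up assumption for illustration only, not something taken from the paper, and the names (`repeat_fraction`, `skewed`) are hypothetical. The point is just that unevenness pushes the repeat probability above the uniform figure.

```python
import random
from itertools import accumulate

N_GENES = 21_000   # dartboard spots, as in the post
N_DARTS = 125      # variants found in the cases

def repeat_fraction(cum_weights, trials, rng):
    """Fraction of trials in which at least one gene is drawn twice."""
    genes = range(N_GENES)
    repeats = 0
    for _ in range(trials):
        draws = rng.choices(genes, cum_weights=cum_weights, k=N_DARTS)
        if len(set(draws)) < N_DARTS:
            repeats += 1
    return repeats / trials

rng = random.Random(0)

# Every gene equally likely to be hit.
uniform = list(accumulate([1.0] * N_GENES))

# Hypothetical gene sizes: lognormal with sigma = 1 (an assumption, not data).
skewed = list(accumulate(rng.lognormvariate(0.0, 1.0) for _ in range(N_GENES)))

p_uniform = repeat_fraction(uniform, 2000, rng)
p_skewed = repeat_fraction(skewed, 2000, rng)
print(f"uniform genes: {p_uniform:.3f}")
print(f"skewed genes:  {p_skewed:.3f}")
```

The uniform run should land near the 30.9% figure, within simulation noise, and the skewed run should come out noticeably higher. How much higher depends entirely on the assumed size distribution.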

UPDATE: Now that I’ve had a chance to look at the paper in more detail, I realize the authors were making a more subtle point about the nature of the mutations involved – highlighting the fact that they found two nonsense or splice mutations in the same gene. The authors did some fairly sophisticated simulations of how often this would happen and found, if they restrict their analysis to genes expressed in the brain, that the probability of it occurring by chance is ~0.8%.

This is not the same as throwing darts at a dartboard with 21,000 genes, as there are only 14,000 brain-expressed genes. But I agree with the authors that this is not a trivially expected result. Though I still have no idea where the 99.9999% part of the quote came from. Four orders of magnitude is a big difference.

What’s annoying here is not so much that the NYT used this quote (though they really need someone around to check this kind of thing), but rather that the quote came from the lead author of the paper – Matthew State – a clinical geneticist at Yale.

I cannot believe State said it this way. I hope he was simply misquoted. But if he really said this, and assuming that he understands the basic statistics involved (which, given his position, I find highly likely), then he must have oversimplified and somewhat misrepresented the significance of his findings in order to make them sound more impressive in the popular press.


For those interested in how I came up with the 30.9% number, even if it might not be relevant: the question we want to ask is how likely it is that, if we picked a random gene 125 times from a set of 21,000, we would never pick the same gene twice. Think of it this way. The first gene we pick cannot match an earlier pick, because there isn’t one. When we pick the second gene, 20,999 times out of 21,000 (probability .99995238) it will not be the gene we picked first. When we pick the third gene, we assume the first two were different genes (otherwise we’d be done already), so the odds go down slightly, to 20,998 times out of 21,000 (.99990476), and they keep going down slightly each time until we get to gene 125, when the odds are 20,876 out of 21,000 (.99409524).

The counterintuitive part of this is that even though the odds of a repeat are low at each step, in order to end up with all of the genes in different bins you have to be on the right side of that random probability at each of 124 different steps. And to calculate the odds of this, you have to multiply all of these numbers together: .99995238 * .99990476 * … * .99409524, which equals .69088693. That means that there is only a 69.1 percent chance that all 125 randomly chosen genes will be different – or a 30.9 percent chance that you’ll see at least one gene twice.
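
For anyone who wants to check the arithmetic, here is a minimal Python sketch of that product. It assumes, as above, that all 21,000 genes are equally likely to be picked; the variable names are just for illustration.

```python
from math import prod

N_GENES = 21_000   # genes on the "dartboard"
N_PICKS = 125      # random picks (variants)

# P(all picks distinct) = (20999/21000) * (20998/21000) * ... * (20876/21000),
# i.e. one factor for each of picks 2 through 125.
p_all_distinct = prod((N_GENES - k) / N_GENES for k in range(1, N_PICKS))
p_repeat = 1 - p_all_distinct

print(f"all 125 genes distinct: {p_all_distinct:.4f}")
print(f"at least one repeat:    {p_repeat:.4f}")
```

This should print roughly 0.691 and 0.309, matching the figures above.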

It’s the same logic as the classic probability question of how many people you have to have in a room for the odds to be greater than 50% that two of them share the same birthday – the answer being 23.
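
The birthday version can be checked the same way. This small sketch assumes 365 equally likely birthdays and no leap years; the function names are hypothetical. Under those assumptions the crossover falls at 23 people.

```python
def p_shared(n_people, n_days=365):
    """Probability that at least two of n_people share a birthday."""
    p_distinct = 1.0
    for k in range(n_people):
        p_distinct *= (n_days - k) / n_days
    return 1 - p_distinct

def smallest_group(threshold=0.5, n_days=365):
    """Smallest group size whose shared-birthday probability exceeds threshold."""
    n = 1
    while p_shared(n, n_days) <= threshold:
        n += 1
    return n

print(smallest_group())                                  # 23
print(round(p_shared(22), 3), round(p_shared(23), 3))    # 0.476 0.507
```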

UPDATE: Several people here and on Twitter complained that my analysis did not take into account the controls in the paper – and implied that the results would be very different if I did.

The controls have a completely negligible effect. The critique the commenters raised is that the authors didn’t just observe two hits to the same gene in the autism cases; they also observed no hits to that gene in the controls. The paper states that there were 87 relevant mutations in the controls. So, conditioned on the observation that some gene was hit twice in the cases, we want to know how likely it would be that none of the 87 control mutations hit that gene. The answer is 99.6%.

So, whereas I stated originally that the probability of hitting the same gene twice by chance in 125 random samples from a pool of 21,000 genes was 30.9%, if we now ask what is the probability of hitting the same gene twice by chance in 125 random samples from a pool of 21,000 genes AND not hitting the same gene in a set of 87 controls, the answer is 30.8%.
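
Here is the same back-of-the-envelope calculation as a short Python sketch. It keeps the simplifying assumptions used above: equally likely genes, and a single doubly-hit gene that the 87 control mutations each have to miss independently.

```python
from math import prod

N_GENES = 21_000
N_CASE_HITS = 125      # variants in the autism cases
N_CONTROL_HITS = 87    # relevant mutations in the controls

# Probability that some gene is hit at least twice among the cases.
p_double_hit = 1 - prod((N_GENES - k) / N_GENES for k in range(1, N_CASE_HITS))

# Probability that all 87 control mutations miss one particular gene.
p_controls_miss = (1 - 1 / N_GENES) ** N_CONTROL_HITS

p_both = p_double_hit * p_controls_miss

print(f"double hit in cases:      {p_double_hit:.3f}")
print(f"controls miss that gene:  {p_controls_miss:.3f}")
print(f"both together:            {p_both:.3f}")
```

The three values should come out at about 0.309, 0.996 and 0.308.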




  1. Adrian Heilbut
    Posted April 8, 2012 at 1:45 pm | Permalink

    I agree that the dartboard analogy is very wrong, and 99.9999% is off by a couple of decimal places (in the paper they claim p=0.008 for SCN2A). It’s not clear either how he got 99.9999% from two independent dart throws at 21,000 genes, because that would give even lower odds. I suspect it’s much more likely to be just a sloppy attempt to communicate the odds in an informal way than to deliberately overstate the significance.

    However, it probably should also be emphasized that this isn’t just a case of the birthday problem, where you are counting how many times mutations show up in the same gene. Since they are comparing cases and controls, it’s the probability of seeing de novo mutations in a given gene among the autism cases, AND not seeing such mutations in that gene among the controls. They did pretty extensive simulations in the paper (fig 2 and S7) based on the observed rate of mutations of different types to justify the P-values and FDR they reported.

  2. Posted April 8, 2012 at 1:47 pm | Permalink

    My guess is that people who come from a classical genetics or positional cloning of disease genes background don’t really intuitively understand the statistics of genome-wide analyses, and how careful you have to be to appropriately account for experiment-wise error rates.

  3. J.J.E.
    Posted April 8, 2012 at 2:07 pm | Permalink

    @Comrade PhysioProf:

    I agree. But it is even worse than that. Correcting for error and bias in genome-wide studies can be very, very tricky, but people who publish in genomics should be able to at least attempt it (and reviewers ought to hold them to a reasonable attempt at that).

    But the error Mike is pointing out is more fundamental and has only to do with the probabilities of random samples, not error. This problem, while admittedly counterintuitive, is SUCH a well known blindspot of human intuition, that statistics and population genetics professors routinely trot it out starting in upper-level undergraduate courses (if not before) and continue mentioning it throughout graduate education in an apparently vain attempt to guide the probabilistic intuition of students.

    I have personally run across this in at least 3 courses and can’t count how many times this has come up in conversation over the years at coffee break or seminar socials.

    That being said, perhaps the author simply was grasping for an intuitive hook for the reporter and simply face-planted without realizing it. I certainly hope this was a spur-of-the-moment mistake and not a misconception that permeates his thinking about these 125 variants.

  4. Anonymous
    Posted April 8, 2012 at 2:15 pm | Permalink

    “However, it probably should also be emphasized that this isn’t just a case of the birthday problem, where you are counting how many times mutations show up in the same gene. Since they are comparing cases and controls, it’s the probability of seeing de novo mutations in a given gene among the autism cases, AND not seeing such mutations in that gene among the controls.”

    This. Your 30.9% figure is just as idiotic.

  5. Posted April 8, 2012 at 2:25 pm | Permalink

    @Anonymous. I’ll do the math later, but given the much smaller number of mutations in the controls this should have at most a modest effect, and won’t come close to making the result significant.

  6. Posted April 8, 2012 at 2:30 pm | Permalink

    And note, the model in the paper includes all sorts of other factors – like whether the gene had been previously linked to ASD. But it’s nothing like what State was saying to the reporter.

  7. Anonymous
    Posted April 8, 2012 at 2:41 pm | Permalink

    Yes, yes. But the thing is, if you call something out as “patently idiotic,” you better make sure your own rebuttal is flawless. Which it isn’t.

    • Posted April 8, 2012 at 2:52 pm | Permalink

      Sorry Anonymous, but I was referring to what he said in the article. He made no mention of controls so neither did I. And it’s really a negligible correction.

  8. Posted April 8, 2012 at 2:57 pm | Permalink

    Whether Eisen’s math is correct has fuckealle to do with the egregiousness of State’s boner quote.

  9. Shara Yurkiewicz
    Posted April 8, 2012 at 3:10 pm | Permalink

    You’re debating the accuracy of the quote and not the paper, right? Because if the paper is statistically sound, then really this sounds more like a communication error when a scientist is trying to explain his work to journalists. A big one, I guess, but still. I don’t think it undermines the work itself, which I thought was pretty neat.

  10. J.J.E.
    Posted April 8, 2012 at 3:37 pm | Permalink

    @ Anonymous: Ugh. What weak tea. If you are going to call somebody names for lacking a flawless argument, you sure as hell better have a flawless basis for your own. You’re dangerously close to being hoisted by your own petard. In any event, from a media communications point of view, State’s verbal dart model was pretty clear. It is very reasonable and not at all tendentious to make the very small leap that the dartboard formulation is the same as the birthday problem formulation.

    Anyway, I did some simulations (because it took every bit of like 10 minutes to code) by sampling randomly (with replacement) from 21,000 loci for two categories: one for “case” and one for “control”. For each category, I sampled 125 times. I then conditioned on the “case” sample containing at least two hits to at least one locus and asked if the “control” sample contained that locus/those loci at all. Out of 100,000 simulations, this happens only 0.724% of the time (724 total cases out of 100,000 simulations).

    Couple of caveats:
    1) I don’t know how many controls were sampled so I just sampled 125, just like for cases. Decreasing the number for controls will make the percentage even smaller;
    2) as Mike mentions, when there is variation in locus size (or mutation rate, etc), the probability of double hits in the case sample increases. However, the probability of overlap between case and control also increases. My intuition as to which of these conflicting forces wins in the end is poor. Although this would be easy to simulate. Maybe a different day;
    3) Caveat emptor: this was a simulation written in literally 10 minutes or so, and while I’ve checked for bugs and think (to the best of my knowledge) it is correct, there is a possibility of human error.
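
The simulation described in this comment could be reconstructed roughly as follows. This is a sketch based only on the description above, not J.J.E.'s actual code: the function name, the seed and the trial count are assumptions, and it reports the control-overlap rate as a fraction of the trials that had a double hit in the cases.

```python
import random
from collections import Counter

N_LOCI = 21_000
N_CASE = 125
N_CONTROL = 125   # the comment used 125; the post cites 87 for the paper

def simulate(trials, seed=0):
    """Return (fraction of trials with a case double hit,
    fraction of those in which a control draw lands on a doubly-hit locus)."""
    rng = random.Random(seed)
    double_hit_trials = 0
    overlap_trials = 0
    for _ in range(trials):
        # Sample case loci with replacement and find loci hit at least twice.
        case = Counter(rng.randrange(N_LOCI) for _ in range(N_CASE))
        repeated = {locus for locus, n in case.items() if n >= 2}
        if not repeated:
            continue
        double_hit_trials += 1
        # Sample control loci and check for overlap with the doubly-hit loci.
        control = {rng.randrange(N_LOCI) for _ in range(N_CONTROL)}
        if repeated & control:
            overlap_trials += 1
    return double_hit_trials / trials, overlap_trials / max(double_hit_trials, 1)

p_double, p_overlap = simulate(5_000)
print(f"case double hit:           {p_double:.3f}")
print(f"control overlap, given it: {p_overlap:.4f}")
```

With only 5,000 trials the conditional overlap estimate is noisy. Raising the trial count tightens it, at the cost of run time.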

  11. Posted April 8, 2012 at 3:51 pm | Permalink

    State’s quote is strange, indeed.

    For what it’s worth, here are a few thoughts on what the paper does seem to find, which is described in the abstract as “a result that is highly unlikely by chance.”

    I think (from a very quick reading of the paper and supplementary materials) that what’s statistically significant is the finding of two independent variations in one (any one) brain-expressed gene, and on the surface, that matches your dartboard analogy.

    However, it doesn’t look as straightforward to model as throwing one dart per family at a 21,000 segment dartboard.

    The P-value of this result (Result: Hey, look! We found independent mutations from two different probands on the same gene, but that gene has nothing to do with ASD!) was 0.008, based on simulations. These simulations are like throwing virtual darts, but with the darts, dartboard, and conditions of the finding more specific to this experiment than your example. [In the supplemental materials Table S2, there’s a list of all the genes that were hit by two darts. The one gene that got special attention was hit twice by a rare kind of dart, as I read it.]

    I can’t tell at all how the model accounted for the likelihood that differences in genetics among siblings were particularly likely to be associated with differences in ASD status. I did watch part of a presentation State gave where he described problems in the past with case-control studies – that matching for broad ethnic categories wasn’t enough, and what you might “find” are genes associated with an ethnic difference you overlooked and didn’t match for, like Lithuanianicity. Presumably using siblings mitigates this to some extent, but the entire area of research is new and evolving, so time will tell.

    Like you, I don’t see anything in the paper to merit “something like 99.9999%” or close, but something happened that happens by chance in a fairly careful simulation only about 1% of the time. (There’s no statistical fishing expedition here, either, by the way, not that one could have passed muster at Nature. In other words, they didn’t do something silly like repeat the P-value calculation separately for hundreds of genes and report one small P-value.)

    So maybe there’s a “99%” chance this gene is related to autism risk, but even then, it’s not clear how important this is. For one thing, a mutation in this gene was discovered in only 2 of the 225 study subjects with ASD. There were no mutations here among the other 223 ASD subjects.

    Analogy: Scientists find that there’s a 99.999% chance gunshots to the heart are associated with death. This doesn’t mean we’ve discovered the cause of death. We may have discovered A cause of death. Can we address the death problem by requiring everyone to wear a bullet-proof jacket? Probably not, though we might think about when it would be a good idea. (Sure, I could be overanalogizing, and even rare genetic causes of ASD are worth studying, because they might lead to an understanding of the disease.)

    Back to the P-value of 0.008. It’s based on simulations (summarized in Figure 2) that embody a number of assumptions. Some of the parameters in the model come from actual data (the overall rate of genetic mutations) and others don’t (the total number of genes that contribute ASD risk – and I’m not sure how “contribute” was defined). The authors varied the latter type of parameter over a wide range, presumably guided by good science about genetics and disease, and the finding (two independent mutations on one gene in an experiment with 225 families) remained an unusual result – only about 1% of the time occurring by chance when the gene in question is unrelated to ASD.

    By the way, State is the last (and corresponding) author, not the lead author. Even so, while he may not have participated significantly in the research and writing, he did something important – got a grant, headed up a lab, or something – and he’s been given the job of talking to the press, so he should have read the paper and understood the results better than his quote in the Times suggests. And as you note, the Times should have questioned this, even if they had read only the abstract, which mentions P=0.008.

  12. Posted April 8, 2012 at 4:40 pm | Permalink


    Your simulation is as follows, I think – correct me if I’m wrong. You throw 125 red darts and 125 blue darts at a 21,000-segment dartboard. You do this many times, and sometimes two red darts end up in the same segment. When this happens (which is about 1/3 of the time), there is only a 0.7% chance that a blue dart also ended up in that segment.

    I did the same simulation and got similar* results. (*I only ran enough trials to get 1 decimal place of accuracy and convince myself that you probably did it correctly.)

    This simulation has no particular relevance to the problem at hand, as far as I can tell. For there to be relevance to genetic markers for ASD, there would have to be at least the following assumptions:

    1. That there is an unknown labeling of dartboard segments as “ASD-associated” or “non-ASD associated.” [The experiment’s P-value will ultimately guide the researchers about their ability to confidently predict the label of a segment on which a preponderance of red darts lands.]

    2. One dart per discovered mutation, as opposed to one dart per study subject.

    3. An assumption that some fraction F of ASD cases are associated with a genetic marker.

    4. An assumption that some fraction G of mutations are discovered by the sequencing.

    5. An assumption about the distribution of the number of mutations (darts) per study subject. [For simplicity, suppose that each study subject has N mutations, G*N of which are discovered in testing.]

    6. Additional assumptions about whether ASD-associated mutations are any more or less likely to be discovered in genetic testing than non-ASD-associated mutations.

    Then, one can estimate the typical number of darts from each ASD subject, and the (presumably smaller) typical number from non-ASD subjects that hit ASD-related genes.

    Given the total number of darts and the fact that there is a difference (that can be estimated from the model) in likelihood they will strike ASD-related genes, depending on whether they are ASD-subjects’ darts or not, one can then calculate (by simulation) the likelihood that given two red darts and no blue darts having hit one gene, that that particular gene is ASD-associated.

    In other words, it’s a lot more complicated than dart-throwing, but I think the authors of the paper did this kind of thoughtful simulation, and for the assumptions that were more speculative, they ran the simulation for a range of assumptions. Unfortunately, the Times article, whether or not it quoted State correctly, gives little clue about the depth of science and statistics (whether the assumptions are sound or not, time will tell) behind genetic research.

  13. Posted April 8, 2012 at 5:22 pm | Permalink

    @JJE – the “control” issue is even more of a red herring because there were overall fewer mutations in the control sample than in the cases.

  14. Posted April 8, 2012 at 5:38 pm | Permalink

    @Anonymous. I updated the post to add the additional condition that the gene not be hit in the controls. This changes the result from 30.9% to 30.8%.
