This is from a kind of silly article in the NYT about how people are generating too much DNA sequence data and we can’t really deal with the deluge. They get lots of smart people (many of whom are my friends) to talk about this problem – but I think they’re making a mountain out of a molehill. Other fields (anything involving the capture of lots of images – like cell biology or astronomy) are swimming in far bigger seas of data, and don’t make a big deal out of it in the NYT. But, whatever, a bit of harmless whining never hurt anyone.
I just thought it was particularly funny that an article that I felt just didn’t get it would be capped by a photo that demonstrated that they didn’t get it – that is not a picture of “some cells”, but rather a picture of an Illumina sequencing flow cell, which is only, you know, the whole subject of the article.
UPDATE: My friend Lior Pachter expressed my sentiments about the article perfectly on FB:
A compressed genome can be stored in a few Mb of data, something like the size of a decent quality photo taken with a new cell phone. For perspective, I think that we are currently taking about 1/3 trillion photos per year. The fact that the data is currently being stored in its redundant raw format is either gross stupidity or purposeful scamming by charlatans trying to make a quick buck selling disks; it’s hard to tell which. Frankly, unless the animal is endangered, shouldn’t we just go back and sequence it in 3 years for the price of a USB key? Why is all this crappy Illumina sequence filled with errors being stored in the first place? It is true there are significant and interesting and challenging computational problems related to high-throughput sequencing, but they have to do with coupling relevant and statistically sound analyses with interesting biological experiments.
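For what it’s worth, the arithmetic behind the “few Mb” claim is easy to sketch. The numbers below are rough illustrative assumptions of mine (genome length, variant count, coverage), not figures from Lior’s post:

```python
# Back-of-envelope arithmetic for genome storage sizes.
# All constants are rough assumptions for illustration only.

GENOME_BASES = 3.1e9   # approximate human genome length
BITS_PER_BASE = 2      # A/C/G/T naively needs 2 bits each

# Naive 2-bit encoding of one full genome.
raw_mb = GENOME_BASES * BITS_PER_BASE / 8 / 1e6
print(f"naive 2-bit encoding: ~{raw_mb:.0f} MB")

# Reference-based compression stores only the differences from a
# reference genome. Assuming ~4.5 million variants per individual
# at a couple of bytes each after entropy coding:
VARIANTS = 4.5e6
BYTES_PER_VARIANT = 2
diff_mb = VARIANTS * BYTES_PER_VARIANT / 1e6
print(f"reference-based diff: ~{diff_mb:.0f} MB")

# The "redundant raw format" is another story: FASTQ reads with
# per-base quality scores at ~30x coverage run to roughly two bytes
# per sequenced base, i.e. hundreds of GB per genome uncompressed.
COVERAGE = 30
fastq_gb = GENOME_BASES * COVERAGE * 2 / 1e9
print(f"raw 30x FASTQ, uncompressed: ~{fastq_gb:.0f} GB")
```

Which is the point: the compressed variant representation really is photo-sized, and the bulk everyone is complaining about storing is the error-laden raw reads.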