(Galaxy Map image courtesy of the Sloan Digital Sky Survey Image Gallery)
LC: Can you tell me a little bit about the SDSS project?
ZI: Right, so the SDSS was the first time that you could basically observe the whole sky in a uniform way and get one data set for many different things. Traditionally, you would apply for telescope time on a big telescope, so you can get one night, two nights, maybe three nights if you’re lucky. Then you get your student and you go to the mountain and spend all night observing. In the old days, you would have tapes that you would take home with data. Now you transfer it by antenna, but still if you’re in that mode of operation you get a small amount of data that essentially can fit usually on a USB stick.
So the hard thing is to actually analyze data and figure out what’s in it, and it’s usually a very detailed investigation of something. And so with those two or three nights, usually you stretch it to a year of analysis. Then you formally have to make it public, but there is a very low percentage of recycled data for new analysis. With SDSS we got basically a quarter of the sky analyzed and processed. So instead of going spending three nights at the mountain, you could just go to this database and ask what are the brightnesses or positions or some other properties of some stars or galaxies. Or you could ask it for data across the whole sky and get 500 million objects and just ask some statistical question like what is the most common type of star or galaxy.
Then, not only that you wouldn’t have to go to the mountain, but you could have ten thousand other people access the same database and look at the same data but from slightly different perspectives and get completely different papers. They’d have pretty much the same numbers but different ideas in their head, so it’s really limited by ideas and by what you know about astrophysics, math, and so on–what you can extract from the available data.
And these days, with this new project LSST, we have not just 500 million but 40 billion objects so it will be like 80 times bigger, a dataset about two orders of magnitude bigger. And not only that we will have two orders of magnitude more objects, but instead of observing each object like once or twice like SDSS, it will be many thousands of times.