SS: I was watching a PBS show, and one of the people on the show was an MIT data scientist who had been diagnosed with breast cancer. According to the show, if you catch breast cancer early there is a fairly good treatment success rate, but if you catch it later it can be problematic. And she was wondering whether you could apply data science to mammogram data to catch it earlier. The problem, I think, was that they may have a couple thousand cases, but you really need millions. They can probably get that eventually, but in a lot of domains you can’t really apply the tools of data science because you don’t have vast quantities of data.
And from my perspective, look at how people learn: there are parts of life where you do have millions of cases. For example, if I start talking and I… stop, you can probably anticipate the words I am about to say, just like Google can predict part of what you’re going to search for some of the time. I think some of that is based on statistical assimilation of vast amounts of data; your brain has sucked it up and compiled it, as it were. But there are lots of cases where you don’t need that much data. Imagine you get a new app on your phone. You can probably figure it out pretty easily. Who even reads the manual?
So the point is that people have many different ways of learning. Data science is one, but there are still lots of other techniques that aren’t known. I had a colleague who described artificial intelligence as “algorithm discovery.” He thought that people have all these abilities, and they have ways of doing things, algorithms, and it’s the job of the computer scientist just to discover how people do what they do.
LC: That feels really true in the computational vision class I’m taking. You know when you have to verify that you’re not a robot on Google or something? They give you a picture of a traffic light to identify, and presumably a computer can’t do it.
So the goalpost keeps moving.
Yeah, it’s a little more complicated than just identifying a traffic light, but it’s amazing how much we struggled in this class with even what you might think of as a simple problem like detecting edges. Humans are able to figure out things like this a lot quicker…
And other times it is an incomplete data problem. It seems like we still have two ways to progress. One is to get better data (data collection is its own intense intellectual endeavor), and the other is to develop models that learn faster.
Well, part of the problem with data science is that to a person with a hammer, the whole world looks like a nail: you’re just looking for domains where you can apply data science techniques. And that’s fine, but the point is that there are lots of other domains.
Another thing about computer science is that if you’re interested in something else (medicine, music, literature, whatever), computer science can give you a new way of looking at it. It can enhance whatever other interest you might have. This was actually the theory espoused by Alan Perlis, one of the founders of the department and its chairman: he thought that eventually there wouldn’t be a computer science department at Yale, because it would just be assimilated into all of the other departments. I think we’re a long way from that.
But maybe it all comes down to this. There’s a theory of medicine now, I don’t know if you’ve heard of it, that all diseases can be tied together under immunology. Any time you get sick, it can be explained in terms of the immune system.
Well, why not? The immune system is probably involved in some way in just about every disease.
Well, the idea is that it’s sort of a grand unified theory of medicine and disease. Maybe data science is the grand unified theory of computation, but I don’t think we are there yet.