Stephen Stearns — Data-Driven versus Hypothesis-Driven Research

(“Who’s Watching Big Data?” by cogdogblog is licensed under CC0 1.0)

SS: I’m interested in the Harvard professor who came in and said that he wanted to have data-driven NIH grants. It is of course his right to argue for that, for he probably just wants to be able to get more money. 

LC: I’ll tell you the story. We had these lunchtime talks, and this guy comes in and says, and maybe I’ll exaggerate a little bit, that hypotheses are useless: they encourage you to be wrong and to stick by your original ideas. His example was that you take a telescope, or some sort of imaging device, and point it at the sky. Your question is “Is the sky red?” and your hypothesis is yes. At some point over a few weeks it measures that the sky is red and records yes, the sky is red. You come back, you read it, it confirms your hypothesis, and you conclude that the sky is red.

And my response is that you’re not measuring what you say you’re measuring. What the device is measuring is “Was the sky red at any point in the last x amount of time?” not “Is the sky red?” So that’s not a hypothesis problem, that’s a problem of just god-awful experimental design. I don’t think he proved anything. So I thought he was a total hack, but I do think there is a legitimate point that major discoveries and major progress in science have been made with data-driven work. And increasingly in the biological sciences and elsewhere there is a massive amount of data in which you can find things just by sorting through it, instead of coming up with a potential hypothesis and then trying to test it experimentally. Maybe that’s wrong, maybe that’s just a restatement of the same methodology, but it seems interesting to me at least.

SS: Yeah, what that guy was doing was creating a straw man and providing a cartoon instead of a realistic research program to shoot down. It’s interesting that at our EEB retreat last year Casey Dunn, who is certainly a very good scientist, got up and said that he was tired of having people say that experiments are the only way to go and that there was dignity and respect in doing large-scale data analysis. And certainly that is what people have been doing in molecular phylogenetics. They’re describing patterns, they’re not describing mechanisms. And they’ve discovered some interesting things, so it’s discovery science. It’s like Alexander von Humboldt going to South America or Darwin, Bates, and Wallace discovering lots of new species – things like that.

That kind of work certainly has an important role to play, and because of the technological revolutions in sequencing and data processing, we’re in an age of big data. People are analyzing it hand over fist. One reason is that they can increase their publication rate without having to get the data themselves. Getting the data is always much harder than analyzing it if you’re someone with computational or statistical gifts. We are learning some interesting things from it, so I don’t want to just pooh-pooh it and say that we shouldn’t be doing it. However, the best it can do is accurately describe a pattern and, in that pattern, notice a new item of interest. But then the problem is, how do you find out how that thing works? At that point you bring in the experimental method.

I don’t know any better example than what Gunter Wagner is currently doing. Gunter is coupling comparative biology, the descriptive stuff, with the detailed molecular analysis of processes. Let me call this paper to your attention: a 2019 paper in Nature Ecology and Evolution. The observation was that humans have invasive embryos and get metastatic cancer, while cows and horses have noninvasive embryos and don’t get metastatic cancers – they have solid tumors. So is it possible that metastatic cancer is driven by the same processes through which an embryo invades an endometrium to form a placenta?

Turns out, yes it is. There is now lots of confirmation on that from various research programs. What Gunter did was particularly clever. He formed a cell culture with endometrial stromal cells and grew it on a plate. Then he used cells from trophoblasts to measure invasion rates across that tissue. He did cow on cow, human on cow, cow on human, and human on human, looking at invasion rates. Then he and his team measured messenger RNA concentrations in those tissues to look at which genes were up- and down-regulated in those different contexts. They concluded that what had happened in evolution was not that embryos had gotten more invasive, but that in our primate lineage mothers had gotten more permissive and allowed them in at a higher rate. We are vulnerable to metastatic cancer because we evolved particularly effective methods of nourishing our embryos.

So the hypothesis was generated by the comparative method and a broad pattern, but the analysis was done with truly exquisite experiments. That’s a great kind of science. It shows you the role of both things – discovery of patterns and analysis of causes. 
