Ed Lazowska — New Applications of Data Science

EL: We explained to Mark Emmert when he became the UW president that there was this new form of computational science coming along, which was data science, and if UW was going to continue to excel we had to be good at that. To Mark’s enormous credit, he got this totally, and he managed to get us some seed funding from the legislature to bootstrap what became the eScience Institute. Because of the way we got that funding, we had a commitment to environmental oceanography, and here’s the story there.

I’d been involved in this thing called the Ocean Observatories Initiative, and an element of that was to deploy a few thousand kilometers of fiber optic cable off the coast of Washington and Oregon and turn an aspect of oceanography into an observational science from what had been an expeditionary science. Oceanography for centuries had been: you go out on a ship, you measure temperature, salinity, and pressure where you happen to be, and you take your notebook home.

LC: I didn’t realize that fiber optic cables through the sea are for more than just communicating quickly; I didn’t realize they are sensors as well.

EL: Yeah, well, backing up a bit, astronomy had gotten here a decade before, because the Sloan Digital Sky Survey (SDSS) totally transformed astronomy. Not all of it, but astronomy had traditionally been entirely expeditionary. You would spend three years as a grad student designing a new sensor for a telescope, then you would bid for telescope time, you would sit on some mountaintop freezing your butt off for a couple of weeks with your sensor in the telescope, and then you would take your data home. And the idea of the SDSS was that the Sloan Foundation would build this humongous telescope and it would scan the sky and make the data available to everybody. The estimates are that hundreds of times more discoveries got made because of the wide dissemination of this data.

Your competitive advantage as an astronomer was no longer getting the data, it was discovering interesting stuff in that data. As a separate thing, Sloan built websites and made this available to kids and school teachers and amateur astronomers, stuff like that, so it totally democratized the field. We were trying to do this for part of oceanography, and that was our pitch to the legislature for the eScience money. I realized that it was going to be a long time before this ocean observatory was actually delivering data, so I went to the guy who was running oceanography at the time and said, “Find me some oceanographer who has data today,” and he came up with a woman named Ginger Armbrust, who now runs oceanography here. We had this partnership with Ginger to look at data analytics for the work she was doing.

It was called environmental metagenomics. Roughly, it’s how do things like fertilizer runoff change the genetics of the surrounding seas. So we started working with Ginger (she was working with the Gordon and Betty Moore Foundation at the time), and through that introduction they got interested in funding data science, and in the end the Moore Foundation and the Sloan Foundation wound up funding UW, Berkeley, and NYU to run five-year projects trying to push data science out across their campuses. The goal here is both to build the tools and to build the capabilities that allow scientists in a wide range of fields (the social sciences, the life sciences, the physical sciences) to do discovery in this new way, which is analyzing vast amounts of data.

As an example, if you think about sociology, if you want to study the formation and evolution and dissolution of cliques, you used to pay a bunch of undergraduates six bucks to participate in a focus group over lunch, and now you’re trying to get your mitts on two bazillion users’ worth of social networking data.

LC: It reminds me of a story I heard where some middle schoolers analyzing International Space Station data found a bug in one of their sensors and reported it to NASA.

EL: Yes, great, fantastic. So again, democratizing access to data is really important for a bunch of reasons.

LC: I know you’ve done a lot to further computer science and data science education. Can you tell me about how that is fitting in?

EL: Well, as recently as a few years ago, there was a new set of high school science standards promulgated by the federal government, and computer science was essentially not mentioned at all. I guess, are we part of the math standards, or are we part of the science standards, or what? But there was like nothing there, and it makes no sense. I just think appreciating the role of the field and helping other people appreciate the role of the field is really important. At that point you’ve got to get engaged with the government at all levels.

I think scientists are better at this than they used to be, sort of explaining what you do in terms that people can relate to. At the state level, a lot of it is advocating for improved K-12. Washington state does not have a very good K-12 system. You went to a great school, there are a set of great schools around, but on average we are underperforming many, many states, not just Massachusetts, and somehow we tolerate this. And it of course affects underrepresented, economically disadvantaged people even worse than people who live in Bellevue. It’s a real problem, and we have all got to push to solve that problem. 

LC: I think another interesting issue is that we have a number of people in science right now who didn’t grow up on the notion that computer science and data science are important, so how do we assimilate these people?

EL: Well, part of the role of the eScience Institute here is to give people that capability. So partly we are bringing in a new generation of faculty who actually understand the data science as well as their astronomy or sociology or mechanical engineering. Partly we have a whole bunch of boot camp type education programs for existing grad students and postdocs and faculty, and now undergrads. Partly we have a set of research scientists who are funded to partner with people. We have a set of programs where you come to us with a problem: “Look, I’ve got this really interesting science problem, it’s got this data science angle to it that I don’t know how to handle, and I’m going to come sit with you half-time for a quarter while you help me figure out how to tackle this problem.”

LC: Do you think that the future of data science is going to continue to be this consulting thing where there are specialized data scientists who need to help others?

EL: No, my view is that ideally ten years from now there is nobody in academia who calls him or herself a data scientist. Similarly, these days, with rare exception, you don’t hire somebody who is called a computational scientist; they are a chemist and they do their work computationally, because that’s how a lot of chemistry is done these days. At UW, and I hope this is the right choice, we have a master’s degree program in data science because industry hires these people, but we do not have a bachelor’s program or a doctoral program in data science. What we have is what UW calls a transcripted option. Our goal is that all of these students are going to be first and foremost capable in their field, but additionally they are going to have enough chops in data science towards the future.

LC: I guess my concern is that as people learn data science sort of on the side, in this minor capacity for use in their own field, since it’s such a dynamic field in itself, once they go to apply it new things will have arisen and what they learned for data science in undergrad that they’re not continuing to build on will be out of date.

EL: There are a couple of different aspects to that. One is that, at a university, you are trying to teach people to learn on the job, so lifelong learning is part of what we try to teach in all fields. Second, these options we teach provide a pretty good grounding. Third, at the graduate level we have an advanced data science option that allows you to innovate in data science as well as use state-of-the-art tools. But anybody who gets any of these options ought to be able to use stuff at the state of the art. Honestly, what we see in computer science is that some of our most successful grads have six jobs in ten years, and the reason is they’re trying to stay at the forefront. Some of those may be different jobs at the same company, some of those may be company changes, but these are successful people who understand that part of what you need to do is stay at the forefront and keep learning, because things are always changing.
