Ed Lazowska — Calling Bullshit on Big Data and the Risks of Data Science

LC: I want to talk a little bit about the kind of caution that’s necessary here. You’ve been pretty openly critical, as have many others, of Facebook’s lack of responsibility regarding what’s on its platform and its inability to sufficiently monitor or control it. Do you think this is something that can be self-governed, that a community of computer scientists can decide to be more careful, or do you think it requires regulation of some sort?

EL: Oh, I think it requires intervention for sure. The reason is that, ultimately, companies are avaricious. But I do think that students and employees can vote with their feet. There have been a bunch of articles recently about people turning down job offers, either out of college or offers to switch companies, based on what the company is doing; the most recent article I read was about Palantir and the technology it provides. I think that Microsoft and Apple are in some senses the most principled companies regarding privacy, with principled leadership, but the truth is they have something to sell in addition to what they know about you, so they can afford to be principled. Facebook has nothing to sell except what it knows about you. The truth is they’ve had their heads in the sand, and at the very highest levels: not just Zuckerberg, but the woman who is supposed to be the adult supervision just says crazy stuff. They have a real responsibility for what’s going on and they have abdicated that responsibility.

I heard an interview recently with Tristan Harris, who was called by the Atlantic “the closest thing Silicon Valley has to a conscience.” He co-founded the Center for Humane Technology, and he talks a lot about the “attention economy.” He describes how all of these social media and video apps and services profit off your attention, and in doing that they’re always working on ways to manipulate you. Snapchat’s streaks are perhaps the most obvious example. Do you think there’s a way that can be regulated, or do you think that’s just a consequence of our economy, that they profit off our attention and there’s nothing we can do about it?

That’s a good question; I haven’t thought about it and I don’t know the answer, but it seems to me like that’s going to be very hard to regulate. You can imagine there would be some self-regulation. You could imagine that students will, among other things, choose to work at a Microsoft because they view it as a more principled company than, say, a Palantir at the other extreme. I hope that happens. I hope that students are partly making decisions based on wanting to wake up in the morning feeling good about who they’re working for and what the goals and principles of that company are. But there’s no doubt we are going to need regulation.

This notion of companies acting in their profit interest rather than the public interest is not new, and it’s not unique to technology companies. It’s been the case for tobacco companies, for chemical companies, for energy companies; this, unfortunately, is the way of the world.

Another thing that plays an interesting role in the future of this issue is artificial intelligence technology. I’ll give you three people and ask you where you land. There’s Elon Musk, who says it’s something like the greatest…

…existential threat we face.

Right, the greatest existential threat we face as a society. I read Max Tegmark’s Life 3.0, and I think his basic argument is that we don’t have our goals as a society outlined clearly enough to make sure that AI aligns with them; essentially, AI requires more thinking and a smarter approach than we have right now. And then there’s Zuckerberg, who I think called Musk a naysayer, saying that his alarmism was irresponsible; during his questioning before Congress he discussed how AI could be used to monitor Facebook better.

Well, as always you’ve staked out two extremes and a middle ground, and the middle ground is obviously the right answer here. I think the challenge with the existential threat framing is what it conjures up in the mind of the public, which is that our Teslas are going to get together in the Costco parking lot and decide to run us all down. That’s at best a long way off. It doesn’t mean it’s not worth thinking about the problem, but autonomous weapons are a much nearer issue that we ought to be thinking about. What happens if you fight a war by pushing a button, and no individuals are involved except those who are killed, and all of these decisions are made autonomously? At the other extreme, you don’t worry about the automatic transmission in your car. It may not pick the correct gear every time, but it’s not something you fear. So what’s the spectrum between automation and autonomy? There’s a long distance there.

There was a wonderful video a couple of years ago of the DARPA Robotics Challenge, a competition for humanoid robots. What they had to do was get out of a vehicle and open a door, and it’s picture after picture of these very expensive robots falling over. John Markoff in the New York Times said that if you’re worried about the Terminator, you just have to go into the bathroom and shut the door; you don’t even have to lock it. We are a long way away, and you want that work to be done. When people talk about existential threats, what it conjures up in the public mind is that there’s some immediate threat to our wellbeing. Now there is a set of real threats: people are losing their jobs. We’ve enabled outsourcing, we’ve enabled the gig economy; there’s a whole bunch of things that we’re enabling with our technology, and those are things we’ve got to take responsibility for and think about.

We have at UW, as do a number of other universities around the country, a technology policy institute that has as its leaders somebody from computer science, somebody from the information school, and somebody from the law school. The idea is to think seriously about the interface of technology and public policy, and that stuff is just critically important and we’ve got to be doing it. There is no doubt that we don’t understand our goals as a society; on the other hand, would we be able to create an AI that implemented those goals properly? I don’t know.

There’s a big problem now; this is a subject of some debate, but it seems clear to me that expert systems have biases built into them based on the data they are trained on.

I saw on my phone this morning that MIT made a new AI bot called Norman and showed it an inkblot. A normal AI interpreted the inkblot as a person holding an umbrella in the air; Norman said a man is shot dead in front of his screaming wife. And it was an otherwise normal AI; they just trained it on data from Reddit.

Right. Well, we have seen cameras on the market that won’t take pictures of Asians, and the reason is that the facial recognition technology, once you push the button, waits to snap the shutter until the person’s eyes are wide open. We had an image classification system that classified African American males as gorillas because it was trained on Caucasians and zoo animals. We have judges using expert systems, deep learning systems, for sentencing based on the likelihood of recidivism, and that data is ridiculously biased: both arrest and conviction are obviously far more likely if you’re African American than if you’re not.

There was actually a guy from Yale Law School who was at a seminar here a year ago who pointed out that judges are extremely strongly inclined to take expert advice, whether from a human or from a computer, because if you take the expert advice you have no explaining to do. If you take the expert advice and it goes wrong, you’ve still got no explaining to do: “I listened to the expert.” Whereas if you do something different you’ve got to explain that, and if you do something different and it goes wrong you’re really in the soup. So these expert systems are going to be believed no matter what their biases are. It’s not that the algorithms are biased; it’s that they’re trained on data that introduces biases, because that’s how the world works, and people have got to be aware of this.
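A minimal sketch makes that last point concrete. The numbers below are entirely made up; the groups, rates, and recording probabilities are hypothetical, not drawn from any real dataset. It only illustrates the mechanism: a risk model trained on arrest records rather than on behavior inherits whatever skew the recording process put into those records.

```python
# Hypothetical sketch: the bias lives in the data, not the algorithm.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two groups with IDENTICAL true rates of re-offending.
group = rng.integers(0, 2, n)             # 0 = group A, 1 = group B
true_reoffend = rng.random(n) < 0.30      # same 30% rate for everyone

# But the recorded label is "re-arrested", and group B is policed more heavily:
# a re-offense is recorded 90% of the time for B, only 45% of the time for A.
p_recorded = np.where(group == 1, 0.90, 0.45)
rearrested = true_reoffend & (rng.random(n) < p_recorded)

# "Train" the simplest possible risk model: the observed re-arrest rate per group.
risk_A = rearrested[group == 0].mean()
risk_B = rearrested[group == 1].mean()

print("true re-offense rate, both groups: 0.30")
print(f"model's risk score for group A:    {risk_A:.2f}")   # roughly 0.14
print(f"model's risk score for group B:    {risk_B:.2f}")   # roughly 0.27
```

The model concludes that group B is about twice as risky even though the underlying behavior is identical, which is exactly the kind of artifact a judge relying on the system would never see.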

I think it goes back to the education issue, because part of the reason this is frightening is that it’s an intellectual challenge. If you can’t understand what AI is, what it does, what it’s used for, what its dangers are, and how we might address them, then it’s even more dangerous and scary.

There’s a guy named Jevin West in the information school who devised and teaches a course at UW called Calling Bullshit in the Age of Big Data. It’s a wildly popular course here and is being deployed nationally, and the idea is to make people more discerning consumers of data. It is not a political course at all, although it could be.

Here’s an example, and these are mostly published studies: a few years ago someone used vast amounts of data to look at the life expectancy of professional musicians as a function of the genre of music they performed. It turned out that hip-hop artists die younger than other sorts of artists, and this conforms to your bias that these people are all shooting up and shooting each other. Okay? Great.

The thing that made you smell a rat was that the difference in life expectancy was more than a decade; it wasn’t just a few years. And the answer, when you looked more closely, is that hip-hop is a more modern genre of music, so the few hip-hop artists who have died have died young, because there are no sixty-year-old hip-hop artists. You know, they might have died of a heart attack or from getting hit by a car; they didn’t necessarily die of a drug overdose or by guns. What he had essentially done was rank-order genres of music by the year in which they became popular.
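A toy simulation, again with purely hypothetical numbers, shows how that artifact arises: every genre below draws lifespans from the same distribution, and the only thing that differs is how recently the genre emerged, which limits who could possibly have died by the time of the study.

```python
# Hypothetical sketch of the censoring artifact: identical lifespans, very
# different "average age at death" among those who have already died.
import numpy as np

rng = np.random.default_rng(1)
study_year = 2015
genres = {"jazz": 1920, "rock": 1955, "hip hop": 1985}    # year the genre emerged

for genre, start_year in genres.items():
    n = 20_000
    # Careers begin (around age 20) uniformly between the genre's start and the
    # study year; everyone draws a lifespan from the same distribution.
    career_start = rng.uniform(start_year, study_year, n)
    birth_year = career_start - 20
    lifespan = rng.normal(75, 12, n)                      # identical for all genres
    death_year = birth_year + lifespan

    already_dead = death_year < study_year                # the only observable deaths
    print(f"{genre:8s} mean age at death among the already dead: "
          f"{lifespan[already_dead].mean():5.1f}")

# Typical output: jazz in the low 70s, rock in the 60s, hip hop in the 40s,
# even though every artist's lifespan came from the same distribution.
```

The decade-plus gap appears purely because no one in the newest genre has had time to grow old, which is the rat the enormous disparity lets you smell.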

It’s a classic correlation causation issue.

Now that’s a rat you could smell, because the disparities in life expectancy were so enormous; the danger is that the rats you can’t smell are equally pernicious if they confirm your biases. Here’s another example: some guys in Japan or China published, in a scholarly journal, a study that purported to use deep learning to identify facial characteristics that were predictive of criminal behavior. It had all these angular things about the face, and it was trained on bazillions of photos like yearbook photos and mugshots. Well, lo and behold, it turns out that if you’re getting a mugshot taken you’re typically not smiling, so what this had produced was essentially a smile-and-frown differentiator. Anybody who didn’t look happy was classified as a potential criminal, but this was published in a scholarly journal; in retrospect, just obvious bullshit.
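The same kind of sketch, with made-up numbers, shows how a classifier can ace such a training set while learning nothing about faces or criminality; the only signal it needs is the one the two photo collections accidentally encode.

```python
# Hypothetical sketch of the mugshot/yearbook confound: the "criminal" label is
# tied to the photo source, and the source largely determines whether the
# subject is smiling.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

is_mugshot = rng.random(n) < 0.5          # half the training photos are mugshots
label_criminal = is_mugshot               # the label simply IS the photo source
smiling = np.where(is_mugshot,
                   rng.random(n) < 0.05,  # people rarely smile in mugshots
                   rng.random(n) < 0.80)  # and usually smile in yearbook photos

# The crudest possible "face analysis": predict criminal iff not smiling.
predicted_criminal = ~smiling
accuracy = (predicted_criminal == label_criminal).mean()
print(f"accuracy of a pure smile/frown detector: {accuracy:.2f}")   # about 0.88
```

High accuracy, zero insight: the detector has only rediscovered how the two photo collections were taken, which is precisely the published study’s problem.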

And on and on. But the goal of this course is to teach people to smell a rat when someone presents you with any form of data, and again the real challenge is the rats that are so subtle that they aren’t obvious to you. If it’s a small delta and it conforms to a preconceived bias, then you’re likely to take it as confirmation of what you believed was the case: scientific confirmation of what you believed all along.
