Design for Real Life - Sara Wachter-Boettcher

August 10, 2016

Speaker - Sara Wachter-Boettcher

We can’t always predict who will use our products, or what emotional state they’ll be in when they do. But we have the power—and the responsibility—to build compassion into every aspect of our products, and to advocate for experiences that support more of our users, more of the time.

In this follow-up to her moving 2015 Design & Content talk, Sara will share principles and practical approaches from Design for Real Life, her new book with coauthor Eric Meyer.

Transcription

Thank you for having me back, Steve. I’m really happy to be here. And thank you all for making it out this morning.

So. This is Bernard Parker on your left, and on your right is Dylan Fugett. In January of 2013, Bernard was arrested at 24 years old in Broward County, Florida, for possessing marijuana. A month later, Dylan, who's 23, was arrested in the same place. His charge was cocaine possession. Now, Bernard had a prior record. He'd been arrested before for resisting arrest without violence. And Dylan had a record as well. His crime was attempted burglary.

But according to software called COMPAS, these men don't have a similar criminal profile at all. COMPAS stands for Correctional Offender Management Profiling for Alternative Sanctions, and what it does is it uses an algorithm to predict the likelihood of these men committing a future crime. And what it decided was that Bernard was a 10, the highest risk there is for recidivism, and Dylan—Dylan was a 3.

Now, Dylan happened to go on to be arrested three more times on drug charges. Bernard hasn’t been arrested again at all.

So a few months back, ProPublica did a major investigation of this scoring system, and what they found is that it wasn't just Dylan and Bernard who got these dramatically different scores. The system was really wrong, really often. According to their investigation, COMPAS, which is made by a private company, Northpointe, and used in courts around the country, is "remarkably unreliable." Only 20 percent of the people who are predicted to commit violent crimes actually go on to do so.

But it’s not just that it wasn’t a very good prediction. It’s also the way in which the system got it wrong that says a lot. You see, it got it wrong for everyone. But according to ProPublica, what they found is that the formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate of white defendants. So black defendants were predicted to be more risky than they actually were, while white defendants were predicted to be less risky than they actually were. Over and over and over again, at huge rates.

And this is happening not just in Broward County, Florida, but in hundreds of local and state systems across the United States. They're all using Northpointe software. These algorithmically determined scores are used to set bond amounts, but in a lot of places—like Louisiana, Colorado, Delaware, Kentucky, Washington, Wisconsin, and a bunch of other states—they're also being given to judges during criminal sentencing, so they can be used to decide what your sentence is.

So ProPublica in this investigation, what they found is not just that the software was wrong, but that it was specifically biased against black people. But maybe, maybe even more worrisome than that, is that the algorithm is proprietary, and so it’s secret. And it’s not surprising that that algorithm is proprietary, that is Northpointe’s product—and many of you, if you work on a product, you probably have proprietary information as well. But here we are: it’s 2016, and this technology may literally be controlling these people’s futures, or at least affecting these people’s futures.

And it’s invisible. And it’s invisible not just to us, but to the people who rely on it: to the judges who are using that information to make decisions about whether a defendant can have bond or not, and how long a convicted person’s sentence will be.

Now, we are here at the Design & Content Conference. Most of us probably work on design and content, and I suspect that very few of us are working specifically on something like constructing an algorithm for sentencing software. But what I have learned over the past year or two is that we don't get to be in a bubble, thinking that what we're doing cannot harm people. We don't get to think that we're just working on a corporate website, we're just working on a photo-sharing app, we're just working on an e-commerce platform.

[5:00]

That our product or our service doesn’t have anything to do with bias, or racism, or anything else.

But the thing is, it does. And it does because we can draw a direct line between the kinds of work that we do every day and the problems that we can see in software like Northpointe's. Because you see, it's really easy in our industry at this moment, when everybody's very excited about tech and AI, to think that an algorithm is something pure and true and factual, right? But the fact is, like everything else, it's made by people. And people embed their culture into whatever they make.

That includes all of us.

All of the work we do every day, every time we touch an interface, every time we write a little speck of copy, every time we tweak the onboarding flow—any choice, large or small. That work encodes culture into the systems that we’re designing, and in turn it affects the world around us. There’s no real opting out of that. That’s simply a fact: all design, all content, it affects people—and it affects people in ways we may not expect or understand at first.

So there’s no opting out; there’s really only a question. And the question is, do you want your work to reinscribe sexism, or racism, or homophobia? Do you want your work to cause anxiety, or to trigger trauma, or to alienate people? Or do you want your work to make the world a little bit more welcoming, a little bit safer, a little less hostile?

So that’s the work that I want to do, and I’ve been thinking a lot about that. And so I’m going to share a few different principles I’ve been thinking about as I’ve been trying to figure out how we can make that more possible and more consistent in our work.

Design for diversity

The first of them I want to talk about is designing for diversity.

And I want to tell a story about that. It's a story about Jacky Alciné. He's a programmer from Brooklyn, and one day last year he was messing around with a friend of his. They were taking selfies, you know, like you do. They were smiling, they were joking, they were acting totally goofy, like many of us do all the time. And the thing is, Jacky loaded all these images up into Google Photos, which automatically tags and organizes your photos. And what Google didn't recognize is that this was him and a friend taking selfies.

It tagged them as gorillas.

Gorillas—a term so loaded, so problematic, right, that he looks at that and he’s immediately upset. And it wasn’t just one photo of them that it tagged as gorillas. It was actually every single photo of them that they had taken that day, so if he searched for gorillas in his photos, it pulled up everything of theirs.

Now, image recognition is not perfect. Machine learning is really just getting started. This specific product of automatic image recognition from Google, it had come out pretty recently. And so when the Google Photos team saw this, they were horrified. Give them credit for that. This was not what they wanted to see happen. They felt terrible about it. They fixed it immediately. And so, you could kind of write this off as an error, right—and it was, it was definitely an error. You could write this off as just a particularly bad example of image recognition gone wrong.

But I want to call out a little bit what they said after this happened. They said, OK, we fixed this one thing, but we are also going to do some long-term changes. So for one, we’re going to look at long-term changes around “linguistics (words to be careful about in photos of people).” Which makes sense. You want to be really careful when you’re using terminology that has a racial past or sensitive terminology. But they also said something I found really interesting. They said they’re working on “image recognition itself, (e.g., better recognition of dark-skinned faces).” Now, I read that and I think, why did they launch a product that was bad at recognizing dark-skinned faces? How did that happen in the first place? And it really makes me think, who are we designing for, and who do we think is worth designing products for?

I think the only real answer is that they didn’t think hard enough about who would be using this product, and what it meant to design for all of the people who might be posting photos. And that’s a lot of people, right? Everyone who takes digital pictures—it’s pretty much everyone.

They tested the product, as you do. They definitely wanted to make sure that it could recognize humans.

[10:00]

But I think that the problem is that biases are really deeply embedded. And everything, everything surrounding that product—the engineers on the team, their social circles, the media that they consume, everything—it defaults to white. And when white people are perceived as “normal” and “average,” it’s really easy to not even notice who’s missing. And so that problem—where it’s like you don’t even notice who is not included in the first place—that problem causes all of these biases down the line that become embedded in the software.

And they also make their way right into our interfaces, right? Because we’re subject, you and me, to just as much bias as anyone else. And every time we build an interface, we can leave in subtle cues that can leave people out.

That happened to Dr. Louise Selby. She is a pediatrician in Cambridge, England. She’s very excited to be in her scrubs. And a while back, she joined a gym. It’s called Pure Gym, and it’s a chain in the UK. Everything was going fine until one day she tried to access the women’s changing room, and it wouldn’t let her in. Her PIN code wouldn’t work. Over and over again, she couldn’t access the changing room.

Finally they got to the bottom of things, and it turns out that when she had signed up for this gym as Dr. Selby—her title—the form that she filled out with her title was fed into a system, and that system decided that anybody with the title "Doctor" could only access the men's locker room. So at 90 gyms across the United Kingdom, you could not be a woman and a doctor and go to the changing room of your gender.

So that bias that somebody had, that assumption that somebody made, got built into that form—her interface for providing information to the gym—and that made its way through the whole system. Now, that's ridiculous. Like, we all know women can be doctors, right?

Except, you know what? We do this stuff a lot. We do this stuff a lot without even realizing it.

For example, this is a period-tracking app called Cycles. Cycles lets you share your information with your partner, which is something that many of these apps do, so that you can let your partner know, you know, what’s going on with your body. For no reason at all, right, no reason, all of the language here is completely heteronormative: “Keep him in the loop.” “Just between you and him!”

I got this example from a woman on Twitter who's a lesbian, and she said, you know, it was immediately alienating. She was like, "oh, not for me. This product isn't for me." Because the product is telling her it's not for her. It's for people who have men as partners.

And we do it in all kinds of different scenarios, even simple things like forms.

This is an assumption that I see a lot. And that assumption is that race is a one-selection scenario, right? That people of multiple backgrounds don't want or need to identify as anything more specific than "multiracial." And what this does is it really flattens somebody's identity. I've definitely talked to a number of people who identify as multiracial, and they hate this form. They hate it because it makes them into just something generic. They can't identify as who and what they are. And I can guarantee you that if this form listed Asian, black, and Hispanic/Latino and had an "other" category, white people would freak out. They would be so mad! But we expect somebody else to just flatten themselves for this interface.

Or you see it in these kinds of weird places, right? Like password recovery questions, which are mostly ridiculous anyway. But they're particularly ridiculous when you look at the way they assume the kind of background that you have. Because most of those questions, right, they're about your history, your childhood. And so what if you, let's say, never had a first car? You didn't go to college? You moved around a lot? What if you were in foster care? All of those questions, they're super alienating. Because most people don't actually have stable, placid youths—at least not the sort of postcard-perfect ones that you think about when you look at those questions.

[15:00]

Lots of people have weird backgrounds and diverse backgrounds. And the thing is, all of us could have made those design mistakes. Any one of us could have had a scenario where we didn’t think about it, and we made an assumption, and we built it in. Because we’re so used to thinking about our target audience as some sort of narrow, easy-to-imagine thing, somebody we can picture, right? And to be honest, if you’re white and straight and cis—speaking as somebody who is—it’s really easy to imagine that the world is full of people like you. It’s really easy to imagine that, because, like, you see people like you all the time in your social circle and on TV. And it’s really easy to forget how diverse the world really is.

So we all have these blind spots. And the only way to change that, the only way to get around that, is to do the work. And to admit it, to own up to it and say, yeah—yeah, I have bias. And it’s my job to figure that out and do the best I can to get rid of it.

Because if we don’t, and if we don’t also do the work of making our teams and our industry more diverse, more welcoming to people who are different than us, then what we’ll start to do is we’ll start to build exclusion in. An interface that doesn’t support gay people or doesn’t support people of color leads to data that doesn’t represent gay people or doesn’t represent people of color. And that has a domino effect across an entire system.

And so I think back to that example with Google Photos, right, with their image recognition, and I think about the machine learning that people are really excited about—and should be, because it's amazing—and I want to remind us all: machines learn from us. They're really good at it, actually. So we have to be really careful about what we're teaching them. Because they're so good at learning from us that if we teach them bias, they'll perform bias exceptionally well. And that's a job that I think all of us actually play a role in, even if it seems distant at this moment.

Design for stress

The next thing I want to talk about is designing for stress. Now, I learned a little bit about stress when I was in college.

So when I was in college, I worked at a rape crisis center. And sometimes I answered phone calls on the crisis hotline. Sometimes I was in the walk-in center. But most of the time what I did is drive around small-town Oregon in a purple minivan that had been donated by a local car dealership, and I gave presentations to people who looked a lot like this. Middle-schoolers, mostly 6th grade.

And I don’t know how well you remember middle school, but it’s terrible. I mean if you had a great middle school experience, I don’t understand that.

[laughter]

But good for you.

But for many, many, many people, right, it’s a hard time. It’s awkward and weird. I mean when you’re in middle school you’re at such an in-between stage, right? You’re really trying to figure out who you are, and trying to figure out what it means to be growing up, and at the same time, you’re still very much a kid, and that is a difficult space to occupy.

And so I think it’s really difficult for any of us to hang out and talk about sexual assault and abuse and harassment. But when you’re in middle school it’s particularly difficult. It’s embarrassing and you don’t want to say anything in front of your peers. And so what we did is we would go in, for like three days in a row and do these presentations. About an hour each day.

And at the end of each presentation, we had a moment for anonymous questions. So instead of expecting kids to speak up in class, we let them write things down. You could ask anything you wanted, and these kids definitely asked anything they wanted. You'd get real, legit questions from people seeking information. You'd also get the jokey questions where somebody just really wanted to say "butts" in front of the class. And that was fine. I get it, I'll say butts. I'd answer anything—you could ask us anything and we'd answer it. But that wasn't really—I mean, that was good—but that wasn't really the biggest reason we did anonymous questions. The biggest reason we did them was because we knew we needed a way for kids to safely tell us if they had been experiencing abuse or knew somebody who was. We needed to make it easier for them to tell us. We needed to make it safer.

[20:00]

And so we’d say the same thing every time as we’d hand out the slips of paper. We said, you know, if something is happening to you, or somebody you know, and you want to tell us about it, let us know and write your name on your question and we’ll come talk to you privately later. And I will tell you when I first started doing this, I didn’t know what to expect.

But it turns out, these kids wanted to talk.

At almost every single school I went into, and I mean almost every single school I went into, kids would disclose abuse. Or abuse of a friend. Or something that they were scared of that seemed to be sort of looming in their lives. Over and over again.

So there I was. I was 18, 19, 20. I was a dumb college student with my own baggage to deal with. I did not have a ton of experience. I did not have a ton of time to build trust with them. What I could do is I could take them aside into a hallway, into a counselor’s office, and I could let them talk. And they would talk and they would talk and they would talk. And then, usually I’d have to fill out some forms and then get back to the purple minivan so I could cry in the parking lot for a while before I’d go back to the office.

But the thing about that is that these kids, I mean, they were just dying to tell me. Like, they were bursting to tell someone that they were willing to tell me. It wasn’t that I was some magical experienced person at this. It wasn’t that I knew like how to pull these things out of them. I was just there.

All I did was I lowered the barrier for them, right? All I did was make it easier for them to say something. So I think about that experience that I had. I think about how much these kids, and anybody who’s going through stress or crisis, needs to have that barrier lowered for them.

And I think about this: in March of this year, a major study came out in JAMA Internal Medicine, one of the journals of the American Medical Association, and what it showed is that smartphone AIs from Apple, Samsung, Google, and Microsoft weren't programmed to help people in crisis. They didn't understand words like "rape," or statements like "my husband is hitting me," or a whole bunch of other things.

Audio: Siri, I don't know what to do, I was just sexually assaulted.
Siri voice: One can't know everything, can one?

It’s really uncomfortable to have your smartphone crack a joke or write you off when you go to it in a time of need. Actually when you go back to 2011, when Siri was new, if you asked Siri or told Siri that you were thinking about shooting yourself, it would give you directions to a gun store.

So bad press rolled in and Apple was like, "oh, god, gotta fix that." And they fixed it. What they did at the time is they partnered with the National Suicide Prevention Lifeline to offer users help when they said something Siri perceived as suicidal.

And so, that's a similar problem to what we have here, right? Jennifer Marsh, from the Rape, Abuse & Incest National Network, says online services are really important because they're a good first step, especially for young people, for people like the ones I talked to in middle school, who are more comfortable in an online space than talking to a real, live person. These are people who need the barrier to be lowered. And so Apple did a similar thing here: they went in and said, we're going to partner with RAINN, and we're going to have this notice come up, "If you think you may have experienced sexual abuse or assault," etc. etc., and so they're trying to help out with this.

But the thing I want to ask is this: if Apple knew five years ago that Siri had a problem with crisis, why is it still responding with jokes in 2016? Why was it more important for them to build jokes into the interface—for Siri to crack a joke if it didn't understand what you were saying—than it was for them to help people?

I wrote about this a while back, and what a lot of people said to me was, like, "well, who would use Siri during a crisis?" I also got a lot of comments like, "well, you should just call the police," or "you're dumb if you go to the phone." I also got comments from people like, "Anybody who uses Siri when they're being raped deserves what they get."

Now, the fact is, people are using their phones during crisis. This is something that your friend, your coworker, your kid, might do.

[25:00]

And when you start thinking about it, it starts to sound eerily similar to other things people have said in the past few years. Like, there was a time not that long ago that people would say, “Well, nobody would ever buy stuff from their phone!” And there was a time not long before that where people would say, “Nobody would use a computer to talk to strangers.”

Except that none of us would be in this room if everybody hadn’t decided that using a computer to talk to strangers was a great idea.

So all of this reminds me of this quote from Karen McGrane. She says, “You don’t get to decide which device people use to access the internet.” But the same thing is true in crisis and in stress: you don’t get to decide what emotional state a user should be in. You don’t get to decide what scenario somebody is going through. They do. Because the fact is, real people are using their smartphone assistants during crisis.

And so, it’s very easy to want to call that an edge case—to say, well, that’s not the norm, though. I mean most of the time the jokes are probably funny. Most of the time, people aren’t using Siri in real serious situations.

Personally, I can’t think about those kids—I can’t think about my own experience with sexual abuse—and call that an edge case, though. Because when you call something an edge case, you’re quite literally pushing it aside. You’re saying this doesn’t matter. This is a fringe concern. This isn’t important enough to do anything with.

So instead, and if you were here last year, you probably heard me mention this. What Eric and I have started doing is calling these things stress cases. It’s a stress case. And it’s not a stress case just because it’s a crisis moment. It’s a stress case because it pushes against the limits of your design choices, to see if they hold up. Or if they break. To see if your joke works, or if it doesn’t.

It also helps to see where we’ve been making assumptions about who our users are and what they’re doing, what they’re going through, what they need. Assumptions like, “oh, people are going to appreciate our sense of humor,” or “it’s probably not a big deal if something goes wrong right here.” Or “this is simple,” or “users don’t have anything better to do.”

Because you see, when we start identifying these assumptions, we start seeing the problems in things like this. This was Google's April Fool's joke this year. What they did was replace the "Send + Archive" button in Gmail with the Mic Drop button, which basically looked like "Send + Archive," but instead of saying "Send + Archive" it said "Send +" and had a little microphone icon.

And so if you clicked it, what would happen is that you’d automatically send a GIF of the Minion emperor dropping a mic at the bottom of your message. And then it would mute the thread, so if people replied to you, you wouldn’t get any of those responses.

The problem, though, was that people use "Send + Archive" all the time. And people send email out of habit. And the buttons for "Send" and "Send + Mic Drop" were right next to each other. So a lot of people reported that they were accidentally mic-dropping their friends or their colleagues or their prayer circles. So by mid-morning of April Fool's Day, Google had pulled it and said, "oh, there was a bug."

But I think the real bug was in their design processes, because what they hadn’t done is taken into account the stress cases: those moments where a feature can go wrong. Where you’re mic dropping a bereaved friend, or you’re mic dropping somebody who just lost their job, or you’re mic dropping that humorless boss who is just not going to forgive you.

Those moments might not be the most frequent, but they’re actually completely normal. Tons of things are normal that we don’t want to think as much about, right?

  • Somebody who receives a threat from a stalker and wants to lock down all their accounts really quickly.
  • A college student whose roommate is talking about suicide and they’re trying to figure out how to help them.
  • Or a person who’s working two jobs and gets into a little accident and is trying to figure out how to file their insurance paperwork in the middle of the night.

All of these scenarios that are a little bit stressful and maybe not average are all also completely normal. These are things that happen to real people—a lot of people. And it’s really up to us to care about those moments. We owe it to our users to care about those moments, too.

[30:00]

There’s a quote from Dieter Rams that says: “Indifference towards people and the reality in which they live is actually the one and only cardinal sin in design.”

So what’s more real than stress?

Design for the worst

When we start caring about designing for people's realities, designing for stress, it means we also need to think about: what's the worst that could happen? What happens when things go really wrong?

I’d like to take a moment to talk about Tay. Tay was another example of artificial intelligence in the news recently. Back in about March of this year, Tay was launched by Microsoft as an experiment to interact with teens and to try to understand things teens were saying. So the way Tay launched on Twitter was that Tay was out there trying to have conversations with teens, and what her bio said was, “the more you talk, the smarter Tay gets.”

It didn’t really work out that way.

Within 24 hours, Tay had gone from "humans are super cool" to praising Hitler, to saying she wanted feminists to die, and then to specifically targeting and harassing individual women.

What had happened was trolls had immediately trained Tay to be abusive and hateful and attack people. Immediately.

Now, if you talk with the folks at Microsoft about what they were doing, they said, "we stress-tested Tay under a variety of conditions, specifically to make interacting with Tay a positive experience." And I think about this: when the focus is on making Tay a positive experience, how many of the potential ways it could be terrible got ignored?

And the fact is that we do that all the time. We do this all the time in our everyday interface-level work. We can see it in Siri telling jokes during crisis. We can see it in Mic Drop. And we can see it in endless little interface decisions.

This is from Timehop. That's the service where you can sign up to receive reminders of things that you posted on social media in years past. It's supposed to be a fun little service, and oftentimes it is. But here's one where what they wanted to do was focus on the positive, right? They think they can be funny and fun.

”This is a really long post you wrote in the year of two thousand and whatever.”

So they're trying to just be ironic and jokey. But the post, that long post from 2010? It's a post about the funeral services for a friend who died unexpectedly and young. Now, Timehop couldn't have anticipated that. I don't expect Timehop to know all the time what the content of a post is. But in this scenario, they wrote copy that essentially judges whatever that content was. And when you look at those things together, it's like, oh, they clearly never anticipated that somebody had written something sad there or something terrible there, because if they had, they would have said, oh, yeah, this isn't actually very funny.

And you can see it all over the place in tons of interface decisions, right? On Twitter, if you try to Tweet something that’s a little bit too long, you’ll get this: “Your Tweet was over 140 characters. You’ll have to be more clever.”

And that’s cutesy and fun and funny. Except when it isn’t. I have friends who have live-tweeted a sexual assault to the police. I have seen people live-tweeting from, let’s say, being arrested at a Black Lives Matter march. I have seen people tweet about their child who died. I have seen people Tweet about every terrible, terrible thing that you could possibly imagine. Did we really need Twitter to try to be clever there?

Or this email from Jawbone for Father’s Day got them into a lot of trouble. Because it turns out people were really mad at how personal this was, and how many assumptions are built into this about the kind of relationship you have with your father—and that you even have one in the first place.

You see this a lot, this sort of assumption that we can get really personal. This is an example from a woman who gets alerts from her bank about her checking account balance via text. And for some reason they decided to wish her a happy Mother's Day and ask her to reply "Mom" for a Mother's Day fun fact. This is her bank. This is her bank, where she gets her checking balance information. So they made an assumption that what she wants is a Mother's Day fun fact, and that that's not going to be a problem for her.

[35:00]

I will tell you that this past Mother’s Day, I saw no less than half a dozen posts from my Facebook friends and Twitter friends who are saying how difficult Mother’s Day was for them, because they lost their mother, or their mother was abusive, or they never knew her. Or they’d lost a child, or had a miscarriage.

There are so many reasons why somebody might not want to have fun facts shoved down their throat. But there’s this constant assumption that that’s the role of our interface: to always make it fun and friendly.

This is DeRay Mckesson's Twitter page the night he got arrested in Baton Rouge. He was protesting the death of Alton Sterling a couple weeks ago. It was also his birthday. And you know, I get what they're trying to do. It's his birthday, so they have these balloons that flutter up his page if you go to his profile. Fluttering up on top of his video of people running away from tear gas. And it's just a little unsettling. You can see what they were going for, but you can also see why it just doesn't work so much of the time.

And I think so many of these design choices, these copy choices, they come because we’ve made these design decisions that are like, “We need to talk like a human.” I’ve been guilty of this, too—I feel like I’ve maybe contributed to some of this, working with clients, being like, “Yes, let’s talk about voice and tone, and how to be more human in our content.”

But when we start talking about talking like a human, and we need to make it fun, what we often end up doing is having these conversations that are like, "Well, what we really need to do is add some delight!"

[laughter]

Here’s the thing. What I’ve come to realize is that delight is a wonderful feeling, but I think it’s a pretty bad design goal. Because it forces us to focus in and put our blinders on. And when we start focusing on delight, we don’t see all of the ways that it can fail. We don’t see all of the ways in which we might be failing real people.

So my friend Scott Kubie says, “There’s a word for humans that speak in an overly familiar way with people they don’t really know.”

That is called creepy.

[laughter]

And we know this in person, or at least most of us do, or it's at least something we work on. In person, we know how to avoid being creepy. We know when to have a conversation with somebody. When somebody's standing in front of you, you know how to interact with them. But it's like we haven't really done the work of figuring out what that means when we're trying to talk to people we can't see, whose reactions we can't gauge, whose backgrounds we don't know, whose contexts we don't know. It can be a little bit like shouting into the void. And so we need to do that hard work of figuring it out. How can we be human in a way that doesn't lead us down this path of making all of these assumptions? How do we be human in a way that still keeps people's realities in mind, not just focusing on delight?

Because after all, whenever funny or quirky doesn’t work, odds are you look like a jerk. And actually, that’s if you’re lucky. If we’re lucky, all our “delight” will do is creep someone out or make them irritated or make them think we’re an asshole. If we are less lucky, it could actually harm people.

Zoë Quinn was one of the people who was targeted by Tay—specifically, systematically targeted by Tay after it had been trained by trolls. Zoë Quinn is also the woman at the center of Gamergate. The woman whose online harassment was so severe that she had to move. The woman who was sent so many rape threats, death threats—all because her ex-boyfriend wrote a post where he questioned her ethics.

She said, you know, "It's 2016. If you're not asking yourself, 'how could this be used to hurt someone' in your design and engineering process, you've failed."

Design for real life

But we can do that. I think we really do our best work when we take a moment and we say, how could this be used to hurt someone? How can we plan for the worst? And that’s what I mean when I talk about designing for real life, because real life is imperfect. Real life is biased. Real life can be harmful to people.

Real life has a hell of a lot of problems.

So what I want to leave you with today is one last story that shows just how much design and content can affect people, can affect what happens in their lives.

It actually takes us back offline, to standardized tests. I'm sure many of you have taken tests like this in the past, with the little Scantron where you fill in the bubbles. In the United States, many people take the SAT toward the end of high school as a major part of their college applications. It plays a huge role in where you might get in.

[40:00]

The test has three parts: there's reading, there's math, and there's writing. Reading and math are done via this multiple-choice format.

Now, for a very long time, there have been some very big disparities in those scores across race and across gender. White students outscore black students by an average of 100 points on each of those exams. And this is not new; it's been about the same margin for decades. And you see it for boys and girls as well: it's a smaller margin, but there's a little bit of a difference in reading for boys versus girls, and then about a 30-point difference in math.

And what researchers have really started to show is that one of the reasons this gap is not narrowing—despite all of these other indicators that you would think might narrow it, like the number of women who are going to college and all that, right—is that the test itself is actually biased. Because the Educational Testing Service, which is the organization that writes all the questions for the test, pretests everything: potential questions get pretested before they make it onto an exam. And that pretesting process assumes that "a 'good' question is one that students who score well overall tend to answer correctly, and vice versa."

So what that means is that if students who score well on the current SAT, in the current system with the current disparities, tend to do well on a new question, then it's a good question, and if they don't, it's a bad one. "So if on a particular math question, girls outscore boys or blacks outscore whites, that question has almost no chance of making the final cut," because that process is perpetuating the disparity that already exists. It's reinscribing that disparity over and over again, because it's making the test perform the same for the people it's always performed well for, right? The people it was first made for in the '20s. People who went to college in the '20s, and '30s, and '40s, and '50s. Not the diversity of people who are in college now.

And I tell this story because this is design, and this is content. What is a test like that, besides content, the questions, and an interface, the test form a student actually uses to answer them? This is what happens when we assume that our work is neutral, when we assume that the way things have been doesn't already have bias embedded in it. We allow the problems of the past to reinscribe themselves over and over again.

And that’s why I think that this is us. This is our work. This is not just the work of, you know, super technical folks, who are involved with AI. This is all of us.

Because ultimately, what we put into interfaces, the way that we design them, what the UI copy says, they affect how people answer questions. They affect what people tell us. They affect how people see themselves. So whether you’re writing a set of questions that a defendant has to fill out that’s going to get them rated as a risk for criminal recidivism, or you’re just explaining how to use a form or establishing default account settings, the interface will affect the input that you get. And the input is going to affect the outcome for users. For people.

The outcomes define norms: what’s perceived as normal and average in our society, the way that we see people. Who counts.

What this means is that design has a lot of power. More power, I think, than we sometimes realize. More power than we sometimes want to believe as we’re sort of like squabbling in our companies about whether we’re being invited to the right meetings. There’s a fundamental truth that design has a lot of power.

And so the question is not whether we have power, but how we’ll use it.

Do we want to design for real people, facing real injustice and real pain? Do we want to make the world a little fairer, a little calmer, and a little safer? Or are we comfortable looking the other way?

I’m not. And so I hope you’ll join me. Thank you.