Regulating Artificial Intelligence - How to control the unexplainable

[APPLAUSE] SAMUEL L. VOICHENBOUM: Thanks everybody for coming. This is really exciting-- a really exciting program. It's being sponsored by the Graham School and three of our master's programs. We have a master's in biomedical informatics, a master's in data analytics, and a master's in threat and response management. And these programs are all geared in some way toward analysis of data, and using machine learning, and using analytics. And part of the thrust for all the programs is to understand how to use data in an ethical way and how to use these algorithms that we're developing in a thoughtful way. So I met Andrew a couple of years ago at South by Southwest through a mutual friend. We were both giving talks that year. And we just had this amazing conversation talking about applications of machine learning. So as a physician, we're seeing the use of machine learning algorithms all over the hospital and all over medicine from things like predicting cardiac arrest, to predicting sepsis, to predicting patients that are going to be readmitted to the hospital. And we just plow ahead with developing our models and developing our predictions. And it wasn't until I spoke to Andrew that I really stopped and thought about the implications of these algorithms and how they could be used in good, but also in bad ways. And it was really an eye-opening experience. And ever since that meeting, I've been really excited about bringing Andrew here to talk. So Andrew is a fascinating guy. He used to work for the FBI cyber division. He's the Chief Privacy Officer at Immuta, which is a data science company and just does really fascinating work, and just is very, very tuned in, both to the technical analytic side as well as to the legal and ethical implications of the work that we do. So the title of Andrew's talk is, Regulating Artificial Intelligence, How to Control the Unexplainable.
And as you listen to Andrew, I want you to keep in mind not only the science and computer science of what we do, but also the social implications. And I hope that the questions that we talk about, that we discuss afterwards will run the whole spectrum of the implications of this kind of technology. So with that, Andrew, excited to hear your talk. And take it away. ANDREW BURT: Wonderful. Thank you so much. Let me just switch here. And while I am switching, I will say thanks to everyone for coming out. Thank you, Sam, Wendy, Suzanne, everyone. I wanted to start today-- I should also say this is a condensed version of a longer talk. And so I want folks here to keep me honest and try to keep this a little casual. So I'm going to do my best not to go off script. But I want to start on today's issues, in fact, by introducing you to a horse. More specifically, this is Hans, or Clever Hans as he became known. And Hans was one of the most famous horses in the world about 100 years ago. He was raised by a man named Wilhelm von Osten. Hans lived in Germany. And he was thought to be incredibly, incredibly smart, hence his name. So this is Hans at a public fair demonstrating his intelligence. Folks that know about him, just no spoilers. So he was thought to speak German. He could perform arithmetic. He could count objects and much more. Here's a firsthand account of how he communicated numbers. "Though small numbers were given with a slow tapping of the right foot, with larger numbers, he would increase his speed. After the final tap, he would return his right foot to its original position. And zero was expressed by a shake of the head." So one example of a question he'd answer was, I have a number in mind. I subtract 9. And I have 3 as a remainder. What's the number I have in mind? Hans would unfailingly tap the number 12. So Hans was, quite simply, the most interesting horse in the world.
And this is Hans with his owner in front of a board that he used to help him communicate. And so Hans became world famous for his clear display of animal intelligence. I didn't realize there were slides over there. This is an article in The New York Times from 1904 attesting to Hans's feats of intelligence. And in this article, the reporter recounted all of these feats and stated, I'm going to quote, "the facts here are not drawn from the imagination, but are based on true observations and can be verified by the world's most preeminent scientists." So what is going on here? Why am I starting a talk about machine learning by introducing you to a horse? Two reasons. The first is that Hans illustrates something really profound in the way that humans approach the problem of intelligence, in animals, in machines, and in humans. So in 1911, a psychologist named Oskar Pfungst published this book in which he demonstrated that Hans wasn't actually that clever at all. In every case, Hans was watching the reactions of his trainer and reacting to involuntary cues in the body language of that trainer. And so this wasn't a hoax. von Osten didn't know he was creating these cues. But while we were assessing the intelligence of Clever Hans, Clever Hans actually demonstrated something very deep about our own intelligence. And that is, we have significant cognitive biases. The way we process information is prone to irrational choices. One of the cognitive biases we have baked into our brains is called confirmation bias. It causes us to look for things that conform to our existing hypotheses or beliefs. And so Clever Hans is really a testament to the fact that we don't process information rationally ourselves. And confirmation bias is one among many different types of bias. So I bring this point up as a starting point because it's something we need to be acutely aware of and acutely sensitive to when we think about this problem of intelligence, artificial or otherwise.
There is a lot, frankly, about this topic that might lead us astray. But Hans also illustrates something else that is specifically relevant to the way we approach AI today. In the early 1900s, we simply could not understand the way that Hans was processing information. And yet almost all of his answers appeared to be correct. Indeed, he seemed to know everything that we knew. And the ability to-- the inability, I should say, to fully explain why answers are correct, or how reasoning is occurring is exactly the type of problem we face with artificial intelligence today. That is, what I would posit is the fundamental challenge we're facing when we think about deploying AI. So in AI, input goes in the form of data. A so-called black box of AI makes its decision. And that decision is usually right. And in practice, there's a deep sense of discomfort about what this actually means. And when proposals are brought up to regulate AI, this is what they're focused on, this type of unexplainability is what they're focused on. And it's frequently what they're actually fighting. And so today, I'm going to talk about a number of different dimensions of this type of unexplainability. But what I'm really going to contend, and my overall message is that the very idea that we can explain how decisions are being made, the very idea that we need to explain how decisions are being made is actually something we're going to have to move beyond if we're going to fully embrace this technology. And so when we talk about regulating AI today, and what that might actually mean in practice, which I'm going to make some suggestions about, I'm going to be talking about what it might look like to move beyond explanations and beyond explainability, or at the very least, put less weight on the importance of explainability. 
I'm going to frankly be talking about what a world might look like with a lot more Hanses, and how we might seek to effectively regulate and manage risk and manage the ethical dimensions of a world that looks like this. And so that brings me to three points I want to make today. First is, I want to talk about what AI is specifically and the major challenges it presents. And when I say AI, especially for this crowd, I want to be very specific about what it is that I mean. I'm really using the colloquial term, the pop culture term, or what in practice I think is machine learning. And even to be more specific, what I'm really talking about is the increasing use of neural networks in a variety of different settings. So when I say AI, if you're like some of the data scientists that I work with, if that makes you cringe, what I'm really talking about is the increasing prevalence and prominence of neural networks, which I'm going to talk about in a second. The second point I want to make is that we've been here before. And in fact, not all of the challenges we face when we think about regulating AI are new. And so what I want to do is I want to talk through past attempts to address these challenges. And I want to talk about what these attempts can teach us. And then lastly, what I'm going to do is set forth some constructive suggestions on what it is I think we should actually be doing moving forward to regulate this technology. And I'm going to focus on three particular concerns I have beyond just the challenge of unexplainability, or relating to unexplainability. So what are the challenges of AI? Our story really begins in 1955, when a group of researchers got together to think through how computers could simulate artificial intelligence. They could simulate human intelligence. They called the concept artificial intelligence. It was one of the actual first uses of the phrase. 
And this conference, this summer conference at Dartmouth in 1955, is considered one of the seminal moments in the history of AI. And here's an example of a neural net, which is what this approach led to. Again, when people talk about AI, when I talk about AI, this is really what I mean. By show of hands, how many people are familiar with neural networks? OK, enough that it might be worth actually going through and explaining in general terms how they work. So in brief, this is a visual depiction of a relatively simple neural network. We have an algorithm that's composed of a series of nodes, represented here in black circles, otherwise known as neurons. And they make weighted decisions, and they pass the results of those decisions on to other neurons throughout the network. And so specifically, the way the weights in those decisions are created is based on training data. And so what you would do is you'd feed a neural network like this some training data, input data along with the resulting conclusions about that data. You would train that network so that it has the correct weighting. And then you'd be able to give that network new data that it's never seen before. And it would be able to pretty accurately tell you things about that data. And so, for example, a network like this, once it's fully trained, you'd feed it some data, images, for example. And then certain nodes in that network might get activated, let's say, by the curve in a nostril. And then that network would be able to tell you something about those images, like if one of them had a face, if the network was performing image recognition. And so the first really important point, I think, for folks in this room, technical and non-technical, to take away is that this is a drastic change from traditional programming. Traditional programming is based on giving a computer a series of step-by-step logical instructions. And instead, this is a complete departure.
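The contrast drawn here, between step-by-step instructions and weights learned from training data, can be illustrated with a minimal sketch. It assumes a single artificial neuron and a toy task (logical AND), neither of which comes from the talk. The names `rule_based_and`, `predict`, and `train` are hypothetical, and the simple perceptron update rule used here stands in for the more elaborate training methods real networks rely on.

```python
# Traditional programming: a human writes the rule out explicitly.
def rule_based_and(a, b):
    return 1 if a == 1 and b == 1 else 0


# Learned approach: one neuron makes a weighted decision about its inputs.
def predict(weights, bias, inputs):
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(examples, epochs=20, learning_rate=1):
    """Adjust the weights whenever the neuron's answer disagrees with the label."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training data: inputs paired with the conclusion we want drawn from them.
training_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(training_data)

for inputs, label in training_data:
    print(inputs, "->", predict(weights, bias, inputs), "(expected", label, ")")
```

Nobody typed the AND rule into the second version; it ends up encoded in a handful of numbers. With only four possible inputs this toy has no genuinely unseen data to generalize to, and with three learned numbers it is still easy to inspect, which is exactly what stops being true as networks grow.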
Models like neural networks work by finding patterns in training data and applying those patterns to new data. And these patterns are frequently patterns that human [AUDIO OUT]. And it turns out that this type of programming, based on feeding data to neural nets, is actually beginning to replace traditional programming in a variety of domains. And it seems to me at least like we're on the cusp of this replacement. Different audiences will react to that statement in different ways. But this is a great example from a computer scientist named Jeff Dean. For folks who don't know about him, he's an idol in the computer programming world. If you just Google Jeff Dean and meme, hours will be taken away from your life. And basically, what he's saying, his team at Google is proving that neural networks can be used for basically any task. And so Sam is about to release a wonderful paper that includes folks from that team. And that's on one end of the spectrum where they're using neural networks. But they're also using neural networks for other completely different, like big database indexing problems. They're really using neural networks for everything. And so this quote is where Jeff Dean says, basically, "Neural nets are the best solution for an awful lot of problems and a growing set of problems where we either previously didn't know how to solve the problem"-- my guess is that's going to be a good deal of how neural networks are applied to medicine-- "or"-- back to Jeff Dean-- "we could solve that problem, but now we can solve it better with neural nets." So this comes from an article from last fall on that team. It's a little bit hard to read. But basically, approaches based on data are teaching computers, teaching software programs, how to create their own software. So a few concrete examples of how important this actually is in practice. Again, I think in the medical community, folks understand immediately how powerful some of this is.
Other audiences, I think less so. So two examples-- one medical, the other not. This comes from an article in The New York Times not that long ago. And basically, Pittsburgh's hotline for child abuse and neglect is using methods like this to detect children who might have fallen through the cracks. And so in some examples, models are literally being used to find and prevent instances of harm against children. This comes from a team at Stanford. They put together a model called CheXNet. It has some incredibly powerful abilities to detect pneumonia through chest X-rays. And this particular model, I liked that they named it-- they named the actual model. It led to some headlines about AI replacing radiology and radiologists. I think that was a bit misleading. But the broader point here is that we're deluged by data: doctors, social workers, many incredibly important professions. And models like this have an incredibly, incredibly important ability to help us, to help find patterns in that data, to help both augment the work that humans are doing, and in some cases, replace it. So now onto that major problem. And so all of these advances have this incredibly difficult, if not fully impossible problem of explainability. And so this is a cartoon of a police officer asking a driver why he's just been pulled over. And no one knows the answer. And the fact is, it's not actually too hard to imagine a circumstance where something like this might occur. In the medical realm, not too difficult to imagine a diagnosis at a very high level of accuracy where neither the physician nor the patient really has any idea why. And this is actually my favorite article, I think, ever written on the subject. This is from a philosopher. This was published the day after IBM Watson won Jeopardy. And again, the point is that these types of models are not self-aware. Now, Watson, for folks who actually know about how Watson was made, is a bit of a Frankenstein, a Frankenstein system.
But the point is that these models aren't self-aware. They don't know what they're doing. We can't really look under the hood and say, why did you make the decision you made? For technical folks who want to talk about explainability, we can dive into that. There's a little bit more nuance there. But the upshot is that they can't exactly answer. And so laws around the world hate this. They hate this type of opacity. To give you a few examples, there's the General Data Protection Regulation. Before Mark Zuckerberg gave testimony this week, I think fewer people knew what that law was. Now, more people seem to. It's basically a gigantic data regulation coming out of Europe. Fines for violating it are up to 4% of global revenue, which is insane. So Apple has had revenue of over $200 billion every year for the last few years. If Apple Spain, if a subsidiary of Apple violates this, global Apple could be fined upwards of $8 billion. So quite intense. And the key here, the key connection between GDPR and machine learning is that, more or less, with some exceptions, it basically prohibits automated decision-making of the type we're really talking about today without express human consent. It's going to make using artificial intelligence in practice incredibly difficult. And a lot of this, I think, is geared at the underlying concerns that are ethical concerns. So although folks in the tech community like to scoff at it, and I think for some good reason, I think it's important to really think about the motivations behind it. That's the GDPR. In Congress, a bipartisan bill was proposed at the end of last year. It focused on some of these issues. It was the first federal law, proposed law, ever focused specifically on AI. And then the City of New York itself has stood up a committee as of January to examine some of these issues.
And during the Q&A-- we're really just kind of hitting the tops of the waves here-- if anyone wants a deeper dive in any of these particular laws, I'm happy to do that. But the point for now is that these are just a few of the growing efforts to regulate AI and to address this new problem, which is really the increasing adoption of AI on the one hand, and the increasing difficulty of understanding it on the other. And a good deal of these approaches actually seek to tackle this problem head on, and in some cases, to mandate certain levels of explainability directly. This quote comes from not too long ago, from the French digital minister. And he basically stated that "Any algorithm that can't be explained can't be used by the" French government. And so again, blanket proposals like this may be well-intentioned. And I think they are. But these types of reactions are going to deprive us of some very significant opportunities if they are actually implemented. The risks here, frankly, are huge. And if we focus too much on explainability, I think we're going to lose some very important opportunities. So that was the first point. The second point is that we've been here before. And as scary and as new as these challenges seem, they're not completely new. We've faced similar challenges in regulating opaque technology, or opaque, or unexplainable software in the past, software systems. And so I want to run through some of these examples and the lessons they teach. And so specifically, I want to talk about three parallels. I want to talk about a law called ECOA. This is used to govern credit decisions. It was passed in the 1970s. I'm going to talk about a law called SR 11-7. This is used in the financial system to govern black box models. When we talk about this subject, I think this is one of the most overlooked regulations out there.
And then I'm going to talk about some frameworks for governing our own minds, which are the ultimate black boxes, and how we can learn from some of the legal lessons surrounding liability and humans. So let's start with this. This is the cover of Newsweek. The article is, "Is Privacy Dead." I've blacked out the actual date. And so my question for folks in this room-- if you want to answer, feel free to shout it out. Otherwise, formulate it in your head. My question is, when do you think this article was published? OK, so we've got 1960, 196-- I hear a '70. AUDIENCE: '70s. ANDREW BURT: '70s. We hear an '80, and then a 2000. AUDIENCE: '30s. ANDREW BURT: '30s and '40s. OK, so basically, we've got a span of 100 years. [LAUGHTER] OK. So the answer is 1970. This came about in reaction to the rise of statistical credit rating methods in the financial sector. And this article described that rise as literally a massive flanking attack of computers on modern society. And the idea was that we were all under assault by this new type of intelligence. And it was popular back then to say things like this. This is a senator from that same year. And it was popular to say, we need a regulatory department to be specifically focused on these challenges. So here, the idea was we'd set up a federal department of computers to regulate all software and computing. Instead of this approach-- and there are clear parallels to the way some folks think about AI. For folks who are following the debate about regulating AI, there are people who advocate setting up a federal department of AI. In the 1970s, instead of setting up something like this, Congress actually passed a series of specific laws targeted at specific problems. And so one of those laws was called the Equal Credit Opportunity Act, passed in 1974. The problem it was focused on was that lots of groups faced discrimination in credit scoring decisions.
In addition, these algorithms were incredibly complex and difficult to understand, [AUDIO OUT] explainability issues. And so a solution was to mandate a basic level of transparency. Its design was to decrease discrimination on the one hand and increase consumer education on the other. And so as a result of ECOA, credit applicants would be able to understand why a particular adverse decision was being made. They were entitled to something called a statement, a minimum statement of specific reasons. And that is what you are entitled to see if there is an adverse credit decision that's made. And this quote comes from a Senate report on the bill, basically explaining the importance of the statement of reasons. And this form is actually the sample form included in ECOA's enforcement documents. And this is, in fact, the template for how credit decisions are communicated to this day. If you get an adverse credit decision, it's based on this form. There's a list of potential reasons. And applicants get notified what reason is contributing to a specific result. Now, this type of template doesn't fully break down how a decision is being made. But the statement of specific reasons does give us a basic template to understand what's going on. It makes black boxes, so to speak, just a little bit less black. And so when we think about ECOA, I think the first takeaway is ECOA gets us transparency. So we can see some of how these scoring algorithms are working. And that's important. In fact, I think that's crucial in many ways. But on the other hand, we can't necessarily understand it. So transparency, where we're seeing how an algorithm works, is not the same thing as explainability. And I think the enforcement documents for ECOA make this pretty clear. The Federal Reserve, which enforces it, has stated that more than four reasons for any one adverse credit score are not actually meaningful.
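As a rough illustration of how a statement of specific reasons might be produced from a scoring model, here is a minimal sketch. Everything in it is an assumption made for illustration: the talk does not say how lenders select reasons, the function name `adverse_action_reasons` is hypothetical, and the point values are invented. The only detail taken from the talk is the cap of four reasons.

```python
MAX_REASONS = 4  # the talk cites four as the most reasons that remain meaningful


def adverse_action_reasons(contributions, max_reasons=MAX_REASONS):
    """Return the factors that pulled a score down the most, worst first.

    `contributions` maps a factor name to the points it added to (positive)
    or subtracted from (negative) a hypothetical applicant's score.
    """
    negative = [(name, points) for name, points in contributions.items()
                if points < 0]
    negative.sort(key=lambda item: item[1])  # most negative first
    return [name for name, _ in negative[:max_reasons]]


# Invented point values for one hypothetical applicant.
applicant = {
    "length of employment": -12,
    "number of recent inquiries": -30,
    "income": 15,
    "delinquent past obligations": -45,
    "length of residence": -5,
    "amount of credit requested": -8,
}

for reason in adverse_action_reasons(applicant):
    print(reason)
```

The list this produces shows which factors mattered without saying how the points were arrived at or how the factors interact, which is the gap between transparency and explainability that the talk describes.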
More than four reasons are too many reasons for a human credit applicant to actually understand meaningfully in a way that might be able to change their behavior. And so ECOA might have succeeded in giving some transparency and inserting some transparency into these new and powerful algorithms. But at the same time, ECOA teaches us that even transparency has its limits. Transparency is not explainability. So onto the second regulatory framework. This is called SR 11-7. It stands for Supervisory Guidance on Model Risk Management. Again, this is like the nerds' nerd law for some of these issues. But again, I think it happens to be one of the most overlooked. It is focused specifically on model risk. It came about after the 2008 recession. And this is really when regulators around the world-- this is an American regulation, but there are some equivalents in the EU-- regulators around the world started to notice banks using more complex algorithms. And they started to notice that, as a result, banks had less of an understanding of how and why they were making particular decisions. So this is enforced by the Federal Reserve and a key regulator within the Department of Treasury. So this statement comes from the regulation. It's basically an acknowledgment of the fact that banks are relying on more and more complex algorithms, some of which might fall into the category of AI, and that banks and financial institutions are using these types of algorithms for a wide variety of reasons. And then the regulation makes this direct, very nuanced admission of the costs of this trend. So I want to read this quote. The regulation states that "models also come with costs. There is the direct cost of devoting resources to develop and implement models properly." That's intuitive. "There's also the indirect cost of relying on models such as possible adverse consequences of decisions based on models that are incorrect or misused."
And so I think this is really the meat of what we're talking about when we talk about regulating AI, of controlling models whose inner workings we can't fully understand. And so the regulation identifies two major risks. First is errors made by the models themselves. For this community, this is a false positive. This is a diagnosis that shouldn't have been made. This is fairly intuitive. The second risk is, broadly, misuse. And so models, due to their inherent opacity, can be used for purposes other than their original intentions very, very easily. And the regulation makes this incredibly important assertion that, again, I want to read to you. It states that-- I have that here in parentheses. It states that "Models, by their very nature, are simplifications of reality, and real-world events may prove those simplifications inappropriate." And I think this admission is one of the most important admissions in understanding AI. And there is a famous quote I'm sure a lot of folks here are familiar with from the statistician George Box that "All models are wrong, but some are useful." And what this means is that every model is based on correlations in data, but not causations. And so those correlations might be useful to us. They might tell us about the likelihood of a particular answer being correct. But that's not a substitute for actual reasoning. It's not a substitute for actual intelligence. These models do not know what they're doing, just like Watson didn't know it won Jeopardy. In fact, I wonder, actually, whether Watson knows it won Jeopardy right now. I suspect it doesn't, but you never know. So what does SR 11-7 say we should do to fix some of these problems? The solution lies in a concept that it calls "effective challenge." And this is really the central thesis of the regulation. What effective challenge means is critical analysis of really every step of the process from creation, to testing and validation, to deployment of a model.
It means outlining your assumptions. It means questioning those assumptions and more. And so the regulation has very, very specific guidance about how to carry out all of these procedures in practice. And in the national security world, we might actually call a concept like this red teaming. And red teaming is basically a process. It's something you do in a world with incomplete knowledge and incomplete facts, something you do to address uncertainty. I'm going to revisit the importance of effective challenge shortly. But before I do, I want to get to that last legal framework, which is the one governing our own minds. And so to do that, I thought I'd introduce you to FloridaMan. So for people who don't know about FloridaMan, he's been dubbed "America's worst superhero." This is an article from The New York Times highlighting his ascendance as a meme on Twitter from a few years ago. And the basic idea is that, for whatever reason, the beaches or the weather, men in Florida seem to be generating newspaper headlines that are outlandish. And as a result, the more generic FloridaMan has become a fictional superhero on social media. So to introduce you to FloridaMan, I thought I'd give you a few examples of some of my favorite headlines. So this is Florida man tossing a gator through a window. This window happened to be the drive-through at a Wendy's. The gator happened to be very much living. This is FloridaMan calling 911 repeatedly because the clams he ordered at a restaurant were "extremely so small." In this case, Florida man actually got arrested on misdemeanor charges for calling 911 multiple times. And so I scoured the internet for Florida man headlines. And this is actually my all-time favorite. So full description is a bit of a mouthful. So Florida man, at the age of 82 years old, is arrested for slashing the tires of an 88-year-old woman with an ice pick during a bingo dispute. Don't ask me why Florida man had an ice pick, given the weather. 
But if these aren't examples of completely unexplainable, unpredictable, outlandish decision-making, I don't think those examples exist. And indeed, the parallels between our own minds and machine learning are actually quite strong. From the time we're babies, we ingest new data on a daily basis-- images, sounds, sensations-- and then we make conclusions about correlations in that data. That's how learning works. But there are a number of problems with this model, highlighted by FloridaMan's incredibly bizarre, unpredictable behavior. And so the question is how does our legal system handle this? How do we think about regulating FloridaMan's behavior, behavior we can't come close to explaining? And there are two real answers to that. The first is that we treat human decision-making in different stages according to age. And so the first thing we do is we ask if FloridaMan is making his own decisions. Up to a certain age, basically, FloridaMan is on the hook-- FloridaMan's parents, I should say, are on the hook. There's an age where children become responsible for their actions, usually in the double digits. It varies by state. I don't actually know what that is in Illinois. Some parents here might be anxiously awaiting that date. Then there's some intermediate stage, where children take partial responsibility for their actions. This is the status of being a minor. And then eventually, there's adulthood. And this is when FloridaMan becomes an adult. In the legal sense of the term, he's entirely liable for what he does. And so the key point here is that we can extend this approach to thinking about models like neural nets, classifying them in terms of their maturity. Certain models might actually need to reach a certain level of maturity before they can be deployed in certain circumstances. We don't let FloridaMan drive, for example, until he's reached a certain level of maturity, until he has processed a certain amount of input data.
And I think the same is really going to need to be done with AI. We use age as a proxy for training data. But even once humans are adults, our brains still don't process information completely rationally. They're still full of cognitive biases, like Hans highlighted. And so the second question is, how does the law deal with our own inability to explain decisions, our own decisions, even as adults? And so the answer there is a standard called the reasonable person standard. It's used really, really widely throughout different areas of the law. And this slide comes from a great law review article on that standard. And basically, that standard places judges and juries in the position of saying, given all the data that the person had at the time, given all of the context, was what this person did the right thing? Was what they did reasonable? Now, it's an incredibly subjective standard, and it can evolve over time. But subjective standards need not be perfect, and they can be incredibly useful when engaging with things we don't fully understand. So why is this so important when we think about regulating AI? A few reasons-- first, the way we think about FloridaMan learning and gaining responsibility as he becomes an adult, I think, is something crucial, the crucial lesson for us when we think about regulating and managing the risks in AI. And secondly, I really think it's worth drawing out the point that in the law, we are using age as a proxy for input data. As children grow older, they have more input data. And maturity of training data, I think, is really going to be a central focus. There's a great Rand study in a different area, outside of medicine, on self-driving cars. And that study was focused on, basically, how many miles of training data are autonomous vehicles going to need before we can start certifying them as safe? And so I think this is really a key point when we think about controlling risk and deploying AI effectively. 
And then, lastly, I think it's very, very important to highlight the role that subjective standards have to play when we think about governing unexplainable decisions. And so specifically, my real point here is that we need new standards. We need common standards. They can be subjective standards, standards that might evolve. But we need these standards to help us evaluate how machine learning systems are being trained, and deployed, and maintained in the real world. And right now, I haven't seen, frankly, any examples of common standards that exist for the world of data science. So I'll revisit that shortly. Very quickly, summing up, from ECOA, we can learn that we might need to mandate a certain level of transparency at times. At the same time, transparency and explainability are very much not the same thing. From SR 11-7, we can learn that even when there's no explainability, there's still a host of ways of controlling models. This is the idea of effective challenge. And from the way that law treats human minds, we can learn the importance of maturity and subjective standards of reasonableness. So that was the second point. And lastly, I want to get a bit more specific. I want to focus not simply on what we've already learned, on what laws already exist about governing [AUDIO OUT], but I want to talk a little bit beyond the challenge of unexplainability alone. And I want to talk about how we should respond. Because again, the Jeff Deans of the world are starting to use neural nets for basically everything. And the question is, what do we do as a result, as governments, as health care providers, as people who are seriously worried about the risks of all of these approaches? So I'm going to start with my most general point. And then I'm going to talk about three sub-points. My first point is that AI should not be regulated in one place, through one regulation.
We should not stand up the Federal Department of AI today, just like we should not have stood up the Federal Department of Computers in the 1960s. We're going to frankly need a host of different regulations and different approaches. And so a few examples targeted towards the medical community, just for this talk-- I think this is going to translate into a few different areas. So one is, I think there need to be specific data sharing regulations around medical data, beyond just HIPAA. I think HIPAA is woefully underprepared for the type of data sharing and really, the scale of data sharing that we're going to need to train some of these models and deploy them, if we're really going to make use of AI in medical environments. I think it's going to translate into specific types of regulatory review for machine learning models that are being used in diagnostic settings. There need to be specific transparency and third-party auditing requirements for some of these models, placed on vendors, third-party vendors, or hospitals that rely on these models so patients can understand what's going on, and so third parties can actually properly assure that they've been validated in the right way. So this is just a few examples of potential areas that I think need to be applied to the medical community. And this slide is from an op ed I wrote earlier this year in The New York Times basically making the same point in that I think it's a very bad idea to think about responding to AI and the challenge it's creating with one single response, with one regulatory silver bullet, so to speak. So beyond that general point, I want to get into specifics. And so what I want to do is I want to talk about three, frankly, of my own personal greatest concerns when we think about the risks posed by AI. I'm going to talk about those challenges. And then I'm going to try to actually constructively suggest how to solve them. So to start with is the issue of liability. 
Right now, I think it is just simply not clear exactly how, and why, and where a deployed model holds its creators liable. And I don't think we're going to be able to safely deploy these models if we, if data scientists, don't actually know where that line is. I think it needs to be crystal clear from the outset exactly where this liability lies. And so in medical environments, I think we're looking at a future-- excuse me-- where models created and trained by third parties are increasingly used by physicians. And again, I don't think it's clear enough where liability lies. And so in many cases, for example, these models will be more reliable, they'll be more accurate statistically than human physicians. And so is the burden then going to be on physicians, on health care providers, to default to the most accurate solution, even if they don't understand it, even if they can't even come close to understanding the technical reasons behind how the model is working, or where the data came from? And what if these models then make an error, a false positive, a false negative? Who's responsible? Let's make things a little bit more complicated. Let's say a model trains continuously during deployment. So the model is reshaping itself based on the data it's actually being exposed to. Who is responsible in those circumstances? Is it the creators of the model? Is it the people whose data the models are reacting to? These are all really big questions. I don't profess to have the answers. I have some suggestions, which we can talk about a little later. But at least, I would say, at least the basic framework for how liability exists in practice needs to be clear before, I think, we can start using some of these advances in the areas, frankly, where, I think, they might have the biggest impact. So the second biggest concern for me is this big bulky word, interrogatability. It comes from a friend, Dan Geer. And to me, it means a couple of different things.
So the first thing it means is explainability or interpretability. And this is largely in the way I've been talking about. So do we know what caused this outcome? Can we create a causal explanation for why a specific input data created a specific output data or decision? So a bit more background here. This comes from DARPA's Explainable AI Project. And what this graph is saying is really that different models make different trade-offs in terms of accuracy versus explainability. And so on the x-axis, I believe, yes, we have explainability. On the y-axis, we have the level of accuracy. And the key takeaway is that different models have differing levels of both. And so the level of explainability is always going to be the result of a trade-off. Explainability is not simply black and white. And the fact is-- as I'll talk about shortly-- there are different ways we can make this trade-off. And that fact, that trade-off, that optionality, so to speak, needs to be clear when we start building these models. Again, though, there's more to interrogatability than just this trade-off alone. There's a more, I think, human, more procedural side to this problem. And this relates to who we can ask, who we can interrogate if something goes wrong, if we need to get an accounting for any specific model output. And so this is the beginning, or the cover page, from one of my favorite papers ever written about machine learning. And it talks about the concept of technical debt applied to this realm. So in software development, the idea of tech debt comes from basically prioritizing deployment, getting your software to market over sustainability. And so tech debt is something that gets progressively worse over time. You're basically mortgaging complexity, which gets worse. And in machine learning, tech debt is similar, but I think it's deeply challenging and deeply vexing in a variety of different ways. This paper goes into some of those ways. 
But for us, I think, the main point is that machine learning is deployed in incredibly complex environments. And because of that, these models can be dependent on data in ways that we don't fully realize. And it can make these models react in strange ways or unpredictable ways. And so this fact, the type of tech debt that accrues in machine learning environments, makes it very difficult to figure out why a specific outcome happens. This interrogatability problem, this inability to interrogate and fully account for a particular decision, I think, is really, really greatly influenced by tech debt in machine learning systems. And then here's the third and final challenge. In fact, I think this is actually-- I would rank this, I think, as the biggest long-term challenge I'm worried about when I think about deploying AI. I have this here as fail silence, more frequently known as silent failures. And the fact is, I think we often won't know what counts as a failure once we've deployed a model. And even if we do, I think, oftentimes, we won't be able to understand exactly why that failure has occurred. And so I think, frankly, we're looking at a world where we might be lucky to know if something has actually gone wrong. There's a lot, obviously, to say on this topic. One of my favorite examples of this, though, is Move 37. Move 37 took place in a Go game played by AlphaGo, the series of ensemble methods that was used to beat human experts in Go. In this case, it was Game 2 between AlphaGo and Lee Sedol. And so Go is supposed to be one of the most sophisticated games humans have ever invented. And AlphaGo basically wiped the floor with our best Go mind. And Move 37 is particularly powerful. This is a move that AlphaGo made. And nobody understood it. And it was completely, completely bizarre. And as a testament to how unexpected it was, Lee Sedol was so angry. He was so flummoxed by the move, he reportedly had to stand up and leave the room.
And it took him 15 minutes to recover from that move. And at the time, people thought that this was a bug, as models, you know, are prone to have. It turns out, over time, we now understand this was actually a feature, that there was a genius to this move that humans just did not understand. And so understanding what's a failure, and understanding what is not a failure, and keeping track of this difference in ways that are meaningful I really think is going to be one of the biggest difficulties brought about in practice by the deployment of AI over the long term. So OK, those were my three biggest concerns. Those were three areas, which some alarmists might digest and say, OK, we just can't do this in risky environments, which is not what I am trying to do. That's not the goal of my talk here. So I promised that I would actually have some constructive suggestions going forward. And so that's what I want to outline here. So in general, I think this point is pretty clear. We just need clear liability. We need it from a regulatory perspective. We need it from a development perspective. Everybody needs to understand where the lines are. The lines to start out with don't have to be perfect. But they need to be clear if we're going to move forward. Secondly, that trade-off between explainability and accuracy, again, needs to be clear. And it needs to be documented. And it needs to be the result of a conscious decision. Now, this is something that I have learned a lot dealing with engineers. But quite frequently, in engineering, we default to the most accurate solution. The ultimate goal is accuracy. And in many environments, especially in medical environments, that can't be the case. We need to be thinking consciously about what accuracy we're gaining and what explainability we're losing when we make these decisions. And there are a variety of different ways that we can balance that trade-off.
There are a variety of different ways we can cut specific decisions into smaller decisions to help us make that right balance. But again, we need to very consciously understand what decisions we're making and the implications of all of those decisions. And then, lastly, we need to be thinking about what counts as failure. And we need to be extremely creative about how we monitor, and alert, and intervene with potential failures. And so just a few examples of what that's actually like. This is something I'm quite focused on in my day job. So some of this, I think, is going to include best practices, like constantly snapshotting input and output data, comparing these snapshots against benchmarks or statistical ground truths for how we think data, input data, so the world, or output data, the decisions, should be behaving in practice. And that means very consciously thinking about how to insert humans into the loop when we think there are potential deviations or anomalous activities. I also want to make the point that all of the suggestions I've just made are going to be in a white paper we're going to release in the next-- I don't know exactly when. My guess is probably two months. So for anyone who's hungry about more specific details of putting these recommendations into practice, I'll have my contact info on the last slide here. And just reach out to me, and I'm happy to make sure you get the paper. Our goal is, really-- I said there's no reasonable standard for deploying machine learning, or controlling risk in machine learning. And our goal is to at least get the ball rolling in creating version one of something that could turn into that standard. So all of that brings me back to Hans, the horse. And so I wish I could tell you that Hans had a happy ending, that after the world learned he wasn't as intelligent as he first seemed, he still had a long and distinguished career. But that is emphatically not the case. 
At the beginning of World War I, in 1914, Hans was actually drafted as a military horse by the Germans. He's believed to have been killed in action, or eaten by hungry soldiers, some time in 1916, neither outcome, obviously, ideal. But here's an important aspect to Hans's story that I haven't actually spent time talking about today in that there was really a 10-year period where the best scientists in the world thought that Hans was the real deal. They thought they'd found a new source of human intelligence. In 1904, seven years before his intelligence was officially debunked by Oskar Pfungst, the German board of education set up a commission to study his intelligence. And after a year-and-a-half of study, 18 months, they concluded that it was the real deal. And the challenges Hans posed really mirror the challenges we face today with AI. For example, how do we approach a new type of intelligence we can't understand? How do we harness it without stifling its potential? Should we harness it at all? How do we understand when it's wrong? How do we hold it to account when it creates a negative circumstance? How do we control the unexplainable? So the parallels between Hans and AI, of course, only go so far. What we call AI today really is quite capable. New models really can achieve new levels of pattern recognition that humans simply can't. We are looking at a breakthrough. And that's to say the technology is ready, and it's ready right now. But what isn't ready, as I hope I've convinced folks here today, is the law. The laws in place governing AI are not ready. We don't yet have any agreed upon practical methods for deploying these types of models in real-world, important, and potentially sensitive scenarios. We have frameworks we can draw from, as I've tried to show today. But we don't have any clear legal response to the rise of AI in all the areas it's being deployed. 
And so when you think about the success of AI, I would actually ask that you think about the laws governing AI instead. I'd ask that you think about this gargantuan task of regulating AI, which is going to shape the benefits we can draw from this technology as individuals, organizations, as health care providers, as patients around the world. Because AI is becoming ready and is becoming ready to be used in myriad environments. And what's not ready is our laws. The good news is the way our laws respond is up to us. So on that note, I think Sam and I are going to talk. And I'm happy to answer any questions you have. [APPLAUSE] SAMUEL L. VOICHENBOUM: I was interested in your topic about liability and silent failure. And I was wondering if you could give a very practical example. You know, we're using algorithms to detect cardiac arrest in the hospital. And the algorithm group that [INAUDIBLE] runs, for instance, has developed an algorithm to detect when patients are going to have a cardiac arrest. And there's a pager that goes off. Everybody runs to the room. And so that algorithm is very rules based. And as these algorithms mature and become more deep-learning based, how do you see those issues of silent failure and liability taking shape around those kinds of specific examples? ANDREW BURT: So there's a liability answer, and then there's a silent failure answer. The silent failure answer is one that I think is easier to talk about, frankly, because that point relates to deploying a model like that over time. And so at first, it might be intuitive. You're going to know when it fails because it makes this [AUDIO OUT]. I think a silent failure is when the input data changes over time such that it's making correct decisions, but for reasons that don't make full sense until suddenly they don't. And then there's a change. Suddenly what has been working for a while no longer works. And no one understands exactly why. 
So maybe in that case, the silent part of the failure is that the model is actually working, but it's working in ways that are pushing it towards failure. And then once it fails, debugging it is going to become incredibly complex, if not impossible. So outside the medical world, there's an example of debugging and really ethical liability. And folks here might have heard of it. Google has an image classifier. And in, I believe, it was 2013, it started classifying African-Americans as gorillas. Do folks know about this? OK. So it started doing that. Obviously, a huge problem. The engineers had no idea it was going to do this when they deployed the algorithm. And there is a Wired story from January of this year, which says they still, after all these years, have not been able to debug and figure out exactly why. And so their answer right now is, basically, to not allow the label "gorilla" at all in this image classifier. And so this debugging issue, I think, the confronting failures is going to be huge and incredibly, incredibly difficult. And it's just going to be an incredibly difficult challenge we're going to have to figure out. But I realize I'm talking about one of those questions. And the liability question-- SAMUEL L. VOICHENBOUM: Well, I guess, will the failures always be obvious? Because I was thinking about your Move 37 issue. And so if the algorithm says, patient has cancer, use this kind of chemotherapy, and no one's ever thought of that before, and you use it, I mean, are you going to know that that was a Move 37 or that's a mistake? ANDREW BURT: I think right now, you're not. And that's why understanding when human review comes in is going to be incredibly important. I don't know-- excuse me-- the right way to deal with those. I suspect that what you do in a medical context is if there is a Move 37 and it's the first Move 37, you have an alert, this is anomalous activity, don't do it. 
In the medical world-- One of the reasons why I actually think-- so one of the reasons why AI and data science itself is so fascinating is because it can be employed in almost every context. And so there are some interesting articles on the death of expertise and the rise of data science. So it's fascinating. But I think for someone like me, who is focused on risk, I think there's no better environment, or more conscientious environment, than the medical environment, where no one understands risk, I think, like physicians understand it. So my gut for an answer like that is that what you would do is say no, no to Move 37 until it happens enough times and there's enough human review that we can somehow validate that there is some genius there. SAMUEL L. VOICHENBOUM: So one of the most popular questions right now on that line is, do you think physicians will be found liable for not using the best available model? ANDREW BURT: So there's a great paper that was just published on this. And I think right now, legally, the answer is yes. I don't know if that's going to change. But I think the way that legal liability works right now is that physicians are held liable if they are not using the methods that are most trustworthy, the best practices, [INAUDIBLE] best practices. SAMUEL L. VOICHENBOUM: But those are transparent models, where we know with evidence-based medicine why they are the best. So I guess the question is, when you start to have opaque models that have been shown over time to make the best decision, but then a Move 37 comes up and the physician doesn't do it, will there be a liability there? ANDREW BURT: So right. So I don't know. I mean, these are questions that are incredibly important. I don't think they have clear answers. My sense is what's clear is that physicians are legally liable to be using the most trustworthy, accurate methods.
So frankly, I think the answer that folks in the technical community and the data science community, the answer I hear the most from them is the same answer you get with self-driving cars, where one in however many self-driving cars drives off a bridge and does something crazy, and the answer is, that's the cost of doing business on the one hand. And in fact, you've seen some of this with the recent incidents with Uber. There have been a few incidents in the last few weeks that have brought this to light. And so people say, on the one hand-- I think Tesla's statement, actually, in reaction to one of the Tesla crashes, was, we're sorry for the loss of life this caused, but overall, from a utilitarian perspective, we're going to be saving more lives by relying on things that don't make this type of mistake. I can't say I'm comfortable with that. I don't know. I don't know if that level of discomfort is just something we need to accept. But I think that's a huge, huge question. And it's potentially one of the trade-offs. And so what I asserted today, and what I am 100% comfortable with is silent failures, and the need to insert human review, and do some of these types of anomaly detection, where we at least know Move 37 is occurring. That needs to happen. How exactly we should respond, I don't know. I think there are arguments to be made that there's going to be some collateral damage. And over the long run, that collateral damage is going to be less harmful to society as a whole than if we just let humans make mistakes like they do now. I don't know. SAMUEL L. VOICHENBOUM: But that's not really how we make our decisions. We were just talking about this in our ethics class. You can't design a clinical trial and say, well, 5% of the people are going to die from this therapy, but we hope to learn from it anyways. You have to have a reasonable expectation based on the Helsinki criteria that what you're testing is not going to be more harmful. 
So that's a different test of this scenario. ANDREW BURT: I'm not so sure. So I was thinking-- so in constructing this talk, I was thinking, what would it look like if I came and threw out all of these slides, and just totally focused on learning and medicine, just like my thoughts on machine learning and medicine? And the first thing I thought of was, what's the future of the Hippocratic oath? Will doctors really be able to say in every instance, we're not going to cause harm for precisely this reason? And then on second thought, I think doctors do this with medication every day. SAMUEL L. VOICHENBOUM: It's a balance. ANDREW BURT: Yeah. The fact is, it's a balance. There are a lot of medications that are prescribed widely that nobody knows how they work. And sometimes people die. By and large, these medications seem to work pretty well. And we just accept some people dying as a cost of doing business. And so I think that might be a more analogous scenario than some of these clinical trials. But I think that the end statement is, it's going to be a balance. And I don't know if we're going to be comfortable with that balance. But we need to think through the balance. Because again, these tools are being developed, and they're being employed. And as you know, they can be incredibly effective in ways that humans [AUDIO OUT] can't. SAMUEL L. VOICHENBOUM: Somebody asks, a couple of people ask, is the regulation of artificial intelligence different if we must consider the impending artificial generalized intelligence that everybody's worried about? ANDREW BURT: Thank you, Mr. Musk, for asking that. I brought up Hans for a couple of reasons. One of them is cognitive biases. I know a lot of people are very worried about this idea of artificial general intelligence. To those people, I would say, spend more time thinking about how IT systems fail for really dumb reasons and then come back to me and talk. I don't want to minimize that there's a lot of fear out there. 
And I think there's a lot of misunderstanding. But I think the idea-- I don't know. I think this point, one of the reasons why I think it's a bit distracting for what Elon Musk is doing publicly, but this point is something I can do a little bit of a deeper dive into. But it's more of-- should I do a deeper dive into why people think Skynet is going to kill us? Or should we just move on to-- SAMUEL L. VOICHENBOUM: I mean, I think we're all worried about the impending singularity. ANDREW BURT: Yeah. SAMUEL L. VOICHENBOUM: I am. ANDREW BURT: You're worried? OK, OK, so basically, so this question comes from this cognitive bias, which is that humans can't understand, or can't grasp exponential change. And so when you look at the growth of computing and the ability of computers to simulate human intelligence, what you see is a clear exponential growth curve. And so from that one statement, we then get concerns that, well, if this is true, then in 10 or 20 years, we're going to have a Skynet that's artificially intelligent and that's going to be able to maximize its own ability for human survival. And that assumption is then going to lead it to kill us. SAMUEL L. VOICHENBOUM: To be fair, the alarmists make the point that the time from when it happens to when we realize it is going to be very short. ANDREW BURT: Yes. But that's still based on this assumption that we are bad at understanding exponential change. Therefore, there's going to be some godlike intelligence that exists. And I think it's a distraction. I think there are other problems we need to be focused on right now, today. And from the world that I live in, where I'm seeing real risk every day and real potential harms, I think it's a total distraction to think about how computers are going to kill us, just like I think it was a distraction in the 1970s to think we're being attacked by computers, which was kind of true. There's some truth in these worries.
But the worry in the 1970s that we're being attacked by computers was, OK, how is it that we make them more useful? How do we control the ways they're being applied? How do we, for example, pass laws like ECOA to try to tackle discrimination? Not how do we stop computers from taking over life on Earth? SAMUEL L. VOICHENBOUM: So to that point, though, one of the biggest concerns people are asking questions about is, how do we control, or protect against discrimination that these algorithms will likely make if they're taking available input data and then making unbiased decisions? And so looking at socioeconomic status, or loan applicability, how do we regulate that, or how do we prevent that? ANDREW BURT: So that's a really complex question. I think there are going to be a bunch of answers. It is a basic fact that under-privileged and under-represented communities don't generate as much data. And AI is based on data. And so one, we need to be understanding how all data is biased. We need to try to be quantifying the way that data is biased. And I think we need to be thinking about more creative ways to try to level the bias. Because right now, there's a good example of this. Either the city of Boston, or a nonprofit, or something in Boston created an app that was designed to detect potholes based on one of the sensors in smartphones. And surprise, surprise, the wealthiest communities had the most smartphones. So as a result, to start with, all of these potholes were getting fixed in wealthy neighborhoods, when that wasn't the intention. And so once the developers realized that, they were able to insert some fixes in there that helped minimize that bias. But ideally, everyone from every community would be generating the same type, quantity and quality of data. And I think that would reduce it greatly. I don't know if that's realistic. So in the interim, I think we just need to be aware of bias in all the areas we can. SAMUEL L. VOICHENBOUM: Right. 
And that's not a new problem. I mean, when we look at clinical trials, and who is on clinical trials, and the data we collect, it's not always a fair sampling of [AUDIO OUT] taking the medication or using the intervention. But sticking with this theme of nefarious intervention, I saw a really cool example. I can't remember the exact example. But it was where an AI would identify a picture. And then somebody put in some noise into the picture. You couldn't even tell. But then when they ran the same AI over the picture, it found something completely different. ANDREW BURT: Yeah, we were actually just talking about that. SAMUEL L. VOICHENBOUM: So if it's so easy to fool the human eye or to mess with the data so that you get a different result, how will we protect against those types of-- somebody used the term Trojan horse. But how do we prevent against that type of intervention? ANDREW BURT: Caveat, I'm not sure I'm qualified to answer that particular question. I think the answer that I see people default to for questions like that is kind of fight bad technology with more good technology. And so I think people are thinking about, how can we create other AI that will detect that, and will try to figure out if the system's being gamed. I don't know. I don't really think-- I mean, I came from the world of information security before I started doing what I'm doing now. I don't think there are real examples of, in practice, people, at least outside of the world of academia and research, people trying to game systems like that. But the fact is, I don't know. I think right now, we're on the forefront of deploying AI. And then I think assessing the threat environment is going to be something new and different. SAMUEL L. VOICHENBOUM: Right. ANDREW BURT: So I don't know. SAMUEL L. VOICHENBOUM: The example I always show about the ethical issues of AI is always that famous picture of the car veering off the road to kill its driver to save a group of people. 
And so you can imagine that the richer you are, the more likely you are to have an algorithm that would save you rather than the group of people. Again, there's going to be this disparity. ANDREW BURT: Yeah, so in philosophy, that's called the trolley problem, which is like, a trolley is hurtling down. And how do you prioritize the lives it should save? That's also a question I don't know the answer to. So I would make two comments on that. One, I don't think the choices the algorithms are going to make are going to be necessarily based on consumer behavior. I mean, this will differ by country. But at least in the United States, Canada, and Europe, I'm not sure you'll be able to buy a car that says, you're the most important car, you're the most important person in the world, and you know, I'll kill a hundred people before I kill you. And I love you. Aren't you so great? Whatever. But I know that some lawmakers in Germany have actually tried to tackle this problem. And so I believe there's actually a law on the books in Germany that says, you need to prioritize human lives based on number rather than occupancy. So the Europeans love to regulate technology-- apologies to any Europeans watching or in the room-- I think, before they've fully thought about that technology. So I don't know how that works in practice. I don't know if there's going to be an algorithm in every Tesla in Germany that is scanning for human faces and then determining when it could possibly hit one. And if it is going to hit one, then it finds something that's not a human face and accelerates into that. Like, I don't know how that's going to work in practice. But people are focused on it. Some people are legislating about it already. SAMUEL L. VOICHENBOUM: I'm not quite sure I understand this question. But it sounds cool, so I'll ask it. The reasonable man test is based on the collective morals and ethics of people, which evolve over time.
So should AI be judged by a collective of its peers? ANDREW BURT: So that's a really interesting question. I thought about that. When I talk about the reasonable person standard-- also good on whoever asked that for having a good understanding of both how the reasonable person standard is used and how it could be placed in the world of AI. So the most frequent comment I get when I talk about these standards is, how are we going to apply a standard that we can't-- if we can't understand these algorithms, how are we going to apply a standard to them? So when I talk about the reasonable person standard, I'm really talking about just having an agreed upon standard for the humans that actually deploy these, create, deploy, validate, train, et cetera, all of these algorithms. So that's what I'm focused on. It's not about judging the inner workings of the models. But that said, I am not a person who thinks it's bad to approach problems with technology with more technology. And so I would actually be really interested to see what that would look like. I don't know what that would look like. I mean, the very idea of having AI judged by its peers seems preposterous. But I think there could be some value in again, trying to think about maturity and levels of output data in particular models, and using that output data to train other models to then assess the reasonableness of that output, if folks are following me. And so I'm open to it. I see no reason why we shouldn't try. I see lots of reasons why it might not work. But I'd say go for it. If whoever asked that question wants to go build something like that, you have my email. Tell me how it works. [LAUGHTER] SAMUEL L. VOICHENBOUM: I was at an AI talk yesterday. And somebody asked the question, are you worried about AI replacing you as a physician? And the person answering the question said, I'm not worried about that because there'll always need to be a human in the loop for making a decision. 
And I thought that is exactly not how I'm thinking about this. And I'm thinking that there's this spectrum of where the decision is made being continually pushed upstream. And the question is, where does that become too uncomfortable? Because right now, for instance, you could have an AI that tells you not to give a certain medicine. And so an alert would pop up and say, this patient shouldn't get this medicine. The next step would be having the AI refuse to give the medicine. And to me, that feels uncomfortable. But I bet in a few years, that won't feel uncomfortable. And so what is it about our accepting these new technologies that lets this evolve over time? Because as you said, or you alluded to, in fact, we can't imagine what it's going to be like even in five years. So I think the landscape is going to change very quickly. ANDREW BURT: Yeah. So I think that that question is, are we looking at a world with augmented intelligence? Or are we looking at a world with artificial intelligence? And to what extent are these developments going to replace human decision-making? And I think there's going to be a spectrum. In the medical community, I don't think it's fair-- I don't think there's going to be one monolithic impact on the medical community. I think there are going to be jobs that can be a little bit more routinized. I think that CheXNet got a lot of attention. And I think radiology might be an area where it might have a bigger impact. I don't think it's going to replace radiologists, but it might make radiologists more efficient. My guess is it will reduce the need for as many radiologists, and it will require that radiologists that are currently practicing probably become more technical. It'll change the expertise level. And so I don't know. I mean, image assessment is one thing. And I think it's going to depend. It'll depend. SAMUEL L. 
VOICHENBOUM: So I think we're falling into the same trap because I think it's fine to imagine that your AI will tell you when there is metastasis to the lung. And then a human will look at that and say, yes, I agree with AI. There's a met there. And I think we all would say, I'm uncomfortable with not having that human there, with having the machine just report that and have action taken on it. But I can guarantee there's going to come a point-- who knows when that is-- when that's exactly what's going to happen. ANDREW BURT: Yes. If you folks can see, there's a sticker on my computer that says, "depends." And that's because that's all that lawyers say, which is, it depends, when they tell you you [AUDIO OUT] it depends. So for that, I won't say you owe me money. But I will say, I think it depends. And the reason why I think it depends is because I think there are some areas right now where I would be comfortable having a machine learning model make a diagnosis about me and then actually prescribe a prescription. And there are other areas where I would be wildly uncomfortable. And I think all of that really depends on-- I think it depends on the maturity of the model. Probably the more serious examples, for example, predicting mortality rate upon admission, I would want a human review there. So there's a scale. And the scale probably is, how mature is the training data the model is based on? And how well refined has it been? There are all these different methods, the ROC curves, et cetera, all these different methods to assess the accuracy of a model. So that would be one. And then the other is, like, seriousness of something going wrong. So if the likelihood of something going wrong is, a recommendation for a topical allergy ointment, and I get a rash, I think I'd be pretty comfortable doing that now. And so I think it depends. SAMUEL L. VOICHENBOUM: I would also argue that, for somebody with, say, refractory cancer, you're looking for that Move 37. 
And so who knows how we'll approach that? I just think it's a really cool-- I think we're in for some really cool rides coming up, both in our autonomous cars, as well as our medicine. ANDREW BURT: Yeah. Yeah. I mean, for that, it might be, like, how hopeless is the current situation? And the more hopeless it is-- or what is the risk tolerance of the patient? And that might be one axis that you'd want to plot. But I agree. I mean, I'm here. I'm thinking about this. I'm working on it because I think it's fascinating and important. And I really don't think there's more of a fascinating area than applied machine learning and the world of medicine because it's so clearly impactful. And the impact matters, and the risk matters. And so I think of every area, honestly, more than self-driving cars, I think the folks in this community, I think, everyone else is going to be learning from. SAMUEL L. VOICHENBOUM: You're just saying that. At the Uber convention, you'll say something else. ANDREW BURT: Yeah, that's right. Yeah, yeah, yeah. SAMUEL L. VOICHENBOUM: All right, well, thanks so much. This was fascinating. It was a great talk. Good for you. ANDREW BURT: Thank you. Thanks, everyone. [APPLAUSE]