[APPLAUSE]
SAMUEL L. VOLCHENBOUM: Thanks everybody for coming.
This is really exciting-- a really exciting program.
It's being sponsored by the Graham School and three
of our master's programs.
We have a master in biomedical informatics,
a master in data analytics, and a master in threat and response
management.
And these programs are all geared
in some way toward analysis of data,
and using machine learning, and using analytics.
And part of the thrust for all the programs
is to understand how to use data in an ethical way
and how to use these algorithms that we're
developing in a thoughtful way.
So I met Andrew a couple of years
ago at South by Southwest through a mutual friend.
We were both giving talks that year.
And we just had this amazing conversation
talking about applications of machine learning.
So as a physician, we're seeing the use of machine
learning algorithms all over the hospital and all over medicine
from things like predicting cardiac arrest,
to predicting sepsis, to predicting
patients that are going to be readmitted to the hospital.
And we just plow ahead with developing our models
and developing our predictions.
And it wasn't until I spoke to Andrew that I really
stopped and thought about the implications
of these algorithms and how they could
be used in both good and bad ways.
And it was really an eye-opening experience.
And ever since that meeting, I've
been really excited about bringing Andrew here to talk.
So Andrew is a fascinating guy.
He used to work for the FBI cyber division.
He's the Chief Privacy Officer at Immuta,
which is a data science company and just does
really fascinating work, and just is very, very tuned in,
both to the technical analytic side
as well as to the legal and ethical implications
of the work that we do.
So the title of Andrew's talk is
"Regulating Artificial Intelligence: How
to Control the Unexplainable."
And as you listen to Andrew, I want you
to keep in mind not only the science and computer
science of what we do, but also the social implications.
And I hope that the questions that we talk about,
that we discuss afterwards will run
the whole spectrum of the implications
of this kind of technology.
So with that, Andrew, excited to hear your talk.
And take it away.
ANDREW BURT: Wonderful.
Thank you so much.
Let me just switch here.
And while I am switching, I will say thanks to everyone
for coming out.
Thank you, Sam, Wendy, Suzanne, everyone.
I wanted to start today--
I should also say this is a condensed version of a longer
talk.
And so I want folks here to keep me honest and try to keep this
a little casual.
So I'm going to do my best not to go off script.
But I want to start today, and start on these issues,
in fact, by introducing you to a horse.
More specifically, this is Hans, or Clever Hans
as he became known as.
And Hans was one of the most famous horses in the world
about 100 years ago.
He was raised by a man named Wilhelm von Osten.
Hans lived in Germany.
And he was thought to be incredibly, incredibly smart,
hence his name.
So this is Hans at a public fair demonstrating his intelligence.
Folks that know about him, just no spoilers.
So he was thought to speak German.
He could perform arithmetic.
He could count objects and much more.
Here's a firsthand account of how he communicated numbers.
"Though small numbers were given with a slow tapping
of the right foot, with larger numbers,
he would increase his speed.
After the final tap, he would return his right foot
to its original position.
And zero was expressed by a shake of the head."
So one example of a question he'd answer
was, I have a number in mind.
I subtract 9.
And I have 3 as a remainder.
What's the number I have in mind?
Hans would unfailingly tap the number 12.
So Hans was, quite simply, the most interesting horse
in the world.
And this is Hans with his owner in front of a board
that he used to help him communicate.
And so Hans became world famous for his clear display
of animal intelligence.
I didn't realize there were slides over there.
This is an article in The New York Times from 1904
attesting to Hans's feats of intelligence.
And in this article, the reporter recounted
all of these feats and stated, I'm going to quote,
"the facts here are not drawn from the imagination,
but are based on true observations
and can be verified by the world's most preeminent
scientists."
So what is going on here?
Why am I starting a talk about machine learning
by introducing you to a horse?
Two reasons.
The first is that Hans illustrates something really
profound in the way that humans approach
the problem of intelligence, in animals, in machines,
and in humans.
So in 1911, a psychologist named Oskar Pfungst published
this book in which he demonstrated that Hans wasn't
actually that clever at all.
In every case, Hans was watching the reactions of his trainer
and reacting to involuntary cues in the body
language of that trainer.
And so this wasn't a hoax.
Von Osten didn't know he was creating these cues.
But while we were assessing the intelligence of Clever Hans,
Clever Hans actually demonstrated
something very deep about our own intelligence.
And that is, we have significant cognitive biases.
The way we process information is prone to irrational choices.
One of the cognitive biases we have baked into our brains
is called confirmation bias.
It causes us to look for things that
conform to our existing hypotheses or beliefs.
And so Clever Hans is really a testament to the fact
that we don't process information rationally
ourselves.
And confirmation bias is one among many different types
of bias.
So I bring this point up as a starting point
because it's something we need to be
acutely aware of and acutely sensitive to when we think
about this problem of intelligence, artificial
or otherwise.
There is a lot, frankly, about this topic
that might lead us astray.
But Hans also illustrates something else
that is specifically relevant to the way we approach AI today.
In the early 1900s, we simply could not
understand the way that Hans was processing information.
And yet almost all of his answers appeared to be correct.
Indeed, he seemed to know everything that we knew.
And the ability to-- the inability,
I should say, to fully explain why answers are correct,
or how reasoning is occurring is exactly
the type of problem we face with artificial intelligence today.
That is what I would posit is the fundamental challenge
we're facing when we think about deploying AI.
So in AI, input goes in the form of data.
A so-called black box of AI makes its decision.
And that decision is usually right.
And in practice, there's a deep sense of discomfort
about what this actually means.
And when proposals are brought up to regulate AI,
this is what they're focused on, this type of unexplainability
is what they're focused on.
And it's frequently what they're actually fighting.
And so today, I'm going to talk about a number
of different dimensions of this type of unexplainability.
But what I'm really going to contend,
and my overall message is that the very idea that we
can explain how decisions are being made,
the very idea that we need to explain how decisions are being
made is actually something we're going
to have to move beyond if we're going to fully embrace
this technology.
And so when we talk about regulating AI today,
and what that might actually mean in practice, which
I'm going to make some suggestions about,
I'm going to be talking about what
it might look like to move beyond explanations
and beyond explainability, or at the very least,
put less weight on the importance of explainability.
I'm going to frankly be talking about what a world might look
like with a lot more Hanses, and how
we might seek to effectively regulate and manage risk
and manage the ethical dimensions of a world that
looks like this.
And so that brings me to three points I want to make today.
First is, I want to talk about what AI is specifically
and the major challenges it presents.
And when I say AI, especially for this crowd,
I want to be very specific about what it is that I mean.
I'm really using the colloquial term, the pop culture term,
for what in practice I think is machine learning.
And even to be more specific, what I'm really talking about
is the increasing use of neural networks in a variety
of different settings.
So when I say AI, if you're like some of the data scientists
that I work with, if that makes you cringe, what I'm really
talking about is the increasing prevalence
and prominence of neural networks, which I'm going
to talk about in a second.
The second point I want to make is that we've been here before.
And in fact, not all of the challenges
we face when we think about regulating AI are new.
And so what I want to do is I want
to talk through past attempts to address these challenges.
And I want to talk about what these attempts can teach us.
And then lastly, what I'm going to do
is set forth some constructive suggestions
on what it is I think we should actually
be doing moving forward to regulate this technology.
And I'm going to focus on three particular concerns I have
beyond just the challenge of unexplainability,
or relating to unexplainability.
So what are the challenges of AI?
Our story really begins in 1955, when a group of researchers
got together to think through how computers could simulate
artificial intelligence.
They could simulate human intelligence.
They called the concept artificial intelligence.
It was one of the actual first uses of the phrase.
And this conference, this summer conference
at Dartmouth in 1955, is considered
one of the seminal moments in the history of AI.
And here's an example of a neural net, which
is what this approach led to.
Again, when people talk about AI, when I talk about AI,
this is really what I mean.
By show of hands, how many people
are familiar with neural networks?
OK, enough that it might be worth
actually going through and explaining in general terms
how they work.
So in brief, this is a visual depiction
of a relatively simple neural network.
We have an algorithm that's composed
of a series of nodes, represented here
in black circles, otherwise known as neurons.
And they make weighted decisions,
and they pass the results of those decisions
on to other neurons throughout the network.
And so specifically, the way the weights in those decisions
are created is based on training data.
And so what you would do is you'd feed a neural network
like this some training data, input data along
with the resulting conclusions about that data.
You would train that network so that it has the correct weighting.
And then you'd be able to give that network new data
that it's never seen before.
And it would be able to pretty accurately tell you
things about that data.
And so, for example, a network like this,
once it's fully trained, you'd feed it
some data, images, for example.
And then certain nodes in that network
might get activated, let's say, by the curve in a nostril.
And then that network would be able to tell you
something about those images, like if one of them
had a face, if the network was performing image recognition.
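To make the training idea above concrete, here is a minimal sketch, not the network shown on the slide. It assumes a single neuron with two inputs, trained with the classic perceptron update rule rather than the methods used for real deep networks. The training points, their labels, and the function names `predict` and `train` are all hypothetical, chosen only for illustration.

```python
# Toy single-neuron model: it learns weights from labeled examples,
# then labels points it has never seen. Illustrative only.

def predict(weights, bias, inputs):
    """Weighted decision: output 1 if the weighted sum is above zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(examples, epochs=50, learning_rate=0.1):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training data: input pairs along with the known answers.
training_data = [
    ((0.1, 0.2), 0),
    ((0.9, 0.8), 1),
    ((0.3, 0.3), 0),
    ((0.7, 0.9), 1),
    ((0.2, 0.5), 0),
    ((0.8, 0.6), 1),
]

weights, bias = train(training_data)

# New data the model was never trained on.
print(predict(weights, bias, (0.1, 0.1)))
print(predict(weights, bias, (0.9, 0.9)))
```

The two test points never appear in the training data. The neuron labels them using only the weights it learned from the six labeled examples, which is the pattern-finding behavior described above, in miniature.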
And so the first really important point,
I think, for folks in this room, technical and non-technical,
to take away is that this is a drastic change
from traditional programming.
Traditional programming is based on giving a computer
a series of step-by-step logical instructions.
And instead, this is a complete departure.
Models like neural networks work by finding
patterns in training data and applying those patterns
to new data.
And these patterns are frequently patterns that human
[AUDIO OUT].
And it turns out that this type of programming,
based on feeding models to neural nets,
is actually beginning to replace traditional programming
in a variety of domains.
And it seems to me at least like we're
on the cusp of this replacement.
Different audiences will react to that statement
in different ways.
But this is a great example from a computer scientist
named Jeff Dean.
For folks who don't know about him,
he's an idol in the computer programming world.
If you just Google Jeff Dean and meme,
hours will be taken away from your life.
And basically, what he's saying, his team at Google
is proving that neural networks can
be used for basically any task.
And so Sam is about to release a wonderful paper that
includes folks from that team.
And that's on one end of the spectrum
where they're using neural networks.
But they're also using neural networks
for other completely different, like big database indexing
problems.
They're really using neural networks for everything.
And so this quote is where Jeff Dean says, basically,
"Neural nets are the best solution
for an awful lot of problems and a growing set of problems where
we either previously didn't know how to solve the problem"--
my guess is that's going to be a good deal of how
neural networks are applied to medicine--
"or"-- back to Jeff Dean-- "we could solve that problem,
but now we can solve it better with neural nets."
So this comes from an article from last fall on that team.
It's a little bit hard to read.
But basically, approaches based on data
are teaching computers, teaching software programs, how
to create their own software.
So a few concrete examples of how important this actually
is in practice.
Again, I think in the medical community,
folks understand immediately how powerful some of this is.
Other audiences, I think less so.
So two examples-- one medical, the other not.
This comes from an article in The New York Times
not that long ago.
And basically, Pittsburgh's hotline
for child abuse and neglect is using methods like this
to detect children who might have fallen through the cracks.
And so in some examples, models are literally
being used to find and prevent instances of harm
against children.
This comes from a team at Stanford.
They put together a model called CheXNet.
It has some incredibly powerful abilities
to detect pneumonia through chest X-rays.
And this particular model, I liked that they named it--
they named the actual model.
It led to some headlines about AI replacing radiology
and radiologists.
I think that was a bit misleading.
But the broader point here is that we're
deluged by data, doctors, social workers, many incredibly
important professions.
And models like this have an incredibly, incredibly
important ability to help us, to help find
patterns in that data, to help both
augment the work that humans are doing, and in some cases,
replace it.
So now onto that major problem.
And so all of these advances have
this incredibly difficult, if not fully impossible
problem of explainability.
And so this is a cartoon of a police officer asking a driver
why he's just been pulled over.
And no one knows the answer.
And the fact is, it's not actually
too hard to imagine a circumstance where something
like this might occur.
In the medical realm, not too difficult to imagine
a diagnosis at a very high level of accuracy
where neither the physician nor the patient really has any
idea why.
And this is actually my favorite article, I think,
ever written on the subject.
This is from a philosopher.
This was published the day after IBM Watson won Jeopardy.
And again, the point is that these types of models
are not self-aware.
Now, Watson, for folks who actually
know about how Watson was made, is
a bit of a Frankenstein, a Frankenstein system.
But the point is that these models aren't self-aware.
They don't know what they're doing.
We can't really look under the hood
and say, why did you make the decision you made?
For technical folks who want to talk about explainability,
we can dive into that.
There's a little bit more nuance there.
But the upshot is that they can't exactly answer.
And so laws around the world hate this.
They hate this type of opacity.
To give you a few examples, there's
the General Data Protection Regulation.
Before Mark Zuckerberg gave testimony this week,
I think fewer people knew what that law was.
Now, more people seem to.
It's basically a gigantic data regulation
coming out of Europe.
Fines for violating it are up to 4%
of global revenue, which is insane.
So Apple has had revenue of over $200 billion every year
for the last few years.
If Apple Spain, if a subsidiary of Apple violates this,
global Apple could be fined upwards of $8 billion.
So quite intense.
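The arithmetic behind that figure can be sketched as follows. This is only an illustration of the 4% ceiling as described in the talk. It assumes a round $200 billion revenue figure, and it ignores the regulation's alternative fixed-amount ceiling and any currency conversion. The function name `max_percentage_fine` is hypothetical.

```python
def max_percentage_fine(global_revenue, percent=4):
    """Upper bound on a fine capped at a whole-number percentage of revenue."""
    # Integer arithmetic avoids floating-point rounding on large amounts.
    return global_revenue * percent // 100


# The "over $200 billion" figure from the talk, rounded down for illustration.
apple_revenue_usd = 200_000_000_000

print(max_percentage_fine(apple_revenue_usd))  # 8000000000
```

Because the stated revenue is "over" $200 billion, $8 billion is the low end of the ceiling, which is why the talk says "upwards of."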
And the key here, the key connection
between GDPR and machine learning
is that, more or less, with some exceptions,
it basically prohibits automated decision-making
of the type we're really talking about today
without express human consent.
It's going to make using artificial intelligence
in practice incredibly difficult.
And a lot of this, I think, is geared
at the underlying concerns that are ethical concerns.
So although folks in the tech community like to scoff at it,
and I think for some good reason,
I think it's important to really think
about the motivations behind it.
That's the GDPR.
In Congress, a bipartisan bill was proposed
at the end of last year.
It focused on some of these issues.
It was the first federal law, proposed law, ever focused
specifically on AI.
And then the City of New York itself has stood up
a committee as of January to examine some of these issues.
And during the Q&A--
we're really just kind of hitting the tops of the waves here--
if anyone wants a deeper dive in any of these particular laws,
I'm happy to do that.
But the point for now is that these are just a few
of the growing efforts to regulate AI
and to address this new problem, which is really the increasing
adoption of AI on the one hand, and the increasing
difficulty of understanding it on the other.
And a good deal of these approaches
actually seek to tackle this problem head on,
and in some cases, to mandate certain levels
of explainability directly.
This quote comes from not too long ago,
from the French digital minister.
And he basically stated that "Any algorithm
that can't be explained can't be used by the" French government.
And so again, blanket proposals like this
may be well-intentioned.
And I think they are.
But these types of reactions are going
to deprive us of some very significant opportunities
if they are actually implemented.
The risks here, frankly, are huge.
And if we focus too much on explainability,
I think we're going to lose some very important opportunities.
So that was the first point.
The second point is that we've been here before.
And as scary and as new as these challenges seem,
they're not completely new.
We've faced similar challenges in regulating
opaque technology, or opaque, or unexplainable software
in the past, software systems.
And so I want to run through some
of these examples and the lessons they teach.
And so specifically, I want to talk about three parallels.
I want to talk about a law called ECOA.
This is used to govern credit decisions.
It was passed in the 1970s.
I'm going to talk about a law called SR 11-7.
This is used in the financial system
to govern black box models.
When we talk about this subject, I
think this is one of the most overlooked regulations
out there.
And then I'm going to talk about some frameworks for governing
our own minds, which are the ultimate black boxes,
and how we can learn from some of the legal lessons
surrounding liability and humans.
So let's start with this.
This is the cover of Newsweek.
The article is "Is Privacy Dead?"
I've blacked out the actual date.
And so my question for folks in this room--
if you want to answer, feel free to shout it out.
Otherwise, formulate it in your head.
My question is, when do you think
this article was published?
OK, so we've got 1960, 196--
I hear a '70.
AUDIENCE: '70s.
ANDREW BURT: '70s.
We hear an '80, and then a 2000.
AUDIENCE: '30s.
ANDREW BURT: '30s and '40s.
OK, so basically, we've got a span of 100 years.
[LAUGHTER]
OK.
So the answer is 1970.
This came about in reaction to the rise of statistical credit
rating methods in the financial sector.
And this article described that attack
as literally a massive flanking attack of computers
on modern society.
And the idea was that we were all under assault
by this new type of intelligence.
And it was popular back then to say things like this.
This is a senator from that same year.
And it was popular to say, we need a regulatory department
to be specifically focused on these challenges.
So here, the idea was we'd set up
a federal department of computers
to regulate all software and computing.
Instead of this approach--
and there are clear parallels to the way
some folks think about AI.
For folks who are following the debate about regulating AI,
there are people who advocate setting up
a federal department of AI.
In the 1970s, instead of setting up something like this,
Congress actually passed a series
of specific laws targeted at specific problems.
And so one of those laws was called the Equal Credit
Opportunity Act, passed in 1974.
The problem it was focused on was that lots of groups
faced discrimination in credit scoring decisions.
In addition, these algorithms were incredibly
complex and difficult to understand,
[AUDIO OUT] explainability issues.
And so a solution was to mandate a basic level of transparency.
Its design was to decrease discrimination on the one hand
and increase consumer education on the other.
And so as a result of ECOA, credit applicants
would be able to understand why a particular adverse decision
was being made.
They were entitled to something called
a statement, a minimum statement of specific reasons.
And that is what you are entitled
to see if an adverse credit decision is made.
And this quote comes from a Senate report on the bill,
basically explaining the importance
of the statement of reasons.
And this form is actually the sample form
included in ECOA's enforcement documents.
And this is, in fact, the template
for how credit decisions are communicated to this day.
If you get an adverse credit decision,
it's based on this form.
There's a list of potential reasons.
And applicants get notified what reason is contributing
to a specific result.
Now, this type of template doesn't fully break down
how a decision is being made.
But the statement of specific reasons
does give us a basic template to understand what's going on.
It makes black boxes, so to speak,
just a little bit less black.
And so when we think about ECOA, I
think the first takeaway is ECOA gets us transparency.
So we can see some of how these scoring algorithms are working.
And that's important.
In fact, I think that's crucial in many ways.
But on the other hand, we can't necessarily understand it.
So transparency, or seeing how an algorithm works, is not
the same thing as explainability.
And I think the enforcement documents for ECOA
make this pretty clear.
The Federal Reserve, which enforces it,
has stated that more than four reasons for any one
adverse credit score is not actually meaningful.
More than four reasons are too many reasons for a human credit
applicant to actually understand meaningfully
in a way that might be able to change their behavior.
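As a rough sketch of what a capped statement of reasons might look like in code: assume a scoring model that can report how much each factor raised or lowered one applicant's score. The factor names, the numbers, and the function `statement_of_reasons` are all hypothetical. This is not how any actual lender or regulator derives reasons; it only illustrates the four-reason limit mentioned above.

```python
MAX_REASONS = 4  # the limit on meaningful reasons cited in the talk


def statement_of_reasons(contributions, max_reasons=MAX_REASONS):
    """Return the factors that lowered the score the most, worst first."""
    negative = [(name, value) for name, value in contributions.items()
                if value < 0]
    negative.sort(key=lambda item: item[1])  # most negative first
    return [name for name, _ in negative[:max_reasons]]


# Hypothetical per-factor score contributions for one applicant.
contributions = {
    "length of credit history": -12,
    "number of recent inquiries": -30,
    "income": 8,
    "delinquent accounts": -45,
    "credit utilization": -20,
    "time at current address": -5,
    "time at current job": 3,
}

for reason in statement_of_reasons(contributions):
    print(reason)
```

Note what the cap does: "time at current address" also lowered the score, but it is dropped from the statement. The applicant sees part of how the score was reached, not all of it, which is the gap between transparency and explainability.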
And so ECOA might have succeeded in giving some transparency
and inserting some transparency into these new and powerful
algorithms.
But at the same time, ECOA teaches us
that even transparency has its limits.
Transparency is not explainability.
So onto the second regulatory framework.
This is called SR 11-7.
It stands for Supervisory Guidance
on Model Risk Management.
Again, this is like the nerds' nerd law
for some of these issues.
But again, I think it happens to be one of the most overlooked.
It is focused specifically on model risk.
It came about after the 2008 recession.
And this is really when regulators around the world--
this is an American regulation, but there
are some equivalents in the EU--
regulators around the world started
to notice banks using more complex algorithms.
And they started to notice that, as a result,
banks had less of an understanding of how
and why they were making particular decisions.
So this is enforced by the Federal
Reserve and a key regulator within the Department
of Treasury.
So this statement comes from the regulation.
It's basically an acknowledgment of the fact
that banks are relying on more and more complex
algorithms, some of which might fall into the category of AI,
and that banks and financial institutions
are using these types of algorithms
for a wide variety of reasons.
And then the regulation makes this direct, very nuanced
admission of the costs of this trend.
So I want to read this quote.
The regulation states that "models also come with costs.
There is the direct cost of devoting resources
to develop and implement models properly."
That's intuitive.
"There's also the indirect cost of relying
on models such as possible adverse consequences
of decisions based on models that are incorrect or misused."
And so I think this is really the meat of what we're
talking about when we talk about regulating AI,
of controlling models whose inner workings we can't fully
understand.
And so the regulation identifies two major risks.
First is errors made by the models themselves.
For this community, this is a false negative.
This is a diagnosis that shouldn't have been made.
This is fairly intuitive.
The second risk is, broadly, misuse.
And so models, due to their inherent opacity,
can be used for purposes other than their original intentions
very, very easily.
And the regulation makes this incredibly important assertion
that, again, I want to read to you.
It states that-- I have that here in parentheses.
It states that "Models, by their very nature,
are simplifications of reality, and real-world events
may prove those simplifications inappropriate."
And I think this admission is one
of the most important admissions in understanding AI.
And there is a famous quote I'm sure a lot of folks
here are familiar with from the statistician George Box
that "All models are wrong, but some are useful."
And what this means is that every model
is based on correlations in data, but not causation.
And so those correlations might be useful to us.
They might tell us about the likelihood
of a particular answer being correct.
But that's not a substitute for actual reasoning.
It's not a substitute for actual intelligence.
These models do not know what they're doing, just like Watson
didn't know it won Jeopardy.
In fact, I wonder, actually, whether Watson knows
it won Jeopardy right now.
I suspect it doesn't, but you never know.
So what does SR 11-7 say we should do
to fix some of these problems?
The solution lies in a concept that
it calls "effective challenge."
And this is really the central thesis of the regulation.
What effective challenge means is critical analysis
of really every step of the process
from creation, to testing and validation,
to deployment of a model.
It means outlining your assumptions.
It means questioning those assumptions and more.
And so the regulation has very, very specific guidance
about how to carry out all of these procedures in practice.
And in the national security world,
we might actually call a concept like this red teaming.
And red teaming is basically a process.
It's something you do in a world with incomplete knowledge
and incomplete facts, something you do to address uncertainty.
I'm going to revisit the importance
of effective challenge shortly.
But before I do, I want to get to that last legal framework,
which is the one governing our own minds.
And so to do that, I thought I'd introduce you to FloridaMan.
So for people who don't know about FloridaMan,
he's been dubbed "America's worst superhero."
This is an article from The New York Times highlighting
his ascendance as a meme on Twitter from a few years ago.
And the basic idea is that, for whatever reason, the beaches
or the weather, men in Florida seem
to be generating newspaper headlines that are outlandish.
And as a result, the more generic FloridaMan
has become a fictional superhero on social media.
So to introduce you to FloridaMan,
I thought I'd give you a few examples of some
of my favorite headlines.
So this is Florida man tossing a gator through a window.
This window happened to be the drive-through at a Wendy's.
The gator happened to be very much living.
This is FloridaMan calling 911 repeatedly
because the clams he ordered at a restaurant
were "extremely so small."
In this case, Florida man actually
got arrested on misdemeanor charges
for calling 911 multiple times.
And so I scoured the internet for Florida man headlines.
And this is actually my all-time favorite.
So full description is a bit of a mouthful.
So Florida man, at the age of 82 years old,
is arrested for slashing the tires
of an 88-year-old woman with an ice pick
during a bingo dispute.
Don't ask me why Florida man had an ice pick, given the weather.
But if these aren't examples of completely unexplainable,
unpredictable, outlandish decision-making,
I don't think those examples exist.
And indeed, the parallels between our own minds
and machine learning are actually quite strong.
From the time we're babies, we ingest new data
on a daily basis--
images, sounds, sensations-- and then
we make conclusions about correlations in that data.
That's how learning works.
But there are a number of problems
with this model, highlighted by FloridaMan's incredibly
bizarre, unpredictable behavior.
And so the question is how does our legal system handle this?
How do we think about regulating FloridaMan's behavior,
behavior we can't come close to explaining.
And there are two real answers to that.
The first is that we treat human decision-making
in different stages according to age.
And so the first thing we do is we
ask if FloridaMan is making his own decisions.
Up to a certain age, basically, FloridaMan is on the hook--
FloridaMan's parents, I should say, are on the hook.
There's an age where children become responsible
for their actions, usually in the double digits.
It varies by state.
I don't actually know what that is in Illinois.
Some parents here might be anxiously awaiting that date.
Then there's some intermediate stage,
where children take partial responsibility
for their actions.
This is the status of being a minor.
And then eventually, there's adulthood.
And this is when FloridaMan becomes an adult.
In the legal sense of the term, he's
entirely liable for what he does.
And so the key point here is that we
can extend this approach to thinking about models
like neural nets, classifying them
in terms of their maturity.
Certain models might actually need
to reach a certain level of maturity
before they can be deployed in certain circumstances.
But we don't let FloridaMan drive, for example,
until he's reached a certain level of maturity,
until he has processed a certain amount of input data.
And I think the same is really going
to need to be done with AI.
We use age as a proxy for training data.
But even once humans are adults, our brains
still don't process information completely rationally.
They're still full of cognitive biases, like Hans highlighted.
And so the second question is, how does the law
deal with our own inability to explain
decisions, our own decisions, even as adults?
And so the answer there is a standard called
the reasonable person standard.
It's used really, really widely throughout different areas
of the law.
And this slide comes from a great law review
article on that standard.
And basically, that standard places judges and juries
in the position of saying, given all the data
that the person had at the time, given all of the context,
was what this person did the right thing?
Was what they did reasonable?
Now, it's an incredibly subjective standard,
and it can evolve over time.
But subjective standards need not be perfect,
and they can be incredibly useful
when engaging with things we don't fully understand.
So why is this so important when we think about regulating AI?
A few reasons-- first, the way we think about FloridaMan
learning and gaining responsibility
as he becomes an adult, I think, is something crucial,
the crucial lesson for us when we think about regulating
and managing the risks in AI.
And secondly, I really think it's worth
drawing out the point that in the law,
we are using age as a proxy for input data.
As children grow older, they have more input data.
And maturity of training data, I think,
is really going to be a central focus.
There's a great Rand study in a different area,
outside of medicine, on self-driving cars.
And that study was focused on, basically,
how many miles of training data are autonomous vehicles going
to need before we can start certifying them as safe?
And so I think this is really a key point when
we think about controlling risk and deploying AI effectively.
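As a rough illustration of how a mileage figure like that can be estimated (a sketch under stated assumptions, not the study's own method): if failures are assumed to be rare, independent events occurring at a constant rate per mile, the number of failure-free miles needed to claim a given confidence that the true rate is below some target follows from a simple exponential model. The target rate and confidence level below are illustrative assumptions, not figures from the talk.

```python
import math


def failure_free_miles(max_rate_per_mile: float, confidence: float) -> float:
    """Miles that must be driven with zero failures to claim, at the given
    confidence, that the true failure rate is below max_rate_per_mile.

    Assumes failures are independent and occur at a constant rate per mile,
    so P(no failures in n miles) = exp(-rate * n).
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    if max_rate_per_mile <= 0:
        raise ValueError("max_rate_per_mile must be positive")
    return -math.log(1 - confidence) / max_rate_per_mile


# Illustrative assumption: a target of 1 failure per 100 million miles.
miles = failure_free_miles(max_rate_per_mile=1e-8, confidence=0.95)
print(f"{miles:,.0f} failure-free miles needed")
```

The required mileage grows as the target rate gets smaller or the confidence gets higher, which is why maturity of training and test data becomes a practical bottleneck.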
And then, lastly, I think it's very, very important
to highlight the role that subjective standards have
to play when we think about governing unexplainable
decisions.
And so specifically, my real point here
is that we need new standards.
We need common standards.
Subjective standards, standards that might evolve.
But we need these standards to help
us evaluate how machine learning systems are
being trained, and deployed, and maintained in the real world.
And right now, I haven't seen, frankly,
any examples of common standards that exist
for the world of data science.
So I'll revisit that shortly.
Very quickly, summing up, from ECOA,
we can learn that we might need to mandate a certain level
of transparency at times.
At the same time, transparency and explainability
are very much not the same thing.
From SR 11-7, we can learn that even when
there's no explainability, there's
still a host of ways of controlling models.
This is effective challenge.
And from the way that law treats human minds,
we can learn the importance of maturity
and subjective standards of reasonableness.
So that was the second point.
And lastly, I want to get a bit more specific.
I want to focus not simply on what we've already
learned, on what laws already exist about governing
[AUDIO OUT], but I want to talk about a little bit
beyond the challenge of unexplainability alone.
And I want to talk about how we should respond.
Because again, the Jeff Deans of the world
are starting to use neural nets for basically everything.
And the question is, what do we do as a result?
As governments, as health care providers,
as people who are seriously worried about the risks
of all of these approaches?
So I'm going to start with my most general point.
And then I'm going to talk about three sub-points.
My first point is that AI should not be regulated in one place,
through one regulation.
We should not stand up the Federal Department of AI today,
just like we should not have stood up the Federal Department
of Computers in the 1960s.
We're going to frankly need a host of different regulations
and different approaches.
And so a few examples targeted towards the medical community,
just for this talk--
I think this is going to translate
into a few different areas.
So one is, I think there need to be specific data sharing
regulations around medical data, beyond just HIPAA.
I think HIPAA is woefully underprepared for the type
of data sharing and really, the scale of data
sharing that we're going to need to train some of these models
and deploy them, if we're really going to make use of AI
in medical environments.
I think it's going to translate into specific types
of regulatory review for machine learning models that are being
used in diagnostic settings.
There need to be specific transparency
and third-party auditing requirements for some
of these models, placed on vendors,
third-party vendors, or hospitals
that rely on these models so patients can understand what's
going on, and so third parties can actually properly assure
that they've been validated in the right way.
So these are just a few examples of potential areas
that I think need to be applied to the medical community.
And this slide is from an op ed I wrote earlier this year
in The New York Times basically making the same point in that I
think it's a very bad idea to think about responding to AI
and the challenge it's creating with one single response,
with one regulatory silver bullet, so to speak.
So beyond that general point, I want to get into specifics.
And so what I want to do is I want to talk about three,
frankly, of my own personal greatest concerns when we think
about the risks posed by AI.
I'm going to talk about those challenges.
And then I'm going to try to actually constructively suggest
how to solve them.
So to start with is the issue of liability.
Right now, I think it is just simply not clear exactly how,
and why, and where a deployed model
holds its creators liable.
And I don't think we're going to be able to safely deploy
these models if we, if data scientists,
don't actually know where that line is.
I think it needs to be crystal clear from the outset
exactly where this liability lies.
And so in medical environments, I
think we're looking at a future--
excuse me-- where models created and trained by third parties
are increasingly used by physicians.
And again, I don't think it's clear enough
where liability lies.
And so in many cases, for example, these models
will be more reliable, they'll be more accurate statistically
than human physicians.
And so is the burden then going to be on physicians, on health
care providers, to default to the most accurate solution,
even if they don't understand it,
even if they can't even come close to understanding
the technical reasons behind how the model is working,
or where the data came from?
And what if these models then make
an error, a false positive, a false negative?
Who's responsible?
Let's make things a little bit more complicated.
Let's say a model trains continuously during deployment.
So the model is reshaping itself based on the data it's actually
being exposed to.
Who is responsible in those circumstances?
Is it the creators of the model?
Is it the people whose data the models are reacting to?
These are all really big questions.
I don't profess to have the answers.
I have some suggestions, which we can talk about a little
later.
But at least, I would say, at least
the basic framework for how liability exists in practice
needs to be clear before, I think, we can start using some
of these advances in the areas, frankly, where, I think,
they might have the biggest impact.
So the second biggest concern for me
is this big bulky word, interrogatability.
It comes from a friend, Dan Geer.
And to me, it means a couple of different things.
So the first thing it means is explainability or
interpretability.
And this is largely in the way I've been talking about.
So do we know what caused this outcome?
Can we create a causal explanation
for why a specific input data created a specific output
data or decision?
So a bit more background here.
This comes from DARPA's Explainable AI Project.
And what this graph is saying is really
that different models make different trade-offs
in terms of accuracy versus explainability.
And so on the x-axis, I believe, yes, we have explainability.
On the y-axis, we have the level of accuracy.
And the key takeaway is that different models
have differing levels of both.
And so the level of explainability
is always going to be the result of a trade-off.
Explainability is not simply black and white.
And the fact is-- as I'll talk about shortly--
there are different ways we can make this trade-off.
And that fact, that trade-off, that optionality, so to speak,
needs to be clear when we start building these models.
Again, though, there's more to interrogatability
than just this trade-off alone.
There's a more, I think, human, more procedural side
to this problem.
And this relates to who we can ask,
who we can interrogate if something goes wrong,
if we need to get an accounting for any specific model output.
And so this is the beginning, or the cover page,
from one of my favorite papers ever written
about machine learning.
And it talks about the concept of technical debt applied
to this realm.
So in software development, the idea of tech debt
comes from basically prioritizing deployment,
getting your software to market over sustainability.
And so tech debt is something that gets
progressively worse over time.
You're basically mortgaging complexity, which gets worse.
And in machine learning, tech debt is similar,
but I think it's deeply challenging and deeply vexing
in a variety of different ways.
This paper goes into some of those ways.
But for us, I think, the main point
is that machine learning is deployed in incredibly
complex environments.
And because of that, these models can be dependent on data
that we don't fully realize.
And it can make these models react in strange ways
or unpredictable ways.
And so this fact, the type of tech debt
that accrues in machine learning environments,
makes it very difficult to figure out
why a specific outcome happens. This interrogatability problem,
this inability to interrogate and fully account
for a particular decision, I think,
is really, really greatly influenced by tech debt
in machine learning systems.
And then here's the third and final challenge.
In fact, I think this is actually--
I would rank this, I think, as the biggest long-term challenge
I'm worried about when I think about deploying AI.
I have this here as fail silence, more frequently
known as silent failures.
And the fact is, I think we often
won't know what counts as a failure
once we've deployed a model.
And even if we do, I think, oftentimes, we
won't be able to understand exactly why that failure has
occurred.
And so I think, frankly, we're looking at a world
where we might be lucky to know if something has actually
gone wrong.
There's a lot, obviously, to say on this topic.
One of my favorite examples of this, though, is Move 37.
Move 37 took place in the second Go game between AlphaGo,
the series of ensemble methods that
was used to beat human experts in Go.
AlphaGo, in this case, it was Game 2 between AlphaGo and Lee
Sedol.
And so Go is supposed to be one of the most sophisticated games
humans have ever invented.
And AlphaGo basically wiped the floor with our best Go mind.
And Move 37 is particularly powerful.
This is a move that AlphaGo made.
And nobody understood it.
And it was completely, completely bizarre.
And as a testament to how unexpected
it was, Lee Sedol was so angry.
He was so flummoxed by the move, he reportedly
had to stand up and leave the room.
And it took him 15 minutes to recover from that move.
And at the time, people thought that this was a bug, as models,
you know, are prone to do.
It turns out, over time, we now understand
that this was actually a feature, a move of genius
that humans just did not understand.
And so understanding what's a failure, and understanding what
is not a failure, and keeping track of this difference
in ways that are meaningful I really think
is going to be one of the biggest difficulties brought
about in practice by the deployment of AI
over the long term.
So OK, those were my three biggest concerns.
Those were three areas, which some alarmists might digest
and say, OK, we just can't do this
in risky environments, which is not what I am trying to do.
That's not the goal of my talk here.
So I promised that I would actually
have some constructive suggestions going forward.
And so that's what I want to outline here.
So in general, I think this point is pretty clear.
We just need clear liability.
We need it from a regulatory perspective.
We need it from a development perspective.
Everybody needs to understand where the lines are.
The lines to start out with don't have to be perfect.
But they need to be clear if we're going to move forward.
Secondly, that trade-off between explainability and accuracy,
again, needs to be clear.
And it needs to be documented.
And it needs to be the result of a conscious decision.
Now, this is something that I have learned
a lot dealing with engineers.
But quite frequently, in engineering,
we default to the most accurate solution.
The ultimate goal is accuracy.
And in many environments, especially
in medical environments, that can't be the case.
We need to be thinking consciously
about what accuracy we're gaining
and what explainability we're losing
when we make these decisions.
And there are a variety of different ways
that we can balance that trade-off.
There are a variety of different ways
we can cut specific decisions into smaller decisions to help
us make that right balance.
But again, we need to very consciously understand
what decisions we're making and the implications of all
of those decisions.
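One way to make that trade-off a conscious, documented decision is to set an explicit explainability floor before comparing candidates, and to record what accuracy was given up as a result. The sketch below is a minimal illustration only: the `Candidate` class, the 0-to-1 explainability score, and all the numbers are hypothetical, and how explainability would actually be scored is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float        # e.g. held-out accuracy, 0..1
    explainability: float  # hypothetical reviewer-assigned score, 0..1


def choose_model(candidates, min_explainability):
    """Pick the most accurate candidate that meets the explainability floor,
    and return it together with a record of what was given up."""
    eligible = [c for c in candidates if c.explainability >= min_explainability]
    if not eligible:
        raise ValueError("no candidate meets the explainability floor")
    chosen = max(eligible, key=lambda c: c.accuracy)
    most_accurate = max(candidates, key=lambda c: c.accuracy)
    record = {
        "chosen": chosen.name,
        "min_explainability": min_explainability,
        "accuracy_given_up": round(most_accurate.accuracy - chosen.accuracy, 4),
        "rejected_for_opacity": [c.name for c in candidates if c not in eligible],
    }
    return chosen, record


# Hypothetical candidates, invented for illustration only.
candidates = [
    Candidate("decision_tree", accuracy=0.86, explainability=0.9),
    Candidate("gradient_boosting", accuracy=0.91, explainability=0.5),
    Candidate("neural_net", accuracy=0.93, explainability=0.2),
]
chosen, record = choose_model(candidates, min_explainability=0.4)
print(chosen.name, record)
```

The point of the `record` is less the selection itself than the paper trail: it states the floor that was chosen and the accuracy that was knowingly traded away.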
And then, lastly, we need to be thinking
about what counts as failure.
And we need to be extremely creative about how we monitor,
and alert, and intervene with potential failures.
And so just a few examples of what that's actually like.
This is something I'm quite focused on in my day job.
So some of this, I think, is going
to include best practices, like constantly snapshotting
input and output data, comparing these snapshots
against benchmarks or statistical ground truths
for how we think data, input data, so the world, or output
data, the decisions, should be behaving in practice.
And that means very consciously thinking
about how to insert humans into the loop
when we think there are potential deviations
or anomalous activities.
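The snapshot-and-compare practice described here can be sketched very simply. The example below is an assumption-laden illustration, not the approach from the forthcoming white paper: it tracks a single numeric value (an input feature or a model output), compares the mean of a new snapshot against a baseline, and raises an alert for human review when the shift is large. The function name, threshold, and data are all hypothetical.

```python
import statistics


def drift_alerts(baseline, snapshot, z_threshold=3.0):
    """Compare a new snapshot of a numeric feature (or model output) against
    a baseline and return human-readable alerts.

    Flags the snapshot when its mean sits more than z_threshold standard
    errors away from the baseline mean.
    """
    base_mean = statistics.fmean(baseline)
    base_sd = statistics.stdev(baseline)
    if base_sd == 0:
        raise ValueError("baseline has no variation")
    std_err = base_sd / len(snapshot) ** 0.5
    z = (statistics.fmean(snapshot) - base_mean) / std_err
    if abs(z) > z_threshold:
        return [f"mean shifted by {z:+.1f} standard errors: route to human review"]
    return []


# Hypothetical data, invented for illustration only.
baseline = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.12, 0.10]
same_world = [0.11, 0.10, 0.12, 0.09]
shifted = [0.30, 0.28, 0.33, 0.31]

print(drift_alerts(baseline, same_world))
print(drift_alerts(baseline, shifted))
```

A real deployment would track many features and outputs and use more robust statistics, but the shape is the same: snapshot, compare against an agreed benchmark, and put a human in the loop when the two diverge.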
I also want to make the point that all
of the suggestions I've just made
are going to be in a white paper we're
going to release in the next--
I don't know exactly when.
My guess is probably two months.
So for anyone who's hungry about more specific details
of putting these recommendations into practice,
I'll have my contact info on the last slide here.
And just reach out to me, and I'm
happy to make sure you get the paper.
Our goal is, really--
I said there's no reasonable standard for deploying
machine learning, or controlling risk in machine learning.
And our goal is to at least get the ball rolling
in creating version one of something that
could turn into that standard.
So all of that brings me back to Hans, the horse.
And so I wish I could tell you that Hans had a happy
ending, that after the world learned
he wasn't as intelligent as he first seemed,
he still had a long and distinguished career.
But that is emphatically not the case.
At the beginning of World War I, in 1914,
Hans was actually drafted as a military horse by the Germans.
He's believed to have been killed in action,
or eaten by hungry soldiers, some time in 1916,
neither outcome, obviously, ideal.
But here's an important aspect to Hans's story
that I haven't actually spent time
talking about today in that there was really
a 10-year period where the best scientists in the world
thought that Hans was the real deal.
They thought they'd found a new source of human intelligence.
In 1904, seven years before his intelligence was officially
debunked by Oskar Pfungst, the German board of education
set up a commission to study his intelligence.
And after a year-and-a-half of study, 18 months,
they concluded that it was the real deal.
And the challenges Hans posed really
mirror the challenges we face today with AI.
For example, how do we approach a new type of intelligence
we can't understand?
How do we harness it without stifling its potential?
Should we harness it at all?
How do we understand when it's wrong?
How do we hold it to account when it creates
a negative circumstance?
How do we control the unexplainable?
So the parallels between Hans and AI, of course,
only go so far.
What we call AI today really is quite capable.
New models really can achieve new levels of pattern
recognition that humans simply can't.
We are looking at a breakthrough.
And that's to say the technology is ready,
and it's ready right now.
But what isn't ready, as I hope I've convinced folks here
today, is the law.
The laws in place governing AI are not ready.
We don't yet have any agreed upon practical methods
for deploying these types of models
in real-world, important, and potentially
sensitive scenarios.
We have frameworks we can draw from,
as I've tried to show today.
But we don't have any clear legal response
to the rise of AI in all the areas it's being deployed.
And so when you think about the success of AI,
I would actually ask that you think about the laws governing
AI instead.
I'd ask that you think about this gargantuan task
of regulating AI, which is going to shape the benefits we can
draw from this technology as individuals, organizations,
as health care providers, as patients around the world.
Because AI is becoming ready and is
becoming ready to be used in myriad environments.
And what's not ready is our laws.
The good news is the way our laws respond is up to us.
So on that note, I think Sam and I are going to talk.
And I'm happy to answer any questions you have.
[APPLAUSE]
SAMUEL L. VOICHENBOUM: I was interested in your topic
about liability and silent failure.
And I was wondering if you could give a very practical example.
You know, we're using algorithms to detect cardiac arrest
in the hospital.
And the algorithm group that [INAUDIBLE] runs, for instance,
has developed an algorithm to detect when patients are
going to have a cardiac arrest.
And there's a pager that goes off.
Everybody runs to the room.
And so that algorithm is very rules based.
And as these algorithms mature and become
more deep-learning based, how do you
see those issues of silent failure and liability
taking shape around those kinds of specific examples?
ANDREW BURT: So there's a liability answer,
and then there's a silent failure answer.
The silent failure answer is one that I
think is easier to talk about, frankly,
because that point relates to deploying a model like that
over time.
And so at first, it might be intuitive.
You're going to know when it fails because it
makes this [AUDIO OUT].
I think a silent failure is when the input data changes over
time such that it's making correct decisions,
but for reasons that don't make full sense until suddenly they
don't.
And then there's a change.
Suddenly what has been working for a while no longer works.
And no one understands exactly why.
So maybe in that case, the silent part of the failure
is that the model is actually working,
but it's working in ways that are pushing it towards failure.
And then once it fails, debugging it
is going to become incredibly complex, if not impossible.
So outside the medical world, there's
an example of debugging and really ethical liability.
And folks here might have heard of it.
Google has an image classifier.
And in, I believe, it was 2013, it
started classifying African-Americans as gorillas.
Do folks know about this?
OK.
So it started doing that.
Obviously, a huge problem.
The engineers had no idea it was going
to do this when they deployed the algorithm.
And there is a Wired story from January
of this year, which says they still,
after all these years, have not been able to debug and figure
out exactly why.
And so their answer right now is, basically,
to not allow the label "gorilla" at all
in this image classifier.
And so this debugging issue, I think,
confronting failures, is going
to be huge and incredibly, incredibly difficult.
And it's just going to be an incredibly difficult challenge
we're going to have to figure out.
But I realize I'm talking about one of those questions.
And the liability question--
SAMUEL L. VOICHENBOUM: Well, I guess,
will the failures always be obvious?
Because I was thinking about your Move 37 issue.
And so if the algorithm says, patient
has cancer, use this kind of chemotherapy,
and no one's ever thought of that before, and you use it,
I mean, are you going to know that that was a Move 37
or that's a mistake?
ANDREW BURT: I think right now, you're not.
And that's why understanding when human review comes in
is going to be incredibly important.
I don't know-- excuse me--
the right way to deal with those.
I suspect that what you do in a medical context is if there
is a Move 37 and it's the first Move 37,
you have an alert, this is anomalous activity,
don't do it.
In the medical world--
One of the reasons why I actually think--
so one of the reasons why AI and data science
itself is so fascinating is because it can be employed
in almost every context.
And so there are some interesting articles
on the death of expertise and the rise of data science.
So it's fascinating.
But I think someone like me, who is focused on risk,
I think there's no better environment, or more
conscientious environment, than in the medical environment,
where no one understands risk, I think,
like physicians understand it, though my gut
for an answer like that is what you would do
is you would say no, no to Move 37
until it happens enough times and there's enough human
review that we can somehow validate that there
is some genius there.
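The "say no to Move 37 until it has been validated enough times" idea can be sketched as a simple gate in front of the model. This is a minimal illustration under assumptions: the class name, the approval threshold, and the string labels are all hypothetical, and deciding what counts as the "same" recommendation is a hard problem the sketch does not address.

```python
from collections import Counter


class NoveltyGate:
    """Hold back recommendations that have not yet been validated by humans.

    A recommendation is released only after reviewers have approved it at
    least min_approvals times; until then it is escalated instead of acted on.
    """

    def __init__(self, min_approvals=3):
        self.min_approvals = min_approvals
        self.approvals = Counter()

    def record_human_approval(self, recommendation):
        self.approvals[recommendation] += 1

    def decide(self, recommendation):
        if self.approvals[recommendation] >= self.min_approvals:
            return "proceed"
        return "escalate_to_human_review"


# Hypothetical usage: a never-before-seen recommendation is held back,
# then released once reviewers have signed off on it twice.
gate = NoveltyGate(min_approvals=2)
first = gate.decide("regimen_X")
gate.record_human_approval("regimen_X")
gate.record_human_approval("regimen_X")
later = gate.decide("regimen_X")
print(first, later)
```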
SAMUEL L. VOICHENBOUM: So one of the most popular questions
right now on that line is, do you think
physicians will be found liable for not
using the best available model?
ANDREW BURT: So there's a great paper that
was just published on this.
And I think right now, legally, the answer is yes.
I don't know if that's going to change.
But I think the way that legal liability works right now
is that physicians are held liable if they are not
using the methods that are most likely, trustworthy,
but the best practices, [INAUDIBLE] best practices.
SAMUEL L. VOICHENBOUM: But those are transparent models,
where we know with evidence-based medicine
why they are the best.
So I guess the question is, when you start
to have opaque models that have been shown over time to make
the best decision, but then a Move 37 comes up
and the physician doesn't do it, will there
be a liability there?
ANDREW BURT: So right.
So I don't know.
I mean, these are questions that are incredibly important.
I don't think they have clear answers.
My sense is what's clear is that physicians
are legally liable to be using the most
trustworthy, accurate methods.
So frankly, I think the answer that folks
in the technical community and the data science
community, the answer I hear the most from them
is the same answer you get with self-driving cars,
where one in however many self-driving cars
drives off a bridge and does something crazy,
and the answer is, that's the cost of doing business
on the one hand.
And in fact, you've seen some of this with the recent incidents
with Uber.
There have been a few incidents in the last few weeks
that have brought this to light.
And so people say, on the one hand--
I think Tesla's statement, actually,
in reaction to one of the Tesla crashes,
was, we're sorry for the loss of life
this caused, but overall, from a utilitarian perspective,
we're going to be saving more lives
by relying on things that don't make this type of mistake.
I can't say I'm comfortable with that.
I don't know.
I don't know if that level of discomfort
is just something we need to accept.
But I think that's a huge, huge question.
And it's potentially one of the trade-offs.
And so what I asserted today, and what
I am 100% comfortable with is silent failures,
and the need to insert human review,
and do some of these types of anomaly detection, where
we at least know Move 37 is occurring.
That needs to happen.
How exactly we should respond, I don't know.
I think there are arguments to be made that there's going
to be some collateral damage.
And over the long run, that collateral damage
is going to be less harmful to society as a whole
than if we just let humans make mistakes like they do now.
I don't know.
SAMUEL L. VOICHENBOUM: But that's not really
how we make our decisions.
We were just talking about this in our ethics class.
You can't design a clinical trial
and say, well, 5% of the people are
going to die from this therapy, but we
hope to learn from it anyways.
You have to have a reasonable expectation based
on the Helsinki criteria that what you're testing
is not going to be more harmful.
So that's a different test of this scenario.
ANDREW BURT: I'm not so sure.
So I was thinking--
so in constructing this talk, I was thinking,
what would it look like if I came and threw out
all of these slides, and just totally focused
on learning and medicine, just like my thoughts on machine
learning and medicine?
And the first thing I thought of was,
what's the future of the Hippocratic oath?
Will doctors really be able to say in every instance,
we're not going to cause harm for precisely this reason?
And then on second thought, I think
doctors do this with medication every day.
SAMUEL L. VOICHENBOUM: It's a balance.
ANDREW BURT: Yeah.
The fact is, it's a balance.
There are a lot of medications that
are prescribed widely that nobody knows how they work.
And sometimes people die.
By and large, these medications seem to work pretty well.
And we just accept some people dying
as a cost of doing business.
And so I think that might be a more analogous scenario
than some of these clinical trials.
But I think that the end statement
is, it's going to be a balance.
And I don't know if we're going to be
comfortable with that balance.
But we need to think through the balance.
Because again, these tools are being developed,
and they're being employed.
And as you know, they can be incredibly effective in ways
that humans [AUDIO OUT] can't.
SAMUEL L. VOICHENBOUM: Somebody asks, a couple of people ask,
is the regulation of artificial intelligence different if we
must consider the impending artificial generalized
intelligence that everybody's worried about?
ANDREW BURT: Thank you, Mr. Musk, for asking that.
I brought up Hans for a couple of reasons.
One of them is cognitive biases.
I know a lot of people are very worried
about this idea of artificial general intelligence.
To those people, I would say, spend more time thinking
about how IT systems fail for really dumb reasons
and then come back to me and talk.
I don't want to minimize that there's a lot of fear
out there.
And I think there's a lot of misunderstanding.
But I think the idea--
I don't know.
I think this point, one of the reasons
why I think it's a bit distracting for what
Elon Musk is doing publicly, but this point
is something I can do a little bit of a deeper dive into.
But it's more of--
should I do a deeper dive into why people think
Skynet is going to kill us?
Or should we just move on to--
SAMUEL L. VOICHENBOUM: I mean, I think
we're all worried about the impending singularity.
ANDREW BURT: Yeah.
SAMUEL L. VOICHENBOUM: I am.
ANDREW BURT: You're worried?
OK, OK, so basically, so this question
comes from this cognitive bias, which
is that humans can't understand, or can't
grasp exponential change.
And so when you look at the growth of computing
and the ability of computers to simulate human intelligence,
what you see is a clear exponential growth curve.
And so from that one statement, we then
get concerns that, well, if this is true, then
in 10 or 20 years, we're going to have a Skynet that's
artificially intelligent and that's
going to be able to maximize its own ability for human survival.
And that assumption is then going to lead it to kill us.
SAMUEL L. VOICHENBOUM: To be fair,
the alarmists make the point that from the time
it happens to the time we realize
it's going to be very short.
ANDREW BURT: Yes.
But that's still based on this assumption
that we are bad at understanding exponential change.
Therefore, there's going to be some godlike intelligence that
exists.
And I think it's a distraction.
I think there are other problems we need to be
focused on right now, today.
And from the world that I live in,
where I'm seeing real risk every day
and real potential harms, I think it's a total distraction
to think about how computers are going to kill us,
just like I think it was a distraction in the 1970s to think
we're being attacked by computers,
which was kind of true.
There's some truth in these worries.
But the response to the worry in the 1970s that we're
being attacked by computers was, OK,
how is it that we make them more useful?
How do we control the ways they're being applied?
How do we, for example, pass laws
like ECOA to try to tackle discrimination?
Not how do we stop computers from taking over life on Earth?
SAMUEL L. VOICHENBOUM: So to that point,
though, one of the biggest concerns people are asking
questions about is, how do we control, or protect
against discrimination that these algorithms will likely
make if they're taking available input data
and then making unbiased decisions?
And so looking at socioeconomic status,
or loan applicability, how do we regulate that,
or how do we prevent that?
ANDREW BURT: So that's a really complex question.
I think there are going to be a bunch of answers.
It is a basic fact that under-privileged and
under-represented communities don't generate as much data.
And AI is based on data.
And so one, we need to be understanding
how all data is biased.
We need to try to be quantifying the way that data is biased.
And I think we need to be thinking
about more creative ways to try to level the bias.
Because right now, there's a good example of this.
Either the city of Boston, or a nonprofit,
or something in Boston created an app
that was designed to detect potholes based on one
of the sensors in smartphones.
And surprise, surprise, the wealthiest communities
had the most smartphones.
So as a result, to start with, all of these potholes
were getting fixed in wealthy neighborhoods, when
that wasn't the intention.
And so once the developers realized that,
they were able to insert some fixes in there that
helped minimize that bias.
But ideally, everyone from every community
would be generating the same type, quantity
and quality of data.
And I think that would reduce it greatly.
I don't know if that's realistic.
So in the interim, I think we just
need to be aware of bias in all the areas we can.
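One simple way to start quantifying that kind of bias is to compare each group's share of the data with its share of the population. The sketch below is illustrative only: the neighborhood names and counts are invented, and the reweighting at the end is just one possible correction, not a description of what the Boston developers actually did.

```python
def representation_ratios(population, reports):
    """For each group, divide its share of the data by its share of the
    population. 1.0 means proportional; below 1.0 means under-represented."""
    total_pop = sum(population.values())
    total_reports = sum(reports.values())
    return {
        group: (reports.get(group, 0) / total_reports) / (population[group] / total_pop)
        for group in population
    }


# Hypothetical numbers, invented for illustration only.
population = {"neighborhood_a": 50_000, "neighborhood_b": 50_000}
reports = {"neighborhood_a": 900, "neighborhood_b": 100}

ratios = representation_ratios(population, reports)
# One possible correction: weight each report inversely to its group's ratio.
weights = {group: 1 / ratio for group, ratio in ratios.items() if ratio > 0}
print(ratios)
print(weights)
```

Here two equally sized neighborhoods generate very different volumes of reports, so the ratios make the imbalance explicit before any decision is based on the raw counts.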
SAMUEL L. VOICHENBOUM: Right.
And that's not a new problem.
I mean, when we look at clinical trials,
and who is on clinical trials, and the data we collect,
it's not always a fair sampling of [AUDIO OUT]
taking the medication or using the intervention.
But sticking with this theme of nefarious intervention,
I saw a really cool example.
I can't remember the exact example.
But it was where an AI would identify a picture.
And then somebody put in some noise into the picture.
You couldn't even tell.
But then when they ran the same AI over the picture,
it found something completely different.
ANDREW BURT: Yeah, we were actually just talking
about that.
SAMUEL L. VOICHENBOUM: So if it's
so easy to fool the human eye or to mess with the data
so that you get a different result,
how will we protect against those types of--
somebody used the term Trojan horse.
But how do we prevent against that type of intervention?
ANDREW BURT: Caveat, I'm not sure I'm
qualified to answer that particular question.
I think the answer that I see people default to for questions
like that is kind of fight bad technology with more
good technology.
And so I think people are thinking about, how can
we create other AI that will detect that,
and will try to figure out if the system's being gamed.
I don't know.
I don't really think--
I mean, I came from the world of information security
before I started doing what I'm doing now.
I don't think there are real examples of, in practice,
people, at least outside of the world of academia
and research, people trying to game systems like that.
But the fact is, I don't know.
I think right now, we're on the forefront of deploying AI.
And then I think assessing the threat
environment is going to be something new and different.
SAMUEL L. VOICHENBOUM: Right.
ANDREW BURT: So I don't know.
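The image-noise example raised in the question above can be illustrated with a toy model. The sketch below is not the system from that example: it assumes a hypothetical linear classifier over 100 inputs, with invented weights and values, and shows how changing every input by a small amount, each in the direction that hurts the score most, can flip the decision even though no single input changes much.

```python
def predict(weights, bias, x):
    """Linear classifier: returns 1 if the score is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def _sign(value):
    """Return 1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def perturb(weights, x, epsilon):
    """Shift every input value by at most epsilon, each in the direction
    that lowers the classifier's score the most."""
    return [xi - epsilon * _sign(w) for w, xi in zip(weights, x)]


# Hypothetical toy model with 100 inputs ("pixels"), invented for illustration.
weights = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
bias = 0.0
x = [0.51 if i % 2 == 0 else 0.49 for i in range(100)]

x_adv = perturb(weights, x, epsilon=0.02)
print(predict(weights, bias, x), predict(weights, bias, x_adv))
print(max(abs(a - b) for a, b in zip(x, x_adv)))
```

Many tiny, coordinated changes add up across inputs, which is why a perturbation that is hard to notice on any one value can still move the overall score across the decision boundary.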
SAMUEL L. VOICHENBOUM: The example
I always show about the ethical issues of AI
is always that famous picture of the car veering off the road
to kill its driver to save a group of people.
And so you can imagine that the richer you are, the more likely
you are to have an algorithm that would save you
rather than the group of people.
Again, there's going to be this disparity.
ANDREW BURT: Yeah, so in philosophy,
that's called the trolley problem, which is like,
a trolley is hurtling down.
And how do you prioritize the lives it should save?
That's also a question I don't know the answer to.
So I would make two comments on that.
One, I don't think the choices the algorithms are going
to make are going to be necessarily based
on consumer behavior.
I mean, this will differ by country.
But at least in the United States, Canada, and Europe,
I'm not sure you'll be able to buy a car that says,
you're the most important car, you're
the most important person in the world, and you know,
I'll kill a hundred people before I kill you.
And I love you.
Aren't you so great?
Whatever.
But I know that some lawmakers in Germany
have actually tried to tackle this problem.
And so I believe there's actually
a law on the books in Germany that says,
you need to prioritize human lives based on number
rather than occupancy.
So the Europeans love to regulate technology--
apologies to any Europeans watching or in the room--
I think, before they fully thought about that technology.
So I don't know how that works in practice.
I don't know if there's going to be
an algorithm in every Tesla in Germany
that is scanning for human faces and then determining
when it could possibly hit one.
And if it is going to hit one, then it
finds something that's not a human face
and accelerates into that.
Like, I don't know how that's going to work in practice.
But people are focused on it.
Some people are legislating about it already.
SAMUEL L. VOICHENBOUM: I'm not quite sure I
understand this question.
But it sounds cool, so I'll ask it.
It's: the reasonable man test is based on the collective morals
and ethics of people which evolve over time.
So should AI be judged by a collective of its peers?
ANDREW BURT: So that's a really interesting question.
I thought about that.
When I talk about the reasonable person standard--
also good on whoever asked that for having a good understanding
of both how the reasonable person standard is used
and how it could be placed in the world of AI.
So the most frequent comment I get when I talk about these
standards is, how are we going to apply a standard that we
can't--
if we can't understand these algorithms,
how are we going to apply a standard to them?
So when I talk about the reasonable person standard,
I'm really talking about just having an agreed upon standard
for the humans that actually deploy these, create, deploy,
validate, train, et cetera, all of these algorithms.
So that's what I'm focused on.
It's not about judging the inner workings of the models.
But that said, I am not a person who
thinks it's bad to approach problems with technology
with more technology.
And so I would actually be really interested to see
what that would look like.
I don't know what that would look like.
I mean, the very idea of having AI judged by its peers
seems preposterous.
But I think there could be some value in again,
trying to think about maturity and levels of output data
in particular models, and using that output data to train
other models to then assess the reasonableness of that output,
if folks are following me.
And so I'm open to it.
I see no reason why we shouldn't try.
I see lots of reasons why it might not work.
But I'd say go for it.
If whoever asked that question wants to go build something
like that, you have my email.
Tell me how it works.
[LAUGHTER]
SAMUEL L. VOICHENBOUM: I was at an AI talk yesterday.
And somebody asked the question, are you worried about AI
replacing you as a physician?
And the person answering the question
said, I'm not worried about that because there'll always
need to be a human in the loop for making a decision.
And I thought that is exactly not how
I'm thinking about this.
And I'm thinking that there's this spectrum of where
the decision is made being continually pushed upstream.
And the question is, where does that become too uncomfortable?
Because right now, for instance, you
could have an AI that tells you not to give a certain medicine.
And so an alert would pop up and say,
this patient shouldn't get this medicine.
The next step would be having the AI refuse
to give the medicine.
And to me, that feels uncomfortable.
But I bet in a few years, that won't feel uncomfortable.
And so what is it about our accepting
these new technologies that lets this evolve over time?
Because as you said, or you alluded to, in fact,
we can't imagine what it's going to be like even in five years.
So I think the landscape is going to change very quickly.
ANDREW BURT: Yeah.
So I think that that question is, are we
looking at a world with augmented intelligence?
Or are we looking at a world with artificial intelligence?
And to what extent are these developments going to replace
human decision-making?
And I think there's going to be a spectrum.
In the medical community, I don't think it's fair--
I don't think there's going to be one monolithic impact
on the medical community.
I think there are going to be jobs that can be
a little bit more routinized.
I think that CheXNet got a lot of attention.
And I think radiology might be an area where
it might have a bigger impact.
I don't think it's going to replace radiologists,
but it might make radiologists more efficient.
My guess is it will reduce the need for as many radiologists,
and it will require that radiologists
that are currently practicing probably become more technical.
It'll change the expertise level.
And so I don't know.
I mean, image assessment is one thing.
And I think it's going to depend.
It'll depend.
SAMUEL L. VOICHENBOUM: So I think
we're falling into the same trap because I think
it's fine to imagine that your AI will tell you when there
is metastasis to the lung.
And then a human will look at that
and say, yes, I agree with the AI.
There's a met there.
And I think we all would say, I'm
uncomfortable with not having that human
there, with having the machine just report that and have
action taken on it.
But I can guarantee there's going to come a point--
who knows when that is-- when that's exactly what's
going to happen.
ANDREW BURT: Yes.
If you folks can see, there's a sticker on my computer
that says, "depends."
And that's because that's all that lawyers
say, which is, it depends, when they tell you you
[AUDIO OUT] it depends.
So for that, I won't say you owe me money.
But I will say, I think it depends.
And the reason why I think it depends
is because I think there are some areas right now where I
would be comfortable having a machine learning model make
a diagnosis about me and then actually prescribe
a prescription.
And there are other areas where I
would be wildly uncomfortable.
And I think all of that really depends on--
I think it depends on the maturity of the model.
Probably the more serious examples,
for example, the predicting mortality rate upon admission,
I would want a human review there.
So there's a scale.
And the scale probably is, how mature is the training
data the model is based on?
And how well refined has it been?
There are all these different methods, the ROC curves,
et cetera, all these different methods
to assess the accuracy of a model.
So that would be one.
And then the other is, like, seriousness
of something going wrong.
So if the likelihood of something going wrong
is, a recommendation for a topical allergy ointment,
and I get a rash, I think I'd be pretty comfortable doing that
now.
And so I think it depends.
SAMUEL L. VOICHENBOUM: I would also
argue that, for somebody with, say, refractory cancer,
you're looking for that Move 37.
And so who knows how we'll approach that?
I just think it's a really cool--
I think we're in for some really cool rides
coming up, both in our autonomous cars, as well
as our medicine.
ANDREW BURT: Yeah.
Yeah.
I mean, for that, it might be, like, how hopeless is
the current situation?
And the more hopeless it is--
or what is the risk tolerance of the patient?
And that might be one axis that you'd want to plot.
But I agree.
I mean, I'm here.
I'm thinking about this.
I'm working on it because I think
it's fascinating and important.
And I really don't think there's more
of a fascinating area than applied machine learning
and the world of medicine because it's
so clearly impactful.
And the impact matters, and the risk matters.
And so I think of every area, honestly,
more than self-driving cars, I think
the folks in this community, I think,
everyone else is going to be learning from.
SAMUEL L. VOICHENBOUM: You're just saying that.
At the Uber convention, you'll say something else.
ANDREW BURT: Yeah, that's right.
Yeah, yeah, yeah.
SAMUEL L. VOICHENBOUM: All right, well, thanks so much.
This was fascinating.
It was a great talk.
Good for you.
ANDREW BURT: Thank you.
Thanks, everyone.
[APPLAUSE]