>> Professor Kumar: Thank you for coming to the lecture and
welcome to the B. Thomas Golisano College of Computing
and Information Sciences as well as the Dean's Lecture Series and
this is the first of the lectures for this year.
And I'm Mohan Kumar, the Chair of the Department of Computer
Science in the Golisano College and it is my pleasure and
privilege today to introduce our speaker Professor Sandy Pentland
of MIT.
The goal of this lecture series which was started about ten
years ago is to bring leading minds from academia, from
industry and from the government to RIT campus and share their
ideas with the students and faculty at RIT.
And today's speaker is the 48th speaker in the series so we have
two more to go to celebrate.
So at this time I would also like to acknowledge our
professional interpreter Kate for translating the talk to the
audience and again it is my pleasure and privilege to
introduce Sandy Pentland.
Sandy is a globally recognized authority on big data and he's
the director of the MIT Human Dynamics Program and the MIT
Media Lab Entrepreneurship Program.
In addition, Sandy is the lead academic for the World Economic
Forum's Big Data and Personal Data initiatives and he was
chosen by Forbes as one of the world's seven most powerful data
scientists.
So in today's lecture titled "Social Physics and the
Beginning of Big Data Society" Sandy will discuss the impact of
big data and the need for a new deal on data to preserve privacy
and personal safety.
So sit tight.
I've heard Sandy talk before and it will be a great talk I can
assure you that.
Thank you.
>> Professor Sandy Pentland: Thank you.
[ Applause ]
Good turnout here.
This is great and a beautiful day.
So thank you.
So what I wanted to do today is talk about sort of my view
about big data and the things that I think are the most
important and the things that are misunderstood.
So big data is this buzz phrase.
It's everywhere.
Everywhere is big data and you see these definitions like it's
volume, velocity.
Well, yes and no. Actually if you look at video, online video
is bigger and faster than almost anything else
and it's been around for decades in very many formats, so what I
think is actually different is not so much the bigness of it as
the depth of it.
So now we have very detailed data about all sorts of
things but most importantly about people.
It used to be you could live your life in relative obscurity
from the point of view of the digital webs that surround us,
but today as you move around you leave bread crumbs everywhere
and that's the thing that's really changed, from your cell
phone, from your credit cards, from driving down the highway
and that's huge data.
It's continuous data and that's really the difference, so I want
to talk about that today, the good things about it and also
the bad things about it.
But first of all I want to introduce myself a little bit,
maybe put a little humor into things, so this is me back in
1993. This one here.
Because, what occurred to me was people were beginning to
talk about computers everywhere.
I mean you have to remember that PCs and things like that were
barely a decade old and they were actually getting out into
society and the Internet was beginning to grow and people
talked about pervasive computing, computing everywhere
and I said, well you know actually if computers are going
to get that small and that powerful, then they're going to
get built into our cuff links and our glasses and things like
that long before they get built into the walls because we don't
renovate the walls very much.
So what I did was I created the first cyborg collective.
So you've got Star Trek, right?
And the Borgs?
We had the Borgs.
We had it about the same time as they did, maybe a little before.
These are some of the Borgs and back then to simulate what it
would be like to live today we had to have the students run
around with these motorcycle batteries, this big, right?
Ten pounds, and a PC/104 computer that was hot enough that you didn't
want to touch it and these little displays that are
actually vibrating mirrors so that you could see stuff but of
course today that turned into Google Glass and all the
watches that people are beginning to do but the
nice thing is that guy in front there,
that's Thad Starner, so he got into this
enough that he wore that sort of thing for the next 20 years
until Sergey Brin said, "Okay, guy, make it real."
Now he's the technical lead for Google Glass.
So I like to describe myself as the grandfather of Google Glass.
I didn't wear it for all those years but we helped cause the
problem, shall we say.
I do a number of other things too.
As was mentioned, I co-lead the discussion around big data
and personal data, which is privacy and security, at the
World Economic Forum.
Have you ever wondered what it was like to go to Davos?
That's what it looks like.
It doesn't look like too much.
A hotel room with a bunch of people in business suits and
stuff but they're pretty amazing people.
I mean the Vice President of the EU is in there.
There's a special representative from the National Security
Council; there's the CEO of Visa.
It's sort of an interesting crowd and my role is to make
sure that they talk and not fight and then I do a number of
other things too.
I'm on the Advisory Board of Motorola Mobility which you
might remember is owned by Google now, and one of the
biggest mobile clients of Google is Samsung, so
they don't want to fight with Samsung, and what that means is
that Motorola Mobility has to reinvent what it means to be a
cell phone.
You remember I did that wearable computer stuff years ago?
Coming soon to a store near you.
I also am helping to run the first commercially
accessible autonomous vehicle in the world, produced by Nissan,
and I advise Telefonica on what they ought to do, start
companies; do stuff like that.
Forbes likes me.
So that's what I do.
So let's talk about big data.
I wanted to tell you that to give you a sense of where I come
from because this is my view on big data.
It's not the canonical truth.
There is no canonical truth at this point so when you think
about big data what everybody thinks about is Google and
Flickr and Facebook and those are certainly important and
familiar but that's actually not where the action is.
I think that while those are things to be concerned about and
think about the real action is around the fact that wireless is
finally here.
So what this diagram shows is the distribution of wireless
devices in the world.
That's all the yellow stuff.
The blue stuff are the big undersea cables that carry the
bits around but the thing that's the most striking change in our
world is not Google, is not Flickr.
It's the fact that virtually every adult in the human race
has a sensor package that knows where they are, knows who their
friends are, knows all sorts of stuff about them and they have
the ability to give and send digital data, messages.
Everybody is connected now.
It's amazing.
I mean you go to the most remote places in Africa and you find
that the average phone ownership is 90 percent of the adult
citizens.
That's typical; in the poorest countries the average phone
ownership is like 1.2 per person and the reason is that the
phone companies are fighting, and then there are also things
like the phone that you use with your wife's family and the
phone you use with others, lots of interesting cultural
variations, but what
that means is that you can do things that you've never dreamed
were conceivable but what we think about the world is really
different.
Let me give you a couple of examples that are from my
students and myself, so this is a little start-up company.
What it does is it sends coupons to people to buy shampoo and so
forth.
That is not a terribly interesting thing but the guy
that started this started with sort of a small pot of money and
ended up with a customer list of 3.48 billion people in under a
year.
Well, how did he do that?
Well, he went to seven people who happen to run some of the
seven largest phone companies and said, "Can I use your
network to reach people for consumer products?
And we'll pay SMS fees and go back and forth and we'll give
them top up minutes but we'll use your network."
And of course they said, "Sure, we're going to make money off
this."
But think about that: for a start-up without a
lot of money to be able to reach the majority of humans in
the world in under a year.
That's just completely unheard of.
It takes companies decades to build up a fraction of that
size.
I don't want to say this is the world's best business or
anything like that but the ability to scale like that with
an existing infrastructure is incredible.
And Moore's Law tells us that all of those phones that are out
there are going to be Smartphones in just a couple of
years.
Already, almost everywhere in the world every village, every
small businessman has a smartphone.
Not everybody has a smartphone.
Pretty soon it's going to be that way and that means another
huge change in communication.
Here's another thing that a student and I did in India.
There are these things called TopUp minutes.
How many people know what TopUp minutes are?
Yeah, so you can buy, in most parts of the world, minutes to
put in your phone.
You give them some money.
They put more minutes on your phone so you can call things
and that's rather than having the normal plan that's a way you
get communications.
Well those minutes are a little bit like cash, aren't they?
So what they did is they set up a little exchange so if you had
minutes on one network you could trade them in for minutes on
another network, and they're essentially a form of virtual
cash.
And what we had done is set up a network of over a hundred
thousand retailers, very small retailers.
The only thing they had really in common is a tiny store,
maybe six feet by six feet, and a phone
and what you can do now is you can take those TopUp minutes and
you can pay that little retailer in the middle of nowhere to buy
whatever it is you need and you can even take cash out.
You can give him some minutes he'll give you some cash.
Suddenly you have this alternative economy that lets
you buy things, sell things, trade things.
Hundreds of millions of people and again, you can spin it off
almost immediately.
Not that it's not a lot of hard work but that's a really
different way of doing business and you also have to remember
that almost all these people are unbanked.
They've never really been part of the formal economy before.
It's really transformative.
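The minutes-as-cash mechanics just described can be sketched in a few lines of Python. This is a hypothetical toy, not the actual system: the network names, the per-minute rates, and the cash-out rule are all invented for illustration.

```python
# Toy sketch of the minutes-as-cash idea (all names and rates invented):
# airtime balances on different networks act like currencies, and a small
# exchange converts between them or cashes them out at a retailer.
RATES = {"net_A": 1.00, "net_B": 0.80}   # made-up cash value of one minute

def exchange(balances, src, dst, minutes):
    """Swap `minutes` from network src into the equivalent on network dst."""
    assert balances[src] >= minutes
    balances[src] -= minutes
    balances[dst] += minutes * RATES[src] / RATES[dst]
    return balances

def cash_out(balances, net, minutes):
    """A retailer takes minutes off your balance and hands over cash."""
    assert balances[net] >= minutes
    balances[net] -= minutes
    return minutes * RATES[net]

wallet = {"net_A": 100.0, "net_B": 0.0}
exchange(wallet, "net_A", "net_B", 40)   # 40 A-minutes become 50 B-minutes
cash = cash_out(wallet, "net_A", 10)     # 10 minutes at rate 1.00 -> 10.0 cash
```

The point of the sketch is only that, once minutes are transferable and redeemable, a balance of airtime behaves like a small bank account for people the formal banking system never reached.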
Here is another spin out and this is one that the Chief Technical
Officer of the United States describes as a real game changer
and if you think about it some people laugh at this.
It's a check engine light.
Your car has a check engine light, right?
When something's going wrong it says "Check Engine."
Do you have a Check Engine light?
Well, you ought to.
Think about it, so in the health care system today we are out of
money.
We don't do a very good job of delivering health care but
there's another aspect which is that there are lots of estimates
that say that most visits, the vast majority of visits to the
doctor are at the wrong time.
They're the worried well.
They're the "Well, I'm sorry sir, but you know if you live long
enough..." get-out-of-the-office sort of visit.
It's not at the right time.
What you would like to do is you'd like to know when somebody
should get help precisely, but what happens when you're
beginning to get sick?
Well, actually, what happens is your behavior changes.
You don't go out as much.
You don't call the same people.
When we have a friend that's getting sick you
say, "You don't look like you're doing so well."
What is it that you're picking up on?
You're picking up on these behavior changes.
You can do the same thing through sensing of the phone.
A little Check Engine light goes on.
It says, "You're not acting like yourself.
Something's wrong.
Maybe you ought to find out."
And what that does is that gets people to the doctor in time to
do something and it also tends to reduce the worried well
because after all my check engine light isn't on.
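The behavioral "check engine light" idea can be sketched as a personal-baseline anomaly detector. The features here (calls per day, places visited, distance moved) and the two-sigma threshold are assumptions for illustration, not the spin-out's actual method.

```python
# Hypothetical "check engine light" sketch: flag a person when their recent
# behavior drifts far from their own baseline. Features and thresholds are
# made up for illustration.
from statistics import mean, stdev

def check_engine(baseline_days, recent_days, z_threshold=2.0):
    """Return True if any behavioral feature in the recent window is more
    than z_threshold standard deviations from the personal baseline."""
    n_features = len(baseline_days[0])
    for f in range(n_features):
        base = [day[f] for day in baseline_days]
        recent = mean(day[f] for day in recent_days)
        mu, sigma = mean(base), stdev(base)
        if sigma > 0 and abs(recent - mu) / sigma > z_threshold:
            return True
    return False

# 30 ordinary days: ~10 calls, ~6 places, ~12 km moved (with small wiggle).
baseline = [[10 + i % 3, 6 + i % 2, 12 + i % 4] for i in range(30)]
normal_week = [[10, 6, 12]] * 7
sick_week   = [[2, 1, 1]] * 7          # withdrawn: few calls, nowhere to go

check_engine(baseline, normal_week)    # light stays off
check_engine(baseline, sick_week)      # light turns on
```

The key design choice is that each person is compared to their own history, not to a population average, which is exactly what "you're not acting like yourself" means.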
Another example of this, another spin out is something that
measures mental health.
If you look at the criteria for diagnosis of depression,
PTSD, schizophrenia, three things come up
again and again and again.
One is you change how you socialize with people.
Different diseases are slightly different but people's social
life changes dramatically.
Their activity level changes.
For some diseases like depression you tend to just
cocoon away and not do anything.
For others you'll get frantic and you're everywhere but it's
a major change and the third thing is it disrupts your daily
habits.
You become irregular in your habits.
Well guess what?
You can measure all those things off of the sensors in your
phone.
So it's not quite the check engine light because that's sort
of a general thing.
This is something that if you're under doctor care can be used to
have continuous monitoring with the doctor and that's actually
how it's used.
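The third marker, regularity of daily habits, is the easiest to make concrete. Here is a hedged sketch that scores regularity as the average similarity between consecutive days' hourly activity profiles; the activity numbers are invented.

```python
# Hedged sketch of the "regularity of daily habits" marker: score how similar
# each day's 24-hour activity profile is to the previous day's.
# Activity vectors are invented hourly step counts.

def similarity(day_a, day_b):
    """Cosine similarity between two 24-hour activity profiles."""
    dot = sum(a * b for a, b in zip(day_a, day_b))
    na = sum(a * a for a in day_a) ** 0.5
    nb = sum(b * b for b in day_b) ** 0.5
    return dot / (na * nb)

def regularity(days):
    """Average similarity between consecutive days: 1.0 = perfectly regular."""
    sims = [similarity(a, b) for a, b in zip(days, days[1:])]
    return sum(sims) / len(sims)

routine = [[0] * 8 + [100] * 10 + [10] * 6] * 7     # same shape every day
erratic = [[0] * 8 + [100] * 10 + [10] * 6,
           [50] * 24,
           [100] * 4 + [0] * 20] * 2 + [[5] * 24]   # no repeating rhythm

regularity(routine)   # close to 1.0
regularity(erratic)   # noticeably lower
```

A clinician would of course use richer features, but the principle is the same: the phone's sensors make "your habits have become irregular" a number you can track continuously.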
So the point in telling you these things is to sort of break
your mindset about what's possible.
There's this infrastructure out there of sensors and
connectivity that makes it possible to reach almost
everyone in the world almost immediately and do things that
only your best friend could do before and that's astounding.
That's the thing that's driving big data.
Now, what I've shown is all things that are actually pretty
familiar but you can go a lot farther than this, so let me
show you another thing.
So this is something I did a few years ago.
So these are people moving around in San Francisco and
these big dots, well those are like stores and restaurants and
things where people go quite frequently.
It looks like a nice city but if you analyze this, what you find
is you find that the city is really made up of separate
populations, subgroups that really don't mix with each other
at all.
They walk right by each other on the street but you know there's
the group that likes the edgy bars and the avant-garde music
and there's the conservative people and then there's this
crowd and that crowd and they don't actually mix that much and
if you watch where they choose to go, using GPS and things like
that, you can pick out the places that are for group one
versus group two versus group three.
So you can begin to stratify the population.
Now this is just like demographics.
You don't need to know who these people are.
In fact this was done, the one I'm showing you, was done off of
taxi cab data.
Where do taxis pick up?
Where do they drop off?
Well it turns out different types of people go to different
places.
They choose to go to different places.
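The kind of stratification just described can be sketched as clustering venues by who visits them rather than where they sit on the map. Everything here, the venue names, the visit counts, and the choice of plain k-means, is a made-up illustration, not the actual analysis.

```python
# Hypothetical sketch: grouping venues by *who* visits them, not where they
# are. Visit profiles are invented counts of taxi drop-offs from three
# (unlabeled) pickup zones.
from math import sqrt

venues = {
    "edgy_bar":     [40,  2,  1],   # mostly visited from zone A
    "music_club":   [35,  5,  2],
    "steakhouse":   [ 3, 50,  4],   # mostly zone B
    "country_club": [ 1, 45,  6],
    "family_diner": [ 2,  6, 60],   # mostly zone C
    "playground":   [ 1,  3, 55],
}

def normalize(v):
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dist(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, iters=10):
    """Plain k-means with fixed initial centroids, so the result is
    deterministic for this toy example."""
    for _ in range(iters):
        labels = {k: min(range(len(centroids)),
                         key=lambda c: dist(v, centroids[c]))
                  for k, v in points.items()}
        for c in range(len(centroids)):
            members = [points[k] for k, l in labels.items() if l == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

profiles = {k: normalize(v) for k, v in venues.items()}
labels = kmeans(profiles, [profiles["edgy_bar"][:],
                           profiles["steakhouse"][:],
                           profiles["family_diner"][:]])
# Venues visited by the same crowd land in the same cluster.
```

No names or identities are needed: the venues cluster purely because the same anonymous population keeps choosing them.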
And of course you can do it in lots of ways, but once you know
where these clusters are you can go talk to some of the people
in the cluster.
I can go talk to these guys here, right?
What do these guys have in common?
You can't guess from the map unless you know San Francisco
really, really well.
Those guys are hard partiers.
They really are.
They have almost an order of magnitude greater likelihood,
let me see, I did this backwards, of getting alcohol poisoning.
And there's another group that has a factor of almost five
greater likelihood of getting diabetes.
Now diabetes is an enormous cost to our society.
It comes from a lot of things.
It's not like one cause.
It's a whole bunch of things but it's really interesting that
there are places where prediabetics go and places where
prediabetics don't go.
We don't really know why but once you know where they go
you'll know where to set up a "Get Yourself Screened"
center, right?
Put more intelligibly, the way I had this organized: if you know
people's choices about where they spend time you know a lot
about their preferences.
So everyone talks about Facebook, where they post all this stuff.
Well the stuff you post on Facebook is the stuff you want
other people to know and you edit it to project the face that
you want to show, right?
Nobody actually tells the truth on Facebook.
It's not human nature but where you spend time is a real
commitment.
That's who you are.
And so by looking at where people spend time you can tell a
huge amount about them: obviously what their preferences
are, what they chose, but also things that aren't obvious from
location; you know, the people that all go to these
sorts of places tend to dress the same.
It's really interesting.
It's a very strong thing and so marketers look at that and say,
"Oh, those are the guys that buy the leather pants.
Okay. And these people over here those are the women that buy the
red dresses.
Now I know where to advertise.
Now I know where to put up a store."
If you put all this stuff together what you get is you get
a sense of the rhythm of the city, so you'll know that if
you're in the middle of a weekday that there are people
doing these sort of activities there and they tend to like
these sort of products.
There are other people doing other things in other places and
that it changes at night with different people going to
different places.
So you can begin to design a city to reflect the patterns of
the people.
Who's getting nervous?
>> I've been for a while.
>> Professor Sandy Pentland: Oh, okay you've been for a while?
Good. You should be getting nervous.
You can use this in lots of ways so let me give you an
example of something that we did recently.
So I helped talk Orange, which is a large carrier in Europe and
Africa into releasing the data they had about phone call usage
in the Ivory Coast.
We also got the UN to release all the data they have and the
World Economic Forum to release all the data they have and we
made this big Data Commons and it's arguably the
first in the world that shows all these sorts of things
about all the people in the Ivory Coast.
Now the people in the Ivory Coast are a little special.
First of all they're very poor.
Second of all they just finished a very violent civil war.
So there's a lot of places that for instance the government
can't go because they get shot.
That's the way it is with civil wars.
Now, we did a lot of interesting things to this data.
We aggregated it around cell towers so you can't see
individual people and we chopped it up so you can't track things
for long periods of time only for a week or two at a time.
And what that means is that it's very, very hard to understand
particular people but you can see average patterns really
well.
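A minimal sketch of that aggregation step: events are rolled up per tower per week, and any cell with too few events is suppressed, since sparse cells are the ones that could identify individuals. The threshold and the data are invented.

```python
# Sketch of the aggregation described above (made-up data): counts are rolled
# up per cell tower per one-week window, and individual IDs never leave.
from collections import Counter

def aggregate(events, min_count=5):
    """events: (user_id, tower_id, day_index) tuples.
    Returns {(tower, week): count}, dropping cells below min_count so that
    sparse -- and hence potentially identifying -- cells are suppressed."""
    counts = Counter((tower, day // 7) for _user, tower, day in events)
    return {cell: n for cell, n in counts.items() if n >= min_count}

events = [("u%d" % u, "tower_A", d) for u in range(8) for d in range(3)] \
       + [("u99", "tower_B", 1)]            # a single user at a lonely tower
agg = aggregate(events)
# tower_A week 0 survives (24 events); tower_B is suppressed (1 < 5).
```

This is why the released data shows average patterns really well while making particular people very hard to see: the identifiers are dropped before aggregation, and any cell small enough to point at one person never leaves.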
And so what we did with this data is we released it to some
90 groups around the world, research groups to be able to
see what can you do for the Ivory Coast to help the
Ivory Coast?
I'll show you some examples.
One group in Dublin looked at the transportation network.
I know you can't read this slide.
I can't either, but it
turns out that using this sort of data you can tell where
people start in the morning, where they work or make money
and when they go back.
So you know what transportation they'd like to
have.
Now the subtle thing is that the way transportation is done
everywhere in the world is somebody stands there or they
put out those little rubber things and they count the number
of cars going by or number of people going by but they don't
know where those people started from or where they're going.
They don't know where they would like to go as a transportation
network and so if you know this thing which is called an origin
destination matrix you can design a transportation system
that's a whole lot better, because you can put the buses on
routes that go from the places where people live to the places
where they work.
They never knew that before.
When they mapped that on to the existing bus system they
discovered that small changes, very small changes would reduce
the commute time by 10 percent.
Ten percent is big when you do it across a huge city in terms
of pollution, in terms of energy cost, in terms of wear and tear
on the people.
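An origin-destination matrix of the kind the Dublin group used can be sketched from phone traces in a few lines. The heuristic here, first tower of the day as origin and the midday tower as destination, is a common simplification and an assumption on my part, as are the traces themselves.

```python
# A minimal origin-destination sketch (invented traces): the origin is where
# a phone starts the day, the destination is where it spends working hours,
# and the OD matrix counts commuters between each pair of zones.
from collections import Counter

def od_matrix(daily_traces):
    """daily_traces: list of per-day tower sequences, ordered by time.
    Uses the first tower of the day as origin, the midday tower as
    destination."""
    od = Counter()
    for trace in daily_traces:
        origin, destination = trace[0], trace[len(trace) // 2]
        od[(origin, destination)] += 1
    return od

traces = [["suburb_N", "center", "center", "suburb_N"]] * 3 \
       + [["suburb_S", "port", "port", "suburb_S"]] * 2
od = od_matrix(traces)
busiest = max(od, key=od.get)   # the flow most worth a direct bus line
```

Roadside counters can only tell you how many vehicles pass a point; this matrix tells you where trips begin and end, which is what you need to route the buses.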
Another sort of thing is the public health systems, so
infectious disease depends on people interacting.
And if you're going to set up interventions, you know, telling
people to stay home or telling people to wash their hands or
giving people inoculations you've got to know where the
people are.
But they never knew where the people were.
In fact, we don't know where the people are in this country.
We don't know where people interact and that means we can't
set up a public health system that's really effective.
So using this sort of data they were able to finally map the
places where people got the flu.
They were also able to track back and find places where
people got diseases like malaria which you don't have to worry
about here but it's a big problem there and estimates are
they'll get about a 20 percent reduction in flu propagation the
next time they have a flu season which should be about now.
Pretty good. More interestingly, another group was able to do
something else, which is shown here.
In the civil war the battle lines ended up around
here and so it was thought that this was a north/south division
because really nobody knew where the ethnic groups were.
These were battles between different ethnic groups but it
turns out that's not the case.
It turns out you can use this data about mobility and
communication to figure out who talks to who and of course if
you all speak the same language you tend to speak to each other
more than speaking to other people and similarly you visit
your relatives more than you visit random other people.
So you can map out ethnic division pretty accurately and
what they discovered is the ethnic divisions are not
north/south the way they thought they were.
They're vertical which means you can now begin taking much more
effective action in trying to defuse tensions at the places
where the ethnic groups come together.
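The who-talks-to-who mapping can be sketched as community detection on a call graph. Here is a hedged toy version: keep only the frequent-call edges and take connected components, so people who mostly call each other land in one group. The call counts and the threshold are invented.

```python
# Toy community detection on a synthetic call graph: groups of people who
# mostly call each other approximate the language/ethnic groupings described
# above. All counts are invented.

def communities(call_counts, min_calls=5):
    """Keep only pairs with at least min_calls calls between them, then take
    connected components: people who call each other a lot share a group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:          # path-halving union-find
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for (a, b), n in call_counts.items():
        find(a)
        find(b)
        if n >= min_calls:
            union(a, b)
    return {x: find(x) for x in parent}

calls = {("a1", "a2"): 30, ("a2", "a3"): 25,   # one tight calling circle
         ("b1", "b2"): 40, ("b2", "b3"): 22,   # another
         ("a3", "b1"): 2}                      # a rare cross-group call
groups = communities(calls)
# the a-people share one root, the b-people another
```

Real work would use proper graph-clustering algorithms, but even this crude version shows how the rare cross-group edges, the places where the communities touch, fall out of the data.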
This map's even more interesting.
So they don't have any data about poverty in the northern
part of the country, right?
Because if you go up there you get shot.
It's hard to collect data when you're dead, but there is a
technique that you can do with this data that gives you a very
accurate sense of what's called multifactor poverty.
So it turns out that level of disposable income, child
mortality, crime rate, life expectancy all co-vary.
One goes up the others go up.
One goes down the others go down.
So you just put them together in one factor, the MPI.
It turns out that when people are feeling more comfortable
they explore more so they move around to more diverse places.
They call more diverse people and when they're feeling
threatened in various ways they do less of that and you can
average that over regions to get a very accurate estimate of this
MPI.
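The exploration signal described above can be sketched as the entropy of a region's visit distribution: the more evenly residents spread their visits (or calls) across places, the higher the score. All numbers are invented, and the real MPI estimation involves much more than this.

```python
# Hedged sketch of the diversity signal described above: regions whose
# residents visit a more even spread of places score higher on
# "exploration", which the talk says tracks multifactor poverty inversely.
from math import log

def exploration_score(visit_counts):
    """Shannon entropy of a region's visit distribution: higher = more
    diverse movement."""
    total = sum(visit_counts)
    probs = [c / total for c in visit_counts if c > 0]
    return -sum(p * log(p) for p in probs)

comfortable_region = [10, 9, 11, 10, 10]   # visits spread over 5 places
stressed_region    = [46, 1, 1, 1, 1]      # almost everything in one place

exploration_score(comfortable_region)  # close to log(5), about 1.61
exploration_score(stressed_region)     # much lower
```

Averaged over a region, a score like this is the kind of behavioral proxy that lets you estimate the MPI without ever sending a surveyor into territory where surveyors get shot.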
So that's pretty amazing if you think about it.
So that means that I can look at aggregate cell phone data
and tell you how many babies are dying, literally.
I'm not making this up.
I can do it really pretty well and I can do it today.
I don't have to send people out in the field.
There's none of this one-census-every-ten-years sort of stuff,
or asking all the doctors, what doctors?
It's the Ivory Coast.
What are you talking about?
Finding out from people, you know the stuff.
You can actually look at things like poverty conditions.
You can look at things like the spread of infectious disease and
if you can look at it you can begin doing things because among
other things you can have a conversation with the government
that says, "Well you say that you've helped this area a whole
lot but look.
It doesn't look that way.
What's going on?"
Until you know that, you can't
have that conversation.
So what I wanted to do in this sort of first half of the talk
here is give you a sense that first of all big data is data
about people primarily.
Yes, there's the Internet of things and coordination but the
real thing that's happening here is more Internet data about
people and that's a scary thing because if you can tell all that
about people who owns it?
Who oversees it?
So recently there's been a lot of news about the National
Security Agency.
Everybody hears about that, right?
And the National Security Agency says, "We're just looking at
metadata," right?
What did I just show you?
Every single thing I showed you was metadata, not a thing
wasn't metadata, I don't think.
So that's what you can do with all that data.
Now is that a good thing or a bad thing?
Well I'm not going to debate that but I'll tell you this is a
really interesting and new capability and it's here now.
And the NSA is the famous example but the other thing
that's happening of course is that companies are using this,
sometimes little companies, sometimes unethical companies.
I have a little game I play on my cell phone because my son
plays it.
It's sort of interesting but what's really interesting is
when they update software because they send you a little
note and by looking at the note, it's always in broken English.
You can tell that this was written by someone whose native
language is Chinese.
And then you look at the permissions on the thing and it
says we want to know where you are and who you call.
Why does a video game want to know where I am and who I call,
especially given that the data then goes back to China?
Yeah, exactly, let's think about that one.
So it matters who owns this data and who controls it.
Now what typically happens next is people say, "Privacy.
Now we've got to worry about privacy.
Shut it all down."
Because that's been sort of the tradition, but I hope you will
remember from the last ten, fifteen minutes that I can tell you
where the babies are dying.
I can tell you who's going to get flu and die in a pandemic.
I can tell you how to design cities to be more energy
efficient.
You want to lock it all down you give up all those public goods.
It's real simple.
You tell me how many babies you want to die?
Seriously, I'll tell you how to lock it down.
Okay? There has to be a better solution and the outline of the
solution comes as follows.
So what I'm going to give you is sort of a summary and a little
bit of history of the discussion at Davos among some of
the people who were part of it, and that includes people
like the Justice Commissioner of the EU, the head of the Federal
Trade Commission, folks of that sort:
what's happened and then where things are going.
It's particularly interesting because this is a computer
science crowd.
I know not everybody is computer science but you ought to be
interested because it's your stuff.
So 1950, this is the way systems in society worked.
You showed up in the bank.
You talked probably to somebody you knew because you lived there
for quite a while and you gave them your signature, physical
signature.
They compared it to the other signature and that's how you
certified your identity and how you got services, and privacy was
not a big deal because the worst that would happen is the teller
would go tell somebody else in your church group or something
like that, but it certainly didn't go to faceless
bureaucrats somewhere else, because it was too
expensive to copy all that stuff.
And then we got electronics.
We got IBM 360s.
We got CCTVs.
We got fax machines and in the '60s people got scared because
suddenly the old traditions that made data local and
personal were breaking down and our modern notions of privacy
were born in legislation that said, okay guys we're going to
lock things down.
This is not okay and I mean I know the people that actually
wrote that legislation.
They're still alive and kicking and the idea was okay we have
this way of doing things and we're going to make the
electronic way of doing things match the physical one.
Nothing else allowed.
Well that sounds good but what's happened over the years is that
people have realized that this data is really valuable and
they've found all the little corners where you can sneak the
data out and sort of under the table and so now what we have is
this grey market.
We have people spying on us.
We have things happening that we don't always know about because
this legislation was sort of like just lock it down.
They didn't envision that people would find all these ways around
it.
So we've got a problem.
So what are we going to do about it?
Well, the core is that computer systems have to in some sense
be compatible with these sort of more human systems.
Our expectations about what happens up here have to match
the expectations that we grow up with and that, as humans, we
tend to have.
I don't mean this in a sort of haughty way.
We have certain things that are in our biology about what we can
think about and what we can't, sort of capacity limitations.
We have expectations about social relationships, about
causality and basically we have to be able to know what's
happening up here in the same way we knew what was happening
here.
That's the thing that you want to go for, I would claim, and the
key to it is to put this data into a framework that's
understandable, and the key thing here is to think about it and
say: what's happened is this data is now very valuable.
That means it's an asset to somebody.
Now we know about assets.
We have money.
We think about that just fine, right?
We get confused and make mistakes but we deal with money
pretty well.
We have property, right?
We own land.
We can sell that.
You know, understand how that works, right?
Around the edges it's a little ragged but maybe we could do the
same thing with data.
Well, but you can make copies of data.
Well, I've got some news for you.
You know the things in your bank that you think are your money?
It's really just ones and zeros.
But there's a system on top of it that makes sure they can't
just arbitrarily copy your data and take it from you and stuff
like that and the system is actually one that's based on
very old principles.
These principles actually originated in the United
Kingdom, in Britain, and they were ownership rights: the right
to possess, dispose and control.
Now it's not the same as ownership.
It's rights, not full ownership and what that means is that some
of the dispute resolution is different sort of a technical
legal thing.
Keep those in mind.
And so back in 2007 I proposed what I called the New Deal on
Data and helped start this discussion within the Forum and
other places and the vision is this.
Data is an asset and the first thing you have to do with this
asset, therefore, is decide who controls it.
Whose asset is it?
Does it belong to the government?
Let's see a show of hands.
Who thinks it belongs to the government?
Good. Some countries you would get more hands up.
Who thinks it belongs to things like the Telcos and the banks?
Who thinks that you should have control of it?
Yeah, okay.
So instant experiment: the only politically viable solution is
that individuals control data about themselves; nothing else is
going to make it as a sort of stable political statement.
It's not a statement of principles.
It's not a statement of mathematical facts.
It's a statement of politics, so in these meetings you have
senior politicians, you have companies and you have people
like me who cause trouble.
And the idea is you need something that's a win for the
politicians.
They want to get re-elected.
The companies have to be able to make money and the citizenry has
to be protected and get value from it.
They need the win, win, win.
And the solution that got hammered out basically gives
people much more control over their rights over data
about them.
And I'll explain this in a little bit and the system that
goes with it.
The ideas have been codified into the EU Data Protection Acts
and they stem from the Human Rights parts of the EU
Constitution that give people literal ownership of data that
is about them.
In this country it's in the U.S.
Consumer Privacy Bill of Rights which is not yet enacted but
also in the regulatory framework that's been put forward and is
being acted on by the Federal Trade Commission.
And the idea is roughly that whenever people collect data
about you they have to give you informed consent.
So how many people are familiar with informed consent?
You do this whenever you undergo a medical procedure or participate in a human-subjects experiment; they have to tell you, "This is the data we're going to collect.
This is what we're going to do with it.
This is the risk that you have and this is the benefit you're
going to have and you can opt out at any time and we will
destroy your data."
Those are the things you have to do when you run human-subjects experiments.
And that's the thing that people are willing to do in these new
regulatory frameworks.
Now you might ask, okay I see why the government would want to
do that but why would companies want to do that?
That's the sort of first thing and the answer is that most of
the companies at the table don't trade data yet.
They're regulated industries.
So the data I showed you, that came from Telcos, and Telcos don't actually trade data because they're licensees of the government.
They're regulated.
If they get caught with their hand in the cookie jar it's bad
for them to say nothing of what their clients, their customers
are going to do.
I mean if Verizon really screws up everyone is going to go to
AT&T.
So Verizon doesn't want to do anything that's really bad for
those reasons, same thing with banks, same thing with
hospitals.
But if in fact they give you these sorts of rights and really inform you of what's happening, then what this regulation says is they can go ahead and do it.
So for instance, they can say look, we'd like to take this
data about you and do the following thing for it and we'll
pay you in this way with better services or more money and you
have the right to say yes or no.
And if you decide later having said yes that you don't like it
anymore, you have the right to say no, and they have to get rid
of it.
And for them that's a big step up from where they are today
which is not being able to do anything.
So this is the version that's on the table in the EU: a single set of rules throughout the EU, and even outside it. If you're a company and you give subsidiaries or subcontractors some data, they have to follow the EU rules and you're liable for them.
That puts real teeth in it.
Consent is required, and so is data portability.
So I told you about informed consent.
Data portability means that you can say to Amazon I want to have
an XML file that gives all my purchase history in a form that
is computer readable and incidentally that I can then
send to Barnes and Noble and Barnes and Noble can use.
That's data portability. Or you can take your hospital record and give it to a different hospital and they can use it. So it creates consumer pressure on companies to clean up their act.
The right to be forgotten, to get rid of the data: there's been an unfortunate battle in the EU over this because some people took it too literally; companies have to retain data for crime investigations, you know, was this person there on the night of such-and-such, or for auditing.
We think you're cooking your books.
We want to go back and see.
So, there's a lot of reasons companies have to retain data
but they can take data offline so that they don't have it
normally.
Big fines.
So that's the type of thing that's being proposed both here
and in the EU that grounds out in particular things.
So one is when you start building computer systems, I'll
be happy to talk to people about this afterwards, there has to be
really trusted identity.
So how many of you have more passwords than you can possibly remember and curse at this at least once a day?
Okay, everybody.
So one of the ideas, in this country it's called the National Strategy for Trusted Identities in Cyberspace, and one of my guys co-chairs that, and its idea is to get rid of passwords, yea, and replace them with the sorts of things that the military uses.
The military doesn't use passwords everywhere.
There are ways of propagating secure identity without having all those things.
A cheesy version of it is already offered by people like
Facebook.
So how many people log in to other services using Facebook?
It's easy right?
Now Facebook knows everything about you.
Okay, or you use Google. And it's easy and it's very cool, except now Facebook or Google knows like everything about you, so it's got that downside.
So the idea is to come up with a national framework that's like
this where Google and Facebook don't own it.
Informed consent, we talked about that a little bit; next, metadata.
So when you put data into the bank, oh yeah that's right, you
call it money, right?
When you put money into the bank they have metadata about that,
about who owns that data and what you can do with it.
Is it this sort of account or is it that sort of account?
And they get audited and the simple idea is that ought to be
something you can do with personal data.
So if you share your data with somebody in order to get better service, you ought to be able to check automatically, just on the computer, that they're doing the right thing, and there ought to be penalties if they don't, automatic penalties.
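To make that concrete, here's a minimal sketch of the bank-style metadata idea. This is my own illustration, not any deployed system; the field names and the one-hour window are invented for the example.

```python
from dataclasses import dataclass, field
import time

@dataclass
class SharedRecord:
    """A piece of personal data tagged with metadata, like a bank record."""
    owner: str
    purpose: str     # the use the owner consented to
    expires: float   # epoch seconds; any use after this is a violation
    payload: dict = field(default_factory=dict)

def audit(record, used_for, now=None):
    """Automatic audit: usage must match the consented purpose and window."""
    now = time.time() if now is None else now
    return used_for == record.purpose and now <= record.expires

record = SharedRecord(owner="alice", purpose="billing",
                      expires=time.time() + 3600, payload={"plan": "basic"})
print(audit(record, "billing"))      # compliant use: prints True
print(audit(record, "advertising"))  # off-purpose use: prints False
```

An auditor, or the data store itself, could run `audit` over every recorded use and levy the automatic penalties mentioned here whenever it returns False.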
And interestingly, of course, a lot of those, sort of big
computer system guys like Microsoft and so forth, think
this is great because among other things they're going to
make systems that do this, right?
And other companies already have this, so some of the big Telcos
already have this and then in this country there's a bit of a
battle about do not track.
So all this is no good if you don't know you're being tracked,
so there has to be some sort of evidence that you're being
tracked.
Anyway, that's a little technical, but I figured a lot of you guys sort of do computer science.
In my lab we've done a couple of things in this area.
The main thing is that we've built something called "open
personal data store."
So for the right to physically control your data, as much as you can, you need a store where you control the store and have ownership rights to it.
And so we've developed, with support from both the government
and industry, a framework for doing this and it's part of a
sort of a global movement to really make these rights
something that are effective by having software that really does
it.
And it has a couple of different things.
I don't know how much I want to talk to you about it.
One of the main ones is that it's not just computer science,
it's also contract law.
So there's this idea that when you share data there's a contract behind it, and that has to be something that gets instantiated in computer code.
I don't know if this is too technical.
>> No, it's good.
Keep going.
>> Professor Sandy Pentland: Okay.
Remember the phrase I think it's Willie Sutton?
Willie Sutton was a bank robber.
He kept getting arrested for robbing banks.
It was bad for Willie and they asked him, "Why do you do this?"
He said, "Because that's where the money is."
That's why he does it.
So where's the money?
Anybody want to guess today?
Suggestions?
No, I don't have it up there.
So, there's something called the SWIFT network.
If you've ever made a money transfer you may be familiar with it. It's the interbank system for transferring money.
It's three trillion dollars a day and it's never been hacked.
Now why is that?
That sounds like a pretty weird thing, doesn't it?
You know all these networks being hacked, right?
I should say never been hacked that we know about.
[ Laughter ]
What it has is, it has something called the "trust network" and
what that means is that there's a contract that's peer-to-peer
between all the banks.
Remember, banks operate in something like 163 different countries, many of which are little more than criminal cabals, and yet they're able to transfer money safely even in those sorts of places.
So there's a contract, not regulations, a contract between the banks that says: here's what you say on your computer network in order to make an offer to transfer the money, here are the replies you can make to receive the money, and here's the liability if you screw up; and incidentally, it's a joint liability.
So I'm a bank.
You two guys are talking.
I'm going to pay attention because if you guys screw up I
might have to pay.
So everybody is watching everybody, okay?
The joint liability is part of it.
So that's what the SWIFT network does.
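As a toy sketch of those two ingredients, a fixed contract vocabulary plus joint liability, consider the following. This is my own illustration with invented message names; real SWIFT messages and contracts are vastly richer.

```python
# The "contract" fixes which messages are legal on the network.
VALID_OFFERS = {"TRANSFER_OFFER"}
VALID_REPLIES = {"ACCEPT", "REJECT"}

def settle(offer, reply, sender, receiver, watchers):
    """Apply the contract to one exchange between two banks."""
    if offer not in VALID_OFFERS or reply not in VALID_REPLIES:
        # Off-contract message: joint liability for everyone involved,
        # which is why every peer has an incentive to watch the others.
        return {"status": "violation",
                "liable": [sender, receiver] + list(watchers)}
    status = "settled" if reply == "ACCEPT" else "declined"
    return {"status": status, "liable": []}

print(settle("TRANSFER_OFFER", "ACCEPT", "bankA", "bankB", ["bankC"]))
```

The joint-liability branch is the design point: because watchers can be on the hook for a bad exchange between two other peers, everybody polices everybody.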
Visa, everybody has a Visa card.
They have a trust network too.
Oh actually, it's not a trust network to you.
It's between the banks and the Visa network, right?
So they're safe--
[ Chuckling ]
oh well, too bad. But what you can do nowadays is take the same technology that these big guys developed, which took them lots of lawyers and computer programmers, and make it consumer grade. Now we know how to do it.
Computers are fast.
You can just make it go, and so that's what we've done: we've made a network like the SWIFT network that's for you, and it's open source, and we're just trying to get people to use it. There are all sorts of people beginning to use it or thinking about it: the State of Kansas, Mass General Hospital, Luxembourg, Andorra.
We took a military project for secure identity, remember those passwords, called "overnight econnect," and we've taken it over, made it open source supported by an MIT industrial consortium, and put in auditability and computation and storage. So this is a way companies can adopt a technology and legal framework to satisfy those regulations that are being proposed, and this is why people are interested in it, as opposed to its being some radical thing: they may have to do this pretty soon, particularly starting in the new year.
One of the cute innovations in openPDS is the following.
How many people have heard about problems in re-identifying data?
So there are all these claims about anonymous data, right? If you hear somebody talk about anonymous data, know that they don't know what they're talking about, because there's really essentially no such thing.
If you take all the names out of some data, then I can almost always find another data source that lets me re-identify those names.
That's what people found again and again.
Now, there are ways of getting around that, like in the Ivory Coast: if you aggregate it, if you put it in big piles, then it's extremely difficult if not impossible to re-identify.
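That kind of aggregation can be sketched in a few lines. This is my own illustration, not the Ivory Coast pipeline itself, and the suppression threshold `k=5` is an invented number: publish only per-region counts, and drop any pile too small to hide in.

```python
from collections import Counter

def aggregate(records, k=5):
    """Put records into 'big piles': keep per-region counts only, and
    suppress any pile with fewer than k people so nobody stands out."""
    counts = Counter(region for _person, region in records)
    return {region: n for region, n in counts.items() if n >= k}

records = [("u1", "Abidjan")] * 7 + [("u2", "Yamoussoukro")] * 2
print(aggregate(records))  # the pile of 2 is suppressed: {'Abidjan': 7}
```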
But this idea of sharing data is fraught with danger because
while that data may be pretty harmless, if you combine it with
other things it may not be harmless.
A key example is location data.
>> [Inaudible audience comment] >> Professor Sandy Pentland:
I'll give you another example.
So a key example is location data. Say I'm a company and I want to offer you some coupon, and I ask you, "Are you in San Francisco?" And you return a latitude and longitude, okay?
Well, now let's say I do that fairly often.
After a while I'm going to know where you live, where you work
and where you hang out and I'm going to know what sort of
person you are and blah, blah.
This is not a made up example.
This is how it works.
And the thing is you shared much too much data when you gave them
a precise location.
What you should have said is, "Yes I'm in San Francisco or no
I'm not in San Francisco."
And so what openPDS does is answer questions; it doesn't share data except when it has to, and that reduces the dimensionality of the data and makes this whole privacy thing a lot easier to deal with.
It doesn't cure it, but it gets a lot better.
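The question-answering principle can be sketched like this. It's a minimal illustration, not the actual openPDS API; the class and the bounding-box numbers are my own.

```python
class PersonalDataStore:
    """Holds the raw data and answers questions instead of releasing it."""

    def __init__(self, lat, lon):
        self._lat, self._lon = lat, lon  # raw coordinates never leave the store

    def in_region(self, south, west, north, east):
        """Return a single yes/no bit, never the precise location."""
        return south <= self._lat <= north and west <= self._lon <= east

pds = PersonalDataStore(37.77, -122.42)               # somewhere in San Francisco
print(pds.in_region(37.70, -122.52, 37.83, -122.35))  # prints True: "yes, in SF"
print(pds.in_region(40.48, -74.26, 40.92, -73.70))    # prints False: not in NYC
```

The coupon company learns one bit per question instead of a full location trace, which is the dimensionality reduction at work.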
So that's of interest if you're into this sort of computer
stuff.
The real answer for all of this stuff though is that it's not a
policy thing.
It's not a computer sciency thing.
It's a deal with society.
It's a new deal, okay?
And the only way you're going to test it is not in a laboratory, not in the legislature.
You've got to build it.
Stick it out in the real world.
Let real people live it.
And for various reasons it turns out it's hard to do that in a country like the U.S., or in fact any large country, but it's easy to do in small countries because they're just not as polarized and the politics are simpler. And so Trento, which you might think of as being in Italy, actually isn't, exactly; it's an autonomous province. So we convinced them to put together a living lab where they could try openPDS and new ways of sharing data, starting very small with young families that have just had children, because we thought they would want to share data a lot, and some of that would be very personal data; they're also economically stressed, so sharing spending data is an example.
And what we're doing is we're letting them live the future
with these new sorts of regulations in place to be able
to see how it works.
Because the truth is, as you know, either people understand it and use it correctly, or it's useless. And in fact, as bright as we are, we think we know everything; we don't, and we make mistakes, and sometimes they're pretty bad mistakes. So you have to actually put things out in the real world and test them.
It's called living labs and that's what we as a country here
need to do.
We need to declare Rochester to be a living lab.
You could try this stuff out.
Try to put facts on the ground about the benefits and the dangers of big data, and about how sharing policies and privacy policies should work.
Now that sounds sort of crazy, but we're doing it not only here: at the Technical University of Denmark, my group is giving smartphones and personal data stores to every student in the student body, trying to live in the future.
At MIT we have in front of the president right now a proposal
from some of the leading computer science faculty to take
our system and make it something that everybody at MIT uses and
the MIT community is almost 60,000 people, only 4,000 undergrads but a lot of hangers-on, and that's what you need to do.
You need to actually spin up things so you can try them out.
Since this is sort of a computer sciency college, you might want
to think about that.
Could you enroll everybody, maybe opt-in of course, but
enroll everybody in the campus in a big data experiment?
Could you live in the future and see how it works?
It would be very valuable, not only to make it personally relevant, but to create experience and facts that'll guide regulators, guide companies, alert us to dangers, show us opportunities, and sort of start the movement.
So that's it.
Thank you
[ Applause ]
>> Professor Kumar: We have time for a few questions.
Please stand up.
>> I'm actually an alumnus here, I used to be part of the College of Engineering. Can you expand on the definition of consent that's used, what that actually means in terms of giving up your data, and how you go about that? And if we're already doing it at some point, what should we do instead?
>> Professor Sandy Pentland: The short answer is not much, because it isn't specified precisely at this point. The key thing is that there are legal definitions of what informed consent means, and then there's the level of being informed.
Another key item is the ability to revoke consent.
So those are the things you need to know.
The standard that's out there that I think is informing people comes from medicine and from human-subjects research, where there's international coordination on what it means to inform human subjects for an experiment or what it means to enroll people in a new medical procedure. So there's already sort of an international sense of this; it's a question of sharpening it up so that you know exactly what it means in the context of computers and stuff like that.
I think that's the right thing to say.
The other thing that's interesting: remember there was a battle, sort of a big, well, not exactly a fight, you get the idea, between the EU representative and the U.S. representative about a couple of issues, about how flexible you can be about this. But one thing they were in absolute agreement on was that if you're a minor, if you're under the age of majority, it's going to be really strict. It's going to be just like the informed consent for human subjects, which for minors is very strict.
>> What are you doing in your open data store to reduce the
transaction cost of managing your own data?
So just as a quick example, you mentioned your son and the video game asking to track where you are and so forth. Conceivably you get these requests multiple times a day: okay, should I give permission or not? And I don't even want to think about that, so okay, fine, okay, yeah.
How can you reduce the transaction cost?
>> Professor Sandy Pentland: So there's a couple of things.
For instance in Trento, there's an app called Check App which gives you sort of ratings of all the apps you use in terms of the danger they pose, and it has a little interface that lets you manage that.
But the truth is 99 percent of the people won't use that, okay?
And so what we have, or rather what we imagine for the future, because it's mostly imagining, is that people like the AARP, like the university, etc., the people who have some sense of being helpful or having custody, will develop their own procedures or standards. So what I'll get is something that says, "Would you like to follow MIT's suggested settings?" Sure. "Would you like to follow the AARP's suggested settings?" Yeah, okay. Or the Baptist Church's suggested settings, or whatever, because it's way too complicated. Nobody could really manage it.
It's actually technically complicated enough that nobody
in this room could really tell what's going to happen.
You get into differential privacy and some of the long
range effects.
It's just hard to know.
But that's not unusual; I mean, think about the financial stuff. You have financial advisors. You have retirement advisors. They're sort of best practice, but it's not perfect. And that's what we have to do: think about this as an analogy with financial things. We have to try and do at least that, and maybe we can do a little better.
>> Professor Kumar: There's just time for one more question.
>> Are you personally optimistic or worried?
>> Professor Sandy Pentland: Optimistic, but then I'm a
pragmatist, right?
So what I see is advantages for governments, for hospitals, for Telcos in actually doing this, in protecting people's rights, so that they can be more free and more productive in other areas. And so it's in the interest of some of these big operators to actually give you more opt-in and more control, so that they can have a free playing field with, say, the aggregate data that they don't get much traction with today.
So take big Telcos, and remember I advise Telefonica, which is one of the biggest: they want to be a white hat. They want to give you the view of your data. They absolutely want to do that, but they also want to take aggregates of that data and build businesses on it, things that can't be traced back to individuals. Or take Mass General Hospital.
If you can spin up things like these personal data stores, which is what we're doing, it turns out that you can really revolutionize medical research.
Today when you do a medical experiment, you map out the experiment, you go through the IRB, you recruit people, and now two years have gone by, typically.
I'm serious, right?
If everybody has personal data that they're collecting in their own stores, then you say, well, look, here's the experiment I want to do. You say that to the IRB and then you flip the switch. The data is already there. All you're doing is transferring control of the data from the person to the experimenter under informed consent. It makes the research cycle half to a quarter of the time it would otherwise have been.
So those are just two of the examples.
But that's why I'm optimistic.
Now, where is it not going to work?
Well, salespeople are going to cheat. We have this whole unregulated part of the Internet, the Facebooks of the world.
It's going to be a long battle to put that down, put that genie
back in the bottle.
But once the big guys are done, once the big guys show that you can have a more respectful way to build a big data economy, something that preserves our ability to interact with it, the Facebooks of the world will come under real pressure, because the question will be: well, if Bank of America, for god's sake, can do it, why can't you?
There's no real good answer to that one, so they're going to have to change the way they do business, but that's a long battle.
>> Professor Kumar: Thank you.
I'm afraid we don't have much time.
[ Applause ]
On behalf of the Thomas Golisano
College in appreciation for Sandy Pentland's inspiring talk
I'm giving this--