>> Professor Kumar: Thank you for coming to the lecture and
welcome to the B. Thomas Golisano College of Computing
and Information Sciences as well as the Dean's Lecture Series and
this is the first of the lectures for this year.
And I'm Mohan Kumar, the Chair of the Department of Computer
Science in the Golisano College and it is my pleasure and
privilege today to introduce our speaker Professor Sandy Pentland
of MIT.
The goal of this lecture series which was started about ten
years ago is to bring leading minds from academia, from
industry and from the government to RIT campus and share their
ideas with the students and faculty at RIT.
And today's speaker is the 48th speaker in the series so we have
two more to go to celebrate.
So at this time I would also like to acknowledge our
professional interpreter Kate for translating the talk to the
audience and again it is my pleasure and privilege to
introduce Sandy Pentland.
Sandy is a globally recognized authority on big data and he's
the director of the MIT Human Dynamics Program and the MIT
Media Lab Entrepreneurship Program.
In addition, Sandy is the lead academic for the World Economic
Forum's Big Data and Personal Data initiatives and he was
chosen by Forbes as one of the world's seven most powerful data
scientists.
So in today's lecture titled "Social Physics and the
Beginning of Big Data Society" Sandy will discuss the impact of
big data and the need for a new deal on data to preserve privacy
and personal safety.
So sit tight.
I've heard Sandy talk before and it will be a great talk I can
assure you that.
Thank you.
>> Professor Sandy Pentland: Thank you.
[ Applause ]
Good turnout here.
This is great and a beautiful day.
So thank you.
So what I wanted to do today is talk about sort of my view
about big data and the things that I think are the most
important and the things that are misunderstood.
So big data is this buzz phrase.
It's everywhere.
Everywhere is big data and you see these definitions like it's
volume, velocity.
Well, yes and no. Actually if you look at video, online video
is bigger and faster than almost anything else
and it's been around for decades in very many formats, so what I
think is actually different is not so much the bigness of it as
the depth of it.
So now we have very detailed data about all sorts of
things but most importantly about people.
It used to be you could live your life in relative obscurity
from the point of view of the digital webs that surround us,
but today as you move around you leave bread crumbs everywhere
and that's the thing that's really changed, from your cell
phone, from your credit cards, from driving down the highway
and that's huge data.
It's continuous data and that's really the difference, so I want
to talk about that today, the good things about it and also
the bad things about it.
But first of all I want to introduce myself a little bit,
maybe put a little humor into things, so this is me back in
1993. This one here.
Because, what occurred to me was people were beginning to
talk about computers everywhere.
I mean you have to remember that PCs and things like that were
barely a decade old and they were actually getting out into
society and the Internet was beginning to grow and people
talked about pervasive computing, computing everywhere
and I said, well you know actually if computers are going
to get that small and that powerful, then they're going to
get built into our cuff links and our glasses and things like
that long before they get built into the walls because we don't
renovate the walls very much.
So what I did was I created the first cyborg collective.
So you've got Star Trek, right?
And the Borgs?
We had the Borgs.
We had it about the same time as they did, maybe a little before.
These are some of the Borgs and back then to simulate what it
would be like to live today we had to have the students run
around with these motorcycle batteries, this big, right?
Ten pounds, and a PC/104 computer that was hot enough that you didn't
want to touch it and these little displays that are
actually vibrating mirrors so that you could see stuff but of
course today that turned into Google Glass and all the
watches that people are beginning to do but the
nice thing is that guy in front there,
that's Thad Starner, so he got into this
enough that he wore that sort of thing for the next 20 years
until Sergey Brin said, "Okay, guy, make it real."
Now he's the technical lead for Google Glass.
So I like to describe myself as the grandfather of Google Glass.
I didn't wear it for all those years but we helped cause the
problem, shall we say.
I do a number of other things too.
As was mentioned, I co-lead the discussion around big data
and personal data, which is privacy and security, at the
World Economic Forum.
Have you ever wondered what it was like to go to Davos?
That's what it looks like.
It doesn't look like too much.
A hotel room with a bunch of people in business suits and
stuff but they're pretty amazing people.
I mean the Vice President of the EU is in there.
There's a special representative from the National Security
Council; there's the CEO of Visa.
It's sort of an interesting crowd and my role is to make
sure that they talk and not fight and then I do a number of
other things too.
I'm on the Advisory Board of Motorola Mobility which you
might remember is owned by Google now, and one of the
biggest mobile clients of Google is Samsung, so
they don't want to fight with Samsung, and what that means is
that Motorola Mobility has to reinvent what it means to be a
cell phone.
You remember I did that wearable computer stuff years ago?
Coming soon to a store near you.
I also am helping to run the first commercially
accessible autonomous vehicle in the world, produced by Nissan,
and I advise Telefonica on what they ought to do, start
companies; do stuff like that.
Forbes likes me.
So that's what I do.
So let's talk about big data.
I wanted to tell you that to give you a sense of where I come
from because this is my view on big data.
It's not the canonical truth.
There is no canonical truth at this point so when you think
about big data what everybody thinks about is Google and
Flickr and Facebook and those are certainly important and
familiar but that's actually not where the action is.
I think that while those are things to be concerned about and
think about the real action is around the fact that wireless is
finally here.
So what this diagram shows is the distribution of wireless
devices in the world.
That's all the yellow stuff.
The blue stuff are the big undersea cables that carry the
bits around but the thing that's the most striking change in our
world is not Google, is not Flickr.
It's the fact that virtually every adult in the human race
has a sensor package that knows where they are, knows who their
friends are, knows all sorts of stuff about them and they have
the ability to give and send digital data, messages.
Everybody is connected now.
It's amazing.
I mean you go to the most remote places in Africa and you find
that the average phone ownership is 90 percent of the adult
citizens.
That's typical; in the poorest countries the average phone
ownership is like 1.2 per person and the reason is that the
phone companies are fighting, and then there are also things
like the phone that you use with your wife's family and the
phone you use with others, lots of interesting cultural
variations, but what
that means is that you can do things that you've never dreamed
were conceivable but what we think about the world is really
different.
Let me give you a couple of examples that are from my
students and myself, so this is a little start-up company.
What it does is it sends coupons to people to buy shampoo and so
forth.
That is not a terribly interesting thing but the guy
that started this started with sort of a small pot of money and
ended up with a customer list of 3.48 billion people in under a
year.
Well, how did he do that?
Well, he went to seven people who happen to run some of the
seven largest phone companies and said, "Can I use your
network to reach people for consumer products?
And we'll pay SMS fees and go back and forth and we'll give
them top up minutes but we'll use your network."
And of course they said, "Sure, we're going to make money off
this."
But think about that: for a start-up without a
lot of money to be able to reach the majority of humans in
the world in under a year.
That's just completely unheard of.
It takes companies decades to build up a fraction of that
size.
I don't want to say this is the world's best business or
anything like that but the ability to scale like that with
an existing infrastructure is incredible.
And Moore's Law tells us that all of those phones that are out
there are going to be Smartphones in just a couple of
years.
Already, almost everywhere in the world every village, every
small businessman has a smartphone.
Not everybody has a smartphone.
Pretty soon it's going to be that way and that means another
huge change in communication.
Here's another thing that a student and I did in India.
There are these things called TopUp minutes.
How many people know what TopUp minutes are?
Yeah, so you can buy, in most parts of the world, minutes to
put in your phone.
You give them some money.
They put more minutes on your phone so you can call things
and that's rather than having the normal plan that's a way you
get communications.
Well those minutes are a little bit like cash, aren't they?
So what they did is they set up a little exchange so if you had
minutes on one network you could trade them in for minutes on
another network, and they're essentially a form of virtual
cash.
And what we had done is set up a network of over a hundred
thousand retailers, very small retailers.
The only thing they had really in common is a tiny store,
maybe six feet by six feet, and a phone
and what you can do now is you can take those TopUp minutes and
you can pay that little retailer in the middle of nowhere to buy
whatever it is you need and you can even take cash out.
You can give him some minutes he'll give you some cash.
Suddenly you have this alternative economy that lets
you buy things, sell things, trade things.
Hundreds of millions of people and again, you can spin it off
almost immediately.
Not that it's not a lot of hard work but that's a really
different way of doing business and you also have to remember
that almost all these people are unbanked.
They've never really been part of the formal economy before.
It's really transformative.
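The minutes-as-cash mechanics just described can be sketched in a few lines of Python. This is a hypothetical toy, not the actual system: the network names, the per-minute rates, and the cash-out rule are all invented for illustration.

```python
# Toy sketch of the minutes-as-cash idea (all names and rates invented):
# airtime balances on different networks act like currencies, and a small
# exchange converts between them or cashes them out at a retailer.
RATES = {"net_A": 1.00, "net_B": 0.80}   # made-up cash value of one minute

def exchange(balances, src, dst, minutes):
    """Swap `minutes` from network src into the equivalent on network dst."""
    assert balances[src] >= minutes
    balances[src] -= minutes
    balances[dst] += minutes * RATES[src] / RATES[dst]
    return balances

def cash_out(balances, net, minutes):
    """A retailer takes minutes off your balance and hands over cash."""
    assert balances[net] >= minutes
    balances[net] -= minutes
    return minutes * RATES[net]

wallet = {"net_A": 100.0, "net_B": 0.0}
exchange(wallet, "net_A", "net_B", 40)   # 40 A-minutes become 50 B-minutes
cash = cash_out(wallet, "net_A", 10)     # 10 minutes at rate 1.00 -> 10.0 cash
```

The point of the sketch is only that, once minutes are transferable and redeemable, a balance of airtime behaves like a small bank account for people the formal banking system never reached.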
Here is another spin out and this is one that the Chief Technical
Officer of the United States describes as a real game changer
and if you think about it some people laugh at this.
It's a check engine light.
Your car has a check engine light, right?
When something's going wrong it says "Check Engine."
Do you have a Check Engine light?
Well, you ought to.
Think about it, so in the health care system today we are out of
money.
We don't do a very good job of delivering health care but
there's another aspect which is that there are lots of estimates
that say that most visits, the vast majority of visits to the
doctor are at the wrong time.
They're the worried well.
They're the "Well, I'm sorry sir, but you know if you live long
enough..." get-out-of-the-office sort of visit.
It's not at the right time.
What you would like to do is you'd like to know when somebody
should get help precisely, but what happens when you're
beginning to get sick?
Well, actually, what happens is your behavior changes.
You don't go out as much.
You don't call the same people.
When we have a friend that's getting sick you
say, "You don't look like you're doing so well."
What is it that you're picking up on?
You're picking up on these behavior changes.
You can do the same thing through sensing of the phone.
A little Check Engine light goes on.
It says, "You're not acting like yourself.
Something's wrong.
Maybe you ought to find out."
And what that does is that gets people to the doctor in time to
do something and it also tends to reduce the worried well
because after all my check engine light isn't on.
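The behavioral "check engine light" idea can be sketched as a personal-baseline anomaly detector. The features here (calls per day, places visited, distance moved) and the two-sigma threshold are assumptions for illustration, not the spin-out's actual method.

```python
# Hypothetical "check engine light" sketch: flag a person when their recent
# behavior drifts far from their own baseline. Features and thresholds are
# made up for illustration.
from statistics import mean, stdev

def check_engine(baseline_days, recent_days, z_threshold=2.0):
    """Return True if any behavioral feature in the recent window is more
    than z_threshold standard deviations from the personal baseline."""
    n_features = len(baseline_days[0])
    for f in range(n_features):
        base = [day[f] for day in baseline_days]
        recent = mean(day[f] for day in recent_days)
        mu, sigma = mean(base), stdev(base)
        if sigma > 0 and abs(recent - mu) / sigma > z_threshold:
            return True
    return False

# 30 ordinary days: ~10 calls, ~6 places, ~12 km moved (with small wiggle).
baseline = [[10 + i % 3, 6 + i % 2, 12 + i % 4] for i in range(30)]
normal_week = [[10, 6, 12]] * 7
sick_week   = [[2, 1, 1]] * 7          # withdrawn: few calls, nowhere to go

check_engine(baseline, normal_week)    # light stays off
check_engine(baseline, sick_week)      # light turns on
```

The key design choice is that each person is compared to their own history, not to a population average, which is exactly what "you're not acting like yourself" means.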
Another example of this, another spin out is something that
measures mental health.
If you look at the criteria for diagnosis of depression,
PTSD, schizophrenia, three things come up
again and again and again.
One is you change how you socialize with people.
Different diseases are slightly different but people's social
life changes dramatically.
Their activity level changes.
For some diseases like depression you tend to just
cocoon away and not do anything.
For others you'll get frantic and you're everywhere but it's
a major change and the third thing is it disrupts your daily
habits.
You become irregular in your habits.
Well guess what?
You can measure all those things off of the sensors in your
phone.
So it's not quite the check engine light because that's sort
of a general thing.
This is something that if you're under doctor care can be used to
have continuous monitoring with the doctor and that's actually
how it's used.
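The third marker, regularity of daily habits, is the easiest to make concrete. Here is a hedged sketch that scores regularity as the average similarity between consecutive days' hourly activity profiles; the activity numbers are invented.

```python
# Hedged sketch of the "regularity of daily habits" marker: score how similar
# each day's 24-hour activity profile is to the previous day's.
# Activity vectors are invented hourly step counts.

def similarity(day_a, day_b):
    """Cosine similarity between two 24-hour activity profiles."""
    dot = sum(a * b for a, b in zip(day_a, day_b))
    na = sum(a * a for a in day_a) ** 0.5
    nb = sum(b * b for b in day_b) ** 0.5
    return dot / (na * nb)

def regularity(days):
    """Average similarity between consecutive days: 1.0 = perfectly regular."""
    sims = [similarity(a, b) for a, b in zip(days, days[1:])]
    return sum(sims) / len(sims)

routine = [[0] * 8 + [100] * 10 + [10] * 6] * 7     # same shape every day
erratic = [[0] * 8 + [100] * 10 + [10] * 6,
           [50] * 24,
           [100] * 4 + [0] * 20] * 2 + [[5] * 24]   # no repeating rhythm

regularity(routine)   # close to 1.0
regularity(erratic)   # noticeably lower
```

A clinician would of course use richer features, but the principle is the same: the phone's sensors make "your habits have become irregular" a number you can track continuously.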
So the point in telling you these things is to sort of break
your mindset about what's possible.
There's this infrastructure out there of sensors and
connectivity that makes it possible to reach almost
everyone in the world almost immediately and do things that
only your best friend could do before and that's astounding.
That's the thing that's driving big data.
Now, what I've shown is all things that are actually pretty
familiar but you can go a lot farther than this, so let me
show you another thing.
So this is something I did a few years ago.
So these are people moving around in San Francisco and
these big dots, well those are like stores and restaurants and
things where people go quite frequently.
It looks like a nice city but if you analyze this, what you find
is you find that the city is really made up of separate
populations, subgroups that really don't mix with each other
at all.
They walk right by each other on the street but you know there's
the group that likes the edgy bars and the avant-garde music
and there's the conservative people and then there's this
crowd and that crowd and they don't actually mix that much and
if you watch where they choose to go, using GPS and things like
that, you can pick out the places that are for group one
versus group two versus group three.
So you can begin to stratify the population.
Now this is just like demographics.
You don't need to know who these people are.
In fact this was done, the one I'm showing you, was done off of
taxi cab data.
Where do taxis pick up?
Where do they drop off?
Well it turns out different types of people go to different
places.
They choose to go to different places.
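The kind of stratification just described can be sketched as clustering venues by who visits them rather than where they sit on the map. Everything here, the venue names, the visit counts, and the choice of plain k-means, is a made-up illustration, not the actual analysis.

```python
# Hypothetical sketch: grouping venues by *who* visits them, not where they
# are. Visit profiles are invented counts of taxi drop-offs from three
# (unlabeled) pickup zones.
from math import sqrt

venues = {
    "edgy_bar":     [40,  2,  1],   # mostly visited from zone A
    "music_club":   [35,  5,  2],
    "steakhouse":   [ 3, 50,  4],   # mostly zone B
    "country_club": [ 1, 45,  6],
    "family_diner": [ 2,  6, 60],   # mostly zone C
    "playground":   [ 1,  3, 55],
}

def normalize(v):
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dist(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, iters=10):
    """Plain k-means with fixed initial centroids, so the result is
    deterministic for this toy example."""
    for _ in range(iters):
        labels = {k: min(range(len(centroids)),
                         key=lambda c: dist(v, centroids[c]))
                  for k, v in points.items()}
        for c in range(len(centroids)):
            members = [points[k] for k, l in labels.items() if l == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

profiles = {k: normalize(v) for k, v in venues.items()}
labels = kmeans(profiles, [profiles["edgy_bar"][:],
                           profiles["steakhouse"][:],
                           profiles["family_diner"][:]])
# Venues visited by the same crowd land in the same cluster.
```

No names or identities are needed: the venues cluster purely because the same anonymous population keeps choosing them.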
And of course you can do it in lots of ways, but once you know
where these clusters are you can go talk to some of the people
in the cluster.
I can go talk to these guys here, right?
What do these guys have in common?
You can't guess from the map unless you know San Francisco
really, really well.
Those guys are hard partiers.
They really are.
They have almost an order of magnitude greater likelihood,
let me see, I did this backwards, of getting alcohol poisoning.
And there's another group that has a factor of almost five
greater likelihood of getting diabetes.
Now diabetes is an enormous cost to our society.
It comes from a lot of things.
It's not like one cause.
It's a whole bunch of things but it's really interesting that
there are places where prediabetics go and places where
prediabetics don't go.
We don't really know why but once you know where they go
you'll know where to set up a "Get Yourself Screened"
center, right?
Put more intelligibly, the way I had this organized: if you know
people's choices about where they spend time you know a lot
about their preferences.
So everyone talks about Facebook, where they post all this stuff.
Well the stuff you post on Facebook is the stuff you want
other people to know and you edit it to project the face that
you want to show, right?
Nobody actually tells the truth on Facebook.
It's not human nature but where you spend time is a real
commitment.
That's who you are.
And so by looking at where people spend time you can tell a
huge amount about them: obviously what their preferences
are, what they chose, but also things that aren't obvious from
location; you know, the people that all go to these
sorts of places tend to dress the same.
It's really interesting.
It's a very strong thing and so marketers look at that and say,
"Oh, those are the guys that buy the leather pants.
Okay. And these people over here those are the women that buy the
red dresses.
Now I know where to advertise.
Now I know where to put up a store."
If you put all this stuff together what you get is you get
a sense of the rhythm of the city, so you'll know that if
you're in the middle of a weekday that there are people
doing these sort of activities there and they tend to like
these sort of products.
There are other people doing other things in other places and
that it changes at night with different people going to
different places.
So you can begin to design a city to reflect the patterns of
the people.
Who's getting nervous?
>> I've been for a while.
>> Professor Sandy Pentland: Oh, okay you've been for a while?
Good. You should be getting nervous.
You can use this in lots of ways so let me give you an
example of something that we did recently.
So I helped talk Orange, which is a large carrier in Europe and
Africa into releasing the data they had about phone call usage
in the Ivory Coast.
We also got the UN to release all the data they have and the
World Economic Forum to release all the data they have and we
made this big Data Commons and it's arguably the
first in the world that shows all these sorts of things
about all the people in the Ivory Coast.
Now the people in the Ivory Coast are a little special.
First of all they're very poor.
Second of all they just finished a very violent civil war.
So there's a lot of places that for instance the government
can't go because they get shot.
That's the way it is with civil wars.
Now, we did a lot of interesting things to this data.
We aggregated it around cell towers so you can't see
individual people and we chopped it up so you can't track things
for long periods of time only for a week or two at a time.
And what that means is that it's very, very hard to understand
particular people but you can see average patterns really
well.
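A minimal sketch of that aggregation step: events are rolled up per tower per week, and any cell with too few events is suppressed, since sparse cells are the ones that could identify individuals. The threshold and the data are invented.

```python
# Sketch of the aggregation described above (made-up data): counts are rolled
# up per cell tower per one-week window, and individual IDs never leave.
from collections import Counter

def aggregate(events, min_count=5):
    """events: (user_id, tower_id, day_index) tuples.
    Returns {(tower, week): count}, dropping cells below min_count so that
    sparse -- and hence potentially identifying -- cells are suppressed."""
    counts = Counter((tower, day // 7) for _user, tower, day in events)
    return {cell: n for cell, n in counts.items() if n >= min_count}

events = [("u%d" % u, "tower_A", d) for u in range(8) for d in range(3)] \
       + [("u99", "tower_B", 1)]            # a single user at a lonely tower
agg = aggregate(events)
# tower_A week 0 survives (24 events); tower_B is suppressed (1 < 5).
```

This is why the released data shows average patterns really well while making particular people very hard to see: the identifiers are dropped before aggregation, and any cell small enough to point at one person never leaves.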
And so what we did with this data is we released it to some
90 groups around the world, research groups to be able to
see what can you do for the Ivory Coast to help the
Ivory Coast?
I'll show you some examples.
One group in Dublin looked at the transportation network.
I know you can't read this slide.
I can't either, but it
turns out that using this sort of data you can tell where
people start in the morning, where they work or make money
and when they go back.
So you know what transportation they'd like to
have.
Now the subtle thing is that the way transportation is done
everywhere in the world is somebody stands there or they
put out those little rubber things and they count the number
of cars going by or number of people going by but they don't
know where those people started from or where they're going.
They don't know where they would like to go as a transportation
network and so if you know this thing which is called an origin
destination matrix you can design a transportation system
that's a whole lot better, because you can put the buses on
routes that go from the places where people live to the places
where they work.
They never knew that before.
When they mapped that on to the existing bus system they
discovered that small changes, very small changes would reduce
the commute time by 10 percent.
Ten percent is big when you do it across a huge city in terms
of pollution, in terms of energy cost, in terms of wear and tear
on the people.
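An origin-destination matrix of the kind the Dublin group used can be sketched from phone traces in a few lines. The heuristic here, first tower of the day as origin and the midday tower as destination, is a common simplification and an assumption on my part, as are the traces themselves.

```python
# A minimal origin-destination sketch (invented traces): the origin is where
# a phone starts the day, the destination is where it spends working hours,
# and the OD matrix counts commuters between each pair of zones.
from collections import Counter

def od_matrix(daily_traces):
    """daily_traces: list of per-day tower sequences, ordered by time.
    Uses the first tower of the day as origin, the midday tower as
    destination."""
    od = Counter()
    for trace in daily_traces:
        origin, destination = trace[0], trace[len(trace) // 2]
        od[(origin, destination)] += 1
    return od

traces = [["suburb_N", "center", "center", "suburb_N"]] * 3 \
       + [["suburb_S", "port", "port", "suburb_S"]] * 2
od = od_matrix(traces)
busiest = max(od, key=od.get)   # the flow most worth a direct bus line
```

Roadside counters can only tell you how many vehicles pass a point; this matrix tells you where trips begin and end, which is what you need to route the buses.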
Another sort of thing is the public health systems, so
infectious disease depends on people interacting.
And if you're going to set up interventions, you know, telling
people to stay home or telling people to wash their hands or
giving people inoculations you've got to know where the
people are.
But they never knew where the people were.
In fact, we don't know where the people are in this country.
We don't know where people interact and that means we can't
set up a public health system that's really effective.
So using this sort of data they were able to finally map the
places where people got the flu.
They were also able to track back and find places where
people got diseases like malaria which you don't have to worry
about here but it's a big problem there and estimates are
they'll get about a 20 percent reduction in flu propagation the
next time they have a flu season which should be about now.
Pretty good. More interestingly, another group was able to do
something else, which is shown here.
In the civil war the battle lines ended up around
here and so it was thought that this was a north/south division
because really nobody knew where the ethnic groups were.
These were battles between different ethnic groups but it
turns out that's not the case.
It turns out you can use this data about mobility and
communication to figure out who talks to who and of course if
you all speak the same language you tend to speak to each other
more than speaking to other people and similarly you visit
your relatives more than you visit random other people.
So you can map out ethnic division pretty accurately and
what they discovered is the ethnic divisions are not
north/south the way they thought they were.
They're vertical which means you can now begin taking much more
effective action in trying to defuse tensions at the places
where the ethnic groups come together.
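The who-talks-to-who mapping can be sketched as community detection on a call graph. Here is a hedged toy version: keep only the frequent-call edges and take connected components, so people who mostly call each other land in one group. The call counts and the threshold are invented.

```python
# Toy community detection on a synthetic call graph: groups of people who
# mostly call each other approximate the language/ethnic groupings described
# above. All counts are invented.

def communities(call_counts, min_calls=5):
    """Keep only pairs with at least min_calls calls between them, then take
    connected components: people who call each other a lot share a group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:          # path-halving union-find
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for (a, b), n in call_counts.items():
        find(a)
        find(b)
        if n >= min_calls:
            union(a, b)
    return {x: find(x) for x in parent}

calls = {("a1", "a2"): 30, ("a2", "a3"): 25,   # one tight calling circle
         ("b1", "b2"): 40, ("b2", "b3"): 22,   # another
         ("a3", "b1"): 2}                      # a rare cross-group call
groups = communities(calls)
# the a-people share one root, the b-people another
```

Real work would use proper graph-clustering algorithms, but even this crude version shows how the rare cross-group edges, the places where the communities touch, fall out of the data.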
This map's even more interesting.
So they don't have any data about poverty in the northern
part of the country, right?
Because if you go up there you get shot.
It's hard to collect data when you're dead, but there is a
technique that you can do with this data that gives you a very
accurate sense of what's called multifactor poverty.
So it turns out that level of disposable income, child
mortality, crime rate, life expectancy all co-vary.
One goes up the others go up.
One goes down the others go down.
So you just put them together in one factor, the MPI.
It turns out that when people are feeling more comfortable
they explore more so they move around to more diverse places.
They call more diverse people and when they're feeling
threatened in various ways they do less of that and you can
average that over regions to get a very accurate estimate of this
MPI.
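The exploration signal described above can be sketched as the entropy of a region's visit distribution: the more evenly residents spread their visits (or calls) across places, the higher the score. All numbers are invented, and the real MPI estimation involves much more than this.

```python
# Hedged sketch of the diversity signal described above: regions whose
# residents visit a more even spread of places score higher on
# "exploration", which the talk says tracks multifactor poverty inversely.
from math import log

def exploration_score(visit_counts):
    """Shannon entropy of a region's visit distribution: higher = more
    diverse movement."""
    total = sum(visit_counts)
    probs = [c / total for c in visit_counts if c > 0]
    return -sum(p * log(p) for p in probs)

comfortable_region = [10, 9, 11, 10, 10]   # visits spread over 5 places
stressed_region    = [46, 1, 1, 1, 1]      # almost everything in one place

exploration_score(comfortable_region)  # close to log(5), about 1.61
exploration_score(stressed_region)     # much lower
```

Averaged over a region, a score like this is the kind of behavioral proxy that lets you estimate the MPI without ever sending a surveyor into territory where surveyors get shot.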
So that's pretty amazing if you think about it.
So that means that I can look at aggregate cell phone data
and tell you how many babies are dying, literally.
I'm not making this up.
I can do it really pretty well and I can do it today.
I don't have to send people out in the field.
There's none of this one-census-every-ten-years sort of stuff,
or asking all the doctors, what doctors?
It's the Ivory Coast.
What are you talking about?
Finding out from people, you know the stuff.
You can actually look at things like poverty conditions.
You can look at things like the spread of infectious disease and
if you can look at it you can begin doing things because among
other things you can have a conversation with the government
that says, "Well you say that you've helped this area a whole
lot but look.
It doesn't look that way.
What's going on?"
Until you know that, you can't
have that conversation.
So what I wanted to do in this sort of first half of the talk
here is give you a sense that first of all big data is data
about people primarily.
Yes, there's the Internet of things and coordination but the
real thing that's happening here is more Internet data about
people and that's a scary thing because if you can tell all that
about people who owns it?
Who oversees it?
So recently there's been a lot of news about the National
Security Agency.
Everybody hears about that, right?
And the National Security Agency says, "We're just looking at
metadata," right?
What did I just show you?
Every single thing I showed you was metadata, not a thing
wasn't metadata, I don't think.
So that's what you can do with all that data.
Now is that a good thing or a bad thing?
Well I'm not going to debate that but I'll tell you this is a
really interesting and new capability and it's here now.
And the NSA is the famous example but the other thing
that's happening of course is that companies are using this,
sometimes little companies, sometimes unethical companies.
I have a little game I play on my cell phone because my son
plays it.
It's sort of interesting but what's really interesting is
when they update software because they send you a little
note and by looking at the note, it's always in broken English.
You can tell that this was written by someone whose native
language is Chinese.
And then you look at the permissions on the thing and it
says we want to know where you are and who you call.
Why does a video game want to know where I am and who I call,
especially given that the data then goes back to China?
Yeah, exactly, let's think about that one.
So it matters who owns this data and who controls it.
Now what typically happens next is people say, "Privacy.
Now we've got to worry about privacy.
Shut it all down."
Because that's been sort of the tradition, but I hope you will
remember from the last ten, fifteen minutes that I can tell you
where the babies are dying.
I can tell you who's going to get flu and die in a pandemic.
I can tell you how to design cities to be more energy
efficient.
You want to lock it all down you give up all those public goods.
It's real simple.
You tell me how many babies you want to die?
Seriously, I'll tell you how to lock it down.
Okay? There has to be a better solution and the outline of the
solution comes as follows.
So what I'm going to give you is sort of a summary and a little
bit of history of the discussion at Davos among some of
the people who were part of it, and that includes people
like the Justice Commissioner of the EU, the head of the Federal
Trade Commission, folks of that sort:
what's happened and then where things are going.
It's particularly interesting because this is a computer
science crowd.
I know not everybody is computer science but you ought to be
interested because it's your stuff.
So 1950, this is the way systems in society worked.
You showed up in the bank.
You talked probably to somebody you knew because you lived there
for quite a while and you gave them your signature, physical
signature.
They compared it to the other signature and that's how you
certified your identity and how you got services, and privacy was
not a big deal because the worst that would happen is the teller
would go tell somebody else in your church group or something
like that, but it certainly didn't go to faceless
bureaucrats somewhere else, because it was too
expensive to copy all that stuff.
And then we got electronics.
We got IBM 360s.
We got CCTVs.
We got fax machines and in the '60s people got scared because
suddenly the old traditions that made data local and
personal were breaking down and our modern notions of privacy
were born in legislation that said, okay guys we're going to
lock things down.
This is not okay and I mean I know the people that actually
wrote that legislation.
They're still alive and kicking and the idea was okay we have
this way of doing things and we're going to make the
electronic way of doing things match the physical one.
Nothing else allowed.
Well that sounds good but what's happened over the years is that
people have realized that this data is really valuable and
they've found all the little corners where you can sneak the
data out and sort of under the table and so now what we have is
this grey market.
We have people spying on us.
We have things happening that we don't always know about because
this legislation was sort of like just lock it down.
They didn't envision that people would find all these ways around
it.
So we've got a problem.
So what are we going to do about it?
Well, the core is that computer systems have to in some sense
be compatible with these sort of more human systems.
Our expectations about what happens up here have to match
the expectations that we grow up with and that, as humans, we
tend to have.
I don't mean this in a sort of haughty way.
We have certain things that are in our biology about what we can
think about and what we can't, sort of capacity limitations.
We have expectations about social relationships, about
causality and basically we have to be able to know what's
happening up here in the same way we knew what was happening
here.
That's the thing that you want to go for, I would claim, and the
key to it is to put this data into a framework that's
understandable, and the key thing here is to think about it and
say: what's happened is this data is now very valuable.
That means it's an asset to somebody.
Now we know about assets.
We have money.
We think about that just fine, right?
We get confused and make mistakes but we deal with money
pretty well.
We have property, right?
We own land.
We can sell that.
You know, understand how that works, right?
Around the edges it's a little ragged but maybe we could do the
same thing with data.
Well, but you can make copies of data.
Well, I've got some news for you.
You know the things in your bank that you think are your money?
It's really just ones and zeros.
But there's a system on top of it that makes sure they can't
just arbitrarily copy your data and take it from you and stuff
like that and the system is actually one that's based on
very old principles.
These principles actually originated in the United
Kingdom, in Britain, and they were ownership rights: the right
to possess, dispose and control.
Now it's not the same as ownership.
It's rights, not full ownership and what that means is that some
of the dispute resolution is different sort of a technical
legal thing.
Keep those in mind.
And so back in 2007 I proposed what I called the New Deal on
Data and helped start this discussion within the Forum and
other places and the vision is this.
Data is an asset and the first thing you have to do with this
asset, therefore, is decide who controls it.
Whose asset is it?
Does it belong to the government?
Let's see a show of hands.
Who thinks it belongs to the government?
Good. Some countries you would get more hands up.
Who thinks it belongs to things like the Telcos and the banks?
Who thinks that you should have control of it?
Yeah, okay.
So instant experiment: the only politically viable solution is
that individuals control data about themselves; nothing else is
going to make it as a sort of stable political statement.
It's not a statement of principles.
It's not a statement of mathematical facts.
It's a statement of politics, so in these meetings you have
senior politicians, you have companies and you have people
like me who cause trouble.
And the idea is you need something that's a win for the
politicians.
They want to get re-elected.
The companies have to be able to make money and the citizenry has
to be protected and get value from it.
They need the win, win, win.
And the solution that got hammered out basically gives
people much more control over their rights over data
about them.
And I'll explain this in a little bit and the system that
goes with it.
The ideas have been codified into the EU Data Protection Acts
and they stem from the Human Rights parts of the EU
Constitution that give people literal ownership of data that
is about them.
In this country it's in the U.S.
Consumer Privacy Bill of Rights which is not yet enacted but
also in the regulatory framework that's been put forward and is
being acted on by the Federal Trade Commission.
And the idea is roughly that whenever people collect data
about you they have to give you informed consent.
So how many people are familiar with informed consent?
You do this whenever you undergo a medical procedure or participate in a human-subjects experiment; they have to tell you, "This is the data we're going to collect.
This is what we're going to do with it.
This is the risk that you have and this is the benefit you're
going to have and you can opt out at any time and we will
destroy your data."
Those are the things you have to do when you run human-subjects experiments.
And that's the thing that people are willing to do in these new
regulatory frameworks.
Now you might ask, okay I see why the government would want to
do that but why would companies want to do that?
That's the sort of first thing and the answer is that most of
the companies at the table don't trade data yet.
They're regulated industries.
So the data I showed you, that came from Telcos, and Telcos don't actually trade data because they're licensees of the government.
They're regulated.
If they get caught with their hand in the cookie jar it's bad
for them to say nothing of what their clients, their customers
are going to do.
I mean if Verizon really screws up everyone is going to go to
AT&T.
So Verizon doesn't want to do anything that's really bad for
those reasons, same thing with banks, same thing with
hospitals.
But if in fact they give you these sorts of rights and really inform you of what's happening, then what this regulation says is they can go ahead and do it.
So for instance, they can say look, we'd like to take this
data about you and do the following thing for it and we'll
pay you in this way with better services or more money and you
have the right to say yes or no.
And if you decide later having said yes that you don't like it
anymore, you have the right to say no, and they have to get rid
of it.
And for them that's a big step up from where they are today
which is not being able to do anything.
So this is the version that's on the table in the EU: a single set of rules throughout the EU, and even outside it. If you're a company and you give subsidiaries or subcontractors some data, they have to follow the EU rules and you're liable for them.
That puts real teeth in it.
Consent is required, and so is data portability.
So I told you about informed consent.
Data portability means that you can say to Amazon I want to have
an XML file that gives all my purchase history in a form that
is computer readable and incidentally that I can then
send to Barnes and Noble and Barnes and Noble can use.
That's data portability. Or you can take your hospital record and give it to a different hospital and they can use it. So it creates consumer pressure on companies to clean up their act.
The right to be forgotten, to get rid of the data: there's been an unfortunate battle in the EU over this because some people took it too literally; companies have to retain data for crime investigations, you know, was this person there on the night of such-and-such, or for auditing.
We think you're cooking your books.
We want to go back and see.
So, there's a lot of reasons companies have to retain data
but they can take data offline so that they don't have it
normally.
Big fines.
So that's the type of thing that's being proposed both here
and in the EU that grounds out in particular things.
So one is when you start building computer systems, I'll
be happy to talk to people about this afterwards, there has to be
really trusted identity.
So how many of you have more passwords than you can possibly remember and curse at this at least once a day?
Okay, everybody.
So one of the ideas, in this country it's called the National Strategy for Trusted Identities in Cyberspace, and one of my guys co-chairs that, and its idea is to get rid of passwords, yea, and replace them with the sorts of things that the military uses.
The military doesn't use passwords everywhere.
There are ways of propagating secure identity without having all those things.
A cheesy version of it is already offered by people like
Facebook.
So how many people log in to other services using Facebook?
It's easy right?
Now Facebook knows everything about you.
Okay, or you use Google. And it's easy and it's very cool, except now Facebook or Google knows like everything about you, so it's got that downside.
So the idea is to come up with a national framework that's like
this where Google and Facebook don't own it.
Informed consent, we talked about that a little bit; next, metadata.
So when you put data into the bank, oh yeah that's right, you
call it money, right?
When you put money into the bank they have metadata about that,
about who owns that data and what you can do with it.
Is it this sort of account or is it that sort of account?
And they get audited and the simple idea is that ought to be
something you can do with personal data.
So if you share your data with somebody in order to get better service, you ought to be able to check automatically, just on the computer, that they're doing the right thing, and there ought to be penalties if they don't, automatic penalties.
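To make that concrete, here's a minimal sketch of the bank-style metadata idea. This is my own illustration, not any deployed system; the field names and the one-hour window are invented for the example.

```python
from dataclasses import dataclass, field
import time

@dataclass
class SharedRecord:
    """A piece of personal data tagged with metadata, like a bank record."""
    owner: str
    purpose: str     # the use the owner consented to
    expires: float   # epoch seconds; any use after this is a violation
    payload: dict = field(default_factory=dict)

def audit(record, used_for, now=None):
    """Automatic audit: usage must match the consented purpose and window."""
    now = time.time() if now is None else now
    return used_for == record.purpose and now <= record.expires

record = SharedRecord(owner="alice", purpose="billing",
                      expires=time.time() + 3600, payload={"plan": "basic"})
print(audit(record, "billing"))      # compliant use: prints True
print(audit(record, "advertising"))  # off-purpose use: prints False
```

An auditor, or the data store itself, could run `audit` over every recorded use and levy the automatic penalties mentioned here whenever it returns False.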
And interestingly, of course, a lot of those, sort of big
computer system guys like Microsoft and so forth, think
this is great because among other things they're going to
make systems that do this, right?
And other companies already have this, so some of the big Telcos
already have this and then in this country there's a bit of a
battle about do not track.
So all this is no good if you don't know you're being tracked,
so there has to be some sort of evidence that you're being
tracked.
Anyway, that's a little technical, but I figured a lot of you guys sort of do computer science.
In my lab we've done a couple of things in this area.
The main thing is that we've built something called "open
personal data store."
So for the right to physically control your data, as much as you can, you need a store where you control the store and have ownership rights to it.
And so we've developed, with support from both the government
and industry, a framework for doing this and it's part of a
sort of a global movement to really make these rights
something that are effective by having software that really does
it.
And it has a couple of different things.
I don't know how much I want to talk to you about it.
One of the main ones is that it's not just computer science,
it's also contract law.
So there's this idea that when you share data there's a contract behind it, and that has to be something that gets instantiated in computer code.
I don't know if this is too technical.
>> No, it's good.
Keep going.
>> Professor Sandy Pentland: Okay.
Remember the phrase I think it's Willie Sutton?
Willie Sutton was a bank robber.
He kept getting arrested for robbing banks.
It was bad for Willie and they asked him, "Why do you do this?"
He said, "Because that's where the money is."
That's why he does it.
So where's the money?
Anybody want to guess today?
Suggestions?
No, I don't have it up there.
So, there's something called the SWIFT network.
If you've ever made a money transfer you may be familiar with it. It's the interbank system for transferring money.
It's three trillion dollars a day and it's never been hacked.
Now why is that?
That sounds like a pretty weird thing, doesn't it?
You know all these networks being hacked, right?
I should say never been hacked that we know about.
[ Laughter ]
What it has is, it has something called the "trust network" and
what that means is that there's a contract that's peer-to-peer
between all the banks.
Remember, banks operate in something like 163 different countries, many of which are little more than criminal cabals, and yet they're able to transfer money safely even in those sorts of places.
So there's a contract, not regulations, a contract between the banks that says: here's what you say on your computer network in order to make an offer to transfer the money, here are the replies you can make to receive the money, and here's the liability if you screw up; and incidentally, it's a joint liability.
So I'm a bank.
You two guys are talking.
I'm going to pay attention because if you guys screw up I
might have to pay.
So everybody is watching everybody, okay?
The joint liability is part of it.
So that's what the SWIFT network does.
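As a toy sketch of those two ingredients, a fixed contract vocabulary plus joint liability, consider the following. This is my own illustration with invented message names; real SWIFT messages and contracts are vastly richer.

```python
# The "contract" fixes which messages are legal on the network.
VALID_OFFERS = {"TRANSFER_OFFER"}
VALID_REPLIES = {"ACCEPT", "REJECT"}

def settle(offer, reply, sender, receiver, watchers):
    """Apply the contract to one exchange between two banks."""
    if offer not in VALID_OFFERS or reply not in VALID_REPLIES:
        # Off-contract message: joint liability for everyone involved,
        # which is why every peer has an incentive to watch the others.
        return {"status": "violation",
                "liable": [sender, receiver] + list(watchers)}
    status = "settled" if reply == "ACCEPT" else "declined"
    return {"status": status, "liable": []}

print(settle("TRANSFER_OFFER", "ACCEPT", "bankA", "bankB", ["bankC"]))
```

The joint-liability branch is the design point: because watchers can be on the hook for a bad exchange between two other peers, everybody polices everybody.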
Visa, everybody has a Visa card.
They have a trust network too.
Oh actually, it's not a trust network to you.
It's between the banks and the Visa network, right?
So they're safe--
[ Chuckling ]
oh well, too bad. But what you can do nowadays is take the same technology that these big guys developed, which took them lots of lawyers and computer programmers, and make it consumer grade. Now we know how to do it.
Computers are fast.
You can just make it go, and so that's what we've done: we've made a network like the SWIFT network that's for you, and it's open source, and we're just trying to get people to use it. There are all sorts of people beginning to use it or thinking about it: the State of Kansas, Mass General Hospital, Luxembourg, Andorra.
We took a military project for secure identity, remember those passwords, called "overnight econnect," and we've taken it over, made it open source supported by an MIT industrial consortium, and put in auditability and computation and storage. So this is a way companies can adopt a technology and legal framework to satisfy those regulations that are being proposed, and this is why people are interested in it, as opposed to its being some radical thing: they may have to do this pretty soon, particularly starting in the new year.
One of the cute innovations in openPDS is the following.
How many people have heard about problems in re-identifying data?
So there are all these claims about anonymous data, right? If you hear somebody talk about anonymous data, know that they don't know what they're talking about, because there's really essentially no such thing.
If you take all the names out of some data, then I can almost always find another data source that lets me re-identify those names.
That's what people found again and again.
Now, there are ways of getting around that, like in the Ivory Coast: if you aggregate it, if you put it in big piles, then it's extremely difficult if not impossible to re-identify.
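That kind of aggregation can be sketched in a few lines. This is my own illustration, not the Ivory Coast pipeline itself, and the suppression threshold `k=5` is an invented number: publish only per-region counts, and drop any pile too small to hide in.

```python
from collections import Counter

def aggregate(records, k=5):
    """Put records into 'big piles': keep per-region counts only, and
    suppress any pile with fewer than k people so nobody stands out."""
    counts = Counter(region for _person, region in records)
    return {region: n for region, n in counts.items() if n >= k}

records = [("u1", "Abidjan")] * 7 + [("u2", "Yamoussoukro")] * 2
print(aggregate(records))  # the pile of 2 is suppressed: {'Abidjan': 7}
```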
But this idea of sharing data is fraught with danger because
while that data may be pretty harmless, if you combine it with
other things it may not be harmless.
A key example is location data.
>> [Inaudible audience comment] >> Professor Sandy Pentland:
I'll give you another example.
So a key example is location data. Say I'm a company and I want to offer you some coupon, and I ask you, "Are you in San Francisco?" And you return a latitude and longitude, okay?
Well, now let's say I do that fairly often.
After a while I'm going to know where you live, where you work
and where you hang out and I'm going to know what sort of
person you are and blah, blah.
This is not a made up example.
This is how it works.
And the thing is you shared much too much data when you gave them
a precise location.
What you should have said is, "Yes I'm in San Francisco or no
I'm not in San Francisco."
And so what openPDS does is answer questions; it doesn't share data except when it has to, and that reduces the dimensionality of the data and makes this whole privacy thing a lot easier to deal with.
It doesn't cure it, but it gets a lot better.
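The question-answering principle can be sketched like this. It's a minimal illustration, not the actual openPDS API; the class and the bounding-box numbers are my own.

```python
class PersonalDataStore:
    """Holds the raw data and answers questions instead of releasing it."""

    def __init__(self, lat, lon):
        self._lat, self._lon = lat, lon  # raw coordinates never leave the store

    def in_region(self, south, west, north, east):
        """Return a single yes/no bit, never the precise location."""
        return south <= self._lat <= north and west <= self._lon <= east

pds = PersonalDataStore(37.77, -122.42)               # somewhere in San Francisco
print(pds.in_region(37.70, -122.52, 37.83, -122.35))  # prints True: "yes, in SF"
print(pds.in_region(40.48, -74.26, 40.92, -73.70))    # prints False: not in NYC
```

The coupon company learns one bit per question instead of a full location trace, which is the dimensionality reduction at work.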
So that's of interest if you're into this sort of computer
stuff.
The real answer for all of this stuff though is that it's not a
policy thing.
It's not a computer sciency thing.
It's a deal with society.
It's a new deal, okay?
And the only way you're going to test it is not in a laboratory, not in the legislature.
You've got to build it.
Stick it out in the real world.
Let real people live it.
And for various reasons it turns out it's hard to do that in a country like the U.S., or in fact any large country, but it's easy to do in small countries because they're just not as polarized and the politics are simpler. And so Trento, which you might think of as being in Italy, actually isn't, exactly; it's an autonomous province. So we convinced them to put together a living lab where they could try openPDS and new ways of sharing data, starting very small with young families that have just had children, because we thought they would want to share data a lot, and some of that would be very personal data; they're also economically stressed, so sharing spending data is an example.
And what we're doing is we're letting them live the future
with these new sorts of regulations in place to be able
to see how it works.
Because the truth is, as you know, either people understand it and use it correctly, or it's useless. And in fact, as bright as we are, we think we know everything; we don't, and we make mistakes, and sometimes they're pretty bad mistakes. So you have to actually put things out in the real world and test them.
It's called living labs and that's what we as a country here
need to do.
We need to declare Rochester to be a living lab.
You could try this stuff out.
Try to put facts on the ground about the benefits and the dangers of big data, and about how sharing policies and privacy policies should work.
Now that sounds sort of crazy, but we're doing it not only here: at the Technical University of Denmark, my group is giving smartphones and personal data stores to every student in the student body, trying to live in the future.
At MIT we have in front of the president right now a proposal
from some of the leading computer science faculty to take
our system and make it something that everybody at MIT uses and
the MIT community is almost 60,000 people, only 4,000 undergrads but a lot of hangers-on, and that's what you need to do.
You need to actually spin up things so you can try them out.
Since this is sort of a computer sciency college, you might want
to think about that.
Could you enroll everybody, maybe opt-in of course, but
enroll everybody in the campus in a big data experiment?
Could you live in the future and see how it works?
It would be very valuable, not only to make it personally relevant, but to create experience and facts that'll guide regulators, guide companies, alert us to dangers, show us opportunities, and sort of start the movement.
So that's it.
Thank you
[ Applause ]
>> Professor Kumar: We have time for a few questions.
Please stand up.
>> I'm actually an alumnus here, I used to be part of the College of Engineering. Can you expand on the definition of consent that's used, what that actually means in terms of giving up your data, and how you go about that? And if we're already doing it at some point, what should we do instead?
>> Professor Sandy Pentland: The short answer is not much, because it isn't specified precisely at this point. The key thing is that there are legal definitions of what informed consent means, and then there's the level of being informed.
Another key item is the ability to revoke consent.
So those are the things you need to know.
The standard that's out there that I think is informing people comes from medicine and from human-subjects research, where there's international coordination on what it means to inform human subjects for an experiment or what it means to enroll people in a new medical procedure. So there's already sort of an international sense of this; it's a question of sharpening it up so that you know exactly what it means in the context of computers and stuff like that.
I think that's the right thing to say.
The other thing that's interesting: remember there was a battle, sort of a big, well, not exactly a fight, you get the idea, between the EU representative and the U.S. representative about a couple of issues, about how flexible you can be about this. But one thing they were in absolute agreement on was that if you're a minor, if you're under the age of majority, it's going to be really strict. It's going to be just like the informed consent for human subjects, which for minors is very strict.
>> What are you doing in your open data store to reduce the
transaction cost of managing your own data?
So just as a quick example, you mentioned your son and the video game asking to track where you are and so forth. Conceivably you get these requests multiple times a day: okay, should I give permission or not? And I don't even want to think about that, so okay, fine, okay, yeah.
How can you reduce the transaction cost?
>> Professor Sandy Pentland: So there's a couple of things.
For instance in Trento, there's an app called Check App which gives you sort of ratings of all the apps you use in terms of the danger they pose, and it has a little interface that lets you manage that.
But the truth is 99 percent of the people won't use that, okay?
And so what we have, or rather what we imagine for the future, because it's mostly imagining, is that people like the AARP, like the university, etc., the people who have some sense of being helpful or having custody, will develop their own procedures or standards. So what I'll get is something that says, "Would you like to follow MIT's suggested settings?" Sure. "Would you like to follow the AARP's suggested settings?" Yeah, okay. Or the Baptist Church's suggested settings, or whatever, because it's way too complicated. Nobody could really manage it.
It's actually technically complicated enough that nobody
in this room could really tell what's going to happen.
You get into differential privacy and some of the long
range effects.
It's just hard to know.
But that's not unusual; I mean, think about the financial stuff. You have financial advisors. You have retirement advisors. They're sort of best practice, but it's not perfect. And that's what we have to do: think about this as an analogy with financial things. We have to try and do at least that, and maybe we can do a little better.
>> Professor Kumar: There's just time for one more question.
>> Are you personally optimistic or worried?
>> Professor Sandy Pentland: Optimistic, but then I'm a
pragmatist, right?
So what I see is advantages for governments, for hospitals, for Telcos in actually doing this, in protecting people's rights, so that they can be more free and more productive in other areas. And so it's in the interest of some of these big operators to actually give you more opt-in and more control, so that they can have a free playing field with, say, the aggregate data that they don't get much traction with today.
So take big Telcos, and remember I advise Telefonica, which is one of the biggest: they want to be a white hat. They want to give you the view of your data. They absolutely want to do that, but they also want to take aggregates of that data and build businesses on it, things that can't be traced back to individuals. Or take Mass General Hospital.
If you can spin up things like these personal data stores, which is what we're doing, it turns out that you can really revolutionize medical research.
Today when you do a medical experiment, you map out the experiment, you go through the IRB, you recruit people, and now two years have gone by, typically.
I'm serious, right?
If everybody has personal data that they're collecting in their own stores, then you say, well, look, here's the experiment I want to do. You say that to the IRB and then you flip the switch. The data is already there. All you're doing is transferring control of the data from the person to the experimenter under informed consent. It makes the research cycle half to a quarter of the time it would otherwise have been.
So those are just two of the examples.
But that's why I'm optimistic.
Now, where is it not going to work?
Well, salespeople are going to cheat. We have this whole unregulated part of the Internet, the Facebooks of the world.
It's going to be a long battle to put that down, put that genie
back in the bottle.
But once the big guys are done, once the big guys show that you can have a more respectful way to build a big data economy, something that preserves our ability to interact with it, the Facebooks of the world will come under real pressure, because the question will be: well, if Bank of America, for god's sake, can do it, why can't you?
There's no real good answer to that one, so they're going to have to change the way they do business, but that's a long battle.
>> Professor Kumar: Thank you.
I'm afraid we don't have much time.
[ Applause ]
On behalf of the Thomas Golisano
College in appreciation for Sandy Pentland's inspiring talk
I'm giving this--