>>
RIEFFEL: We're going to talk today about our work in supporting privacy within workplace
awareness applications. And there's a very general question which is that massive amounts
of information are being collected about each of us and that will only increase as sensors
become cheaper and more powerful. Analysis of all this data can support many really good
things from advances in medicine, to improved software and services, to more efficient economic
mechanisms. But there's also potential for misuse of such data. And so a really challenging
long term research question is how best to support beneficial uses while inhibiting misuse.
We're going to look at that question within awareness systems. Awareness systems are an
emerging class of technologies that enhance communication and connections between people
in both business and social settings. They share information about a user's location,
what they're doing at least to the extent of whether they're available and what communication
channels might be best to reach them. So you've seen some lightweight examples. So, for example,
just whether you're online or offline in an IM client is a type of awareness application.
And a little more sophisticated are things like Google Latitude, Foursquare or Facebook
Places that make it easy for you to share your location with your friends. We designed
and deployed MyUnity, a prototype workplace awareness system. So that's what we will concentrate
our talk on and we will look at how we handle some of the privacy issues that come up there.
But we hope that our work in this setting generalizes to other awareness systems and
maybe even to some other settings like the medical or user studies or things like that.
So within awareness systems there are two main concerns: one is the information
that's being shared right at this moment, and there are privacy issues connected with that.
There are also issues with what data is stored longer term. We've looked at both issues, and we'll talk
some about addressing the moment to moment data collection and in particular how MyUnity
provides user control over sensor channels. But the bulk of the talk, we'll talk about
the question of storing data. Initially, we did not store any data because we were concerned
about misuse. But not only did the researchers want to analyze what was going on with the
system, but the users started to want to analyze their own behaviors, their colleagues' behaviors,
things like that. But they continue to have concerns about misuse. And so this talk will
concentrate on our secured histories approach. So first, we'll give an overview of the MyUnity
awareness system. And I'll be handing that part off to Jake who's led that part of the
project. And he'll talk in particular about the controls that are given to users and a
few results from the user studies and also an amusing issue we've had to deal with within
the last week which is some popular press articles which--well, we're glad for the attention.
Some of it is a little bit misleading. So I'll let him discuss that. Then I'll come
back up and discuss the secured histories part and particularly our approach that lets
users have more control and provides what we're calling need-to-know security where
each individual has full access to her own data. A third party can help her process the
data but learns nothing in the process. And an analyst can have the third party help out
but will only learn pooled statistics. And we'll then present a family of protocols to
achieve need-to-know security in our setting. And then we'll finish with a summary and open
questions. So I'll hand it off to Jake. >> BIEHL: I'll just start speaking until I get
to the microphone. So thanks, Eleanor. What we're going to concentrate on here with Unity
is giving you sort of the $1 tour. There's a lot more to this system than what we're
able to present in this time period. But in order to understand the privacy implications
and innovations that we've done, we really need to have a baseline understanding of what
Unity is, why we built it and what we've learned from using it so far. So our work with Unity
has been primarily motivated by some fundamental changes we see going on in the world. The
first is that the workplace is becoming much more fragmented. In fact, the United States
President's Council of Economic Advisers put out an article, actually a full report, in
March of this year saying that one out of two U.S. companies have established
formal flex work policies. And what this means is that they're breaking away or allowing
their employees to break away from the Monday through Friday, 9:00 to 5:00 rigid schedule.
And the report also indicates that about a quarter of employees are so far exercising
these policies to work at least one day out of the office. Similarly, in our own research
at FXPAL, we've looked at how the use of communication tools has been evolving over time. And we've
actually been collecting data on our employees for about three years in how they're using
different communication tools and how they're choosing which tools to use. And what we found
over the progression of time is that it's no longer just phone and e-mail. Our
toolbox of tools that we use is increasing. We're not dropping tools and we're also becoming
more sophisticated in choosing which tools to use for what type of communication or with
which individuals. At the same time, we're also seeing, at least within our company and
what we're exposed to, an increased globalization effort. In fact, the United Nations now says
globalization is an irreversible trend. It's going to be a part of our economy for times
to come. At the same time, we also see still huge dependence on interpersonal communication
to complete tasks. We're not working more independently now than we were before. We're still
working together and we need to communicate in order to get tasks accomplished. In fact,
recent studies have shown that we have about 50 to 60 interactions per day per worker when
we're at work. What this equates to is an interruption or an interaction with
a colleague about every eight minutes. So that's a lot of interaction that we're having
at work and most of these interactions are unscheduled and impromptu. So when we're distant
and not working co-located, it makes these things very difficult. So all this boils down,
for us, to a central realization when we look at the future of communication tools:
in the office of the future, we can no longer take for granted a worker's location, availability
and preferred communication methods and channels. And so, as we move forward into this new era,
we're really going to need new tools and services that will address those challenges that I
mentioned on the previous slide. And at FXPAL and our research, we believe that presence
and awareness will really be a critical building block in these tools of tomorrow. So that's
really motivated what we've done with Unity. At a very high level, Unity is
a prototype system that's composed of three components. There's a data and sensor collection
component where we have a bank of sensors that are collecting various information about
users and their activities. We have some services that are in the cloud that are analyzing this
data and forming higher level descriptions of people's presence states. And then we have
some interfaces that allow users to access this information on their desktops, laptops
and mobile devices. And the system that we have has been evolving over a period of about
a year and a half at FXPAL. And we've had it in continuous deployment for about a year
with most of our employees and we're now beginning deployment with our peers at our parent company
in Japan. So, what is it? Here's a diagram that sort of illustrates the overall system
architecture. Over here on the left hand side you'll see that we have various sensors that
are feeding information into the Unity system. These are showing what we have currently in
our system although we are continuing to expand it. We have Bluetooth sensors that can detect
which building people are in, what part of the building they're in. We have cameras scattered
throughout the building, including in people's offices. These cameras look for motion to
determine if people are in their offices, in their office with visitors, et cetera.
We also have various external data coming in to the central server. These are things like
IM status, internal and external calendar information as well as some software that
we have running on our end user clients as well as in the networks to determine where
people are connected from and their accessibility. All of this information goes into
our central server, and we use some algorithms to fuse this information into higher level
presence states, which are then portrayed on these dashboards that run on end user clients.
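As a rough illustration of this kind of sensor fusion, here's a hypothetical rule-based sketch. The function name, parameters, and rules are assumptions for illustration only, not the actual MyUnity fusion algorithm:

```python
# Hypothetical sketch of fusing raw sensor readings into one of the five
# presence states described in the talk. Rules and names are illustrative.

def fuse(camera_count, bluetooth_zone, im_online, on_phone):
    """Map sensor readings to a presence state (and its tile color)."""
    if camera_count >= 2:
        return "in office with visitor"   # purple tile
    if camera_count == 1:
        return "in office"                # green tile
    if bluetooth_zone is not None:
        return "in building"              # yellow tile: on campus, not in office
    if on_phone:
        return "out, reachable by phone"  # orange tile
    if im_online:
        return "working off-site"         # blue tile: at home, at a cafe, etc.
    return "unavailable"

print(fuse(camera_count=1, bluetooth_zone=None, im_online=True, on_phone=False))
# -> "in office"
```

The ordering of the rules encodes a priority: direct camera evidence of the office wins over location beacons, which win over network and phone signals.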
And what you're seeing here is the dashboard that we have for the Windows desktop as well
as the Android smartphone. And what you'll notice about each of the interfaces is a tile-based
display which we call the dashboard, which resembles sort of a business card metaphor
where you have people's names, their photos, the presence state, and then
color coding to reinforce those presence states. So in this figure that I'm showing
here, the green people represent people that are physically in their offices. There is
one person that's red down here. Oh, not red, I'm sorry, it's purple. If you're purple,
it means that you're in your office with a visitor. Blue indicates that you're working
off site, so you may be at home, you may be at a café working, but you're not physically
on campus. A yellow color indicates that you're somewhere in the building but not in your
office and orange indicates that you're out and about but are making yourself available
on your phone if your colleagues need to contact you. You also see that there are additional
icons and decorations on the tiles that indicate planned appointments. So either vacation time
for certain individuals or appointments like giving a talk at Google. So at the very beginning
when we started building Unity, we had users' privacy concerns in mind and we really focused
on the data collection side of privacy initially. So, built in to Unity is a bunch of controls
that allow an end user to control not how the data is being shared but what data is
being collected about them. So in these dialogues, which I'm showing from the desktop app, users
can go in and turn on and off these different sensors. And what this will do is it will
prevent the system from ever collecting information from that particular source. And what I want
to highlight here is the camera control. Although you're seeing a live camera feed of my office
here at FXPAL, this view is only for me. And what it allows me to do is designate different
regions of my office that belong to either areas that I inhabit as the office owner or
areas that are inhabited by visitors when they come into my office. So we're exploiting
the sort of static nature of how people use their offices. What comes out of this system
is just a zero or one or a two plus, which is used at the higher level of understanding
people's state. So like I've said, we've been studying this very carefully in our deployment
at FXPAL and we've found some really useful and interesting results. One is that
we found an increase in the opportunities for effective collaboration. We've
performed one study where we actually had people carry around diaries, and every time
they had a communication event they would log on their diary the form of communication
and the medium that they used and who with. And over that four-week study, compared
to before Unity was in use, we saw a 17% increase in face-to-face communication
and a 21% decrease in email conversations. And this was an important and positive
result for us because, before we deployed Unity, we asked users what was their preferred
method of communication. And overwhelmingly for a research lab, it's face to face interaction.
And so by allowing people to use their preferred method of communication and offset those methods
that are less preferred, we've increased, we believe, the overall effectiveness of
collaboration. At the same time, we've also seen a general sense of improved awareness. So,
when we asked qualitatively how people are using the system, it wasn't just for finding
somebody to interact with, but also for getting an overall sense of what the behavior
of the entire organization is, or feeling a sense of community with one's peers. And we've
got a lot of great quotes from our study on this, but the one that really hits home for
me is that one of our users said that it feels like everybody's office is right next door.
So they feel much more connected with their peers at work. A study that we did over the
summer where we were looking at how the different clients were being used compared to the desktop
and mobile client, we found a very interesting result in that 83% of our users were now consulting
Unity when they're initiating contact with peers. So it's become a real staple of their
day to day interactions. And what's important to note here is that for people that also
have the tool on their mobile client, the percentage was much higher, so they become
more and more dependent as the accessibility and the availability of the tool becomes higher.
Now, we've had a lot of success with Unity and we've attracted some popular press on
the system. We had an MIT Technology Review article published last week on the work, and
the article is actually very fair and very informative, but the author of the article
did not choose the headline, which is "Someone's Watching You," which is obviously
a part of the work that we don't really want to emphasize but it is certainly a concern.
Well, this, of course, has been picked up in meta-articles, and each time it's reposted
or re-reported, the description of the system gets a little bit more extreme. And the best
one lately is one talking about a system that can
tell if you're slacking off at work. If you read into the details of the article, it actually
says that "We're able to tell if you're using Facebook and talking with your friends or
if you have slackers in your office, not visitors." So, it's certainly sensationalizing some of
the concerns about tools like this. So the question here is what's really sensationalism
and what's genuine concern? And there are obviously some privacy concerns with this tool. Now, the
big letdown for these journalists is that we've been thinking about this from day one
and we've actually had a pretty aggressive research agenda to figure out what are
the ways that we can maximize the utility of a tool like Unity while protecting the
tool from being used for purposes we didn't intend, or for misuse. And one of the things
that we have been doing, as Eleanor introduced at the beginning, is a way to secure
the history so that nobody can tell what someone was doing last week
on Tuesday at 5:00 PM, but we can get a general sense of their behaviors and activities. And
I'll hand it over to her. >> RIEFFEL: As I said earlier, MyUnity didn't
store any data. But researchers like to analyze what goes on with the systems they deploy.
And even more important, the users of MyUnity started expressing interest in seeing personal
trends for themselves, analyzing their own behavior or activity patterns of co-workers
or long term data pooled across groups of users, for example, when is the best time
to contact somebody in the support group. So our users asked for this, and we had interest
ourselves in doing the analysis. Actually, this was prior to my joining
the project; this is sort of what got Jake talking to me. So, he said, "Okay. We want
to support this. How can we support this?" While users wanted these things,
they continued to be concerned about the misuse of stored data. So the research challenge
was how to support the users' desires while respecting their concerns. So just to give
you an example of the sort of feature we wanted to support, here's a graphic that gives
summary data for a user, averaged over a month, for the five presence states. Colors
are similar to what we had seen on the little tiles. And you can also do similar
summaries over a group of users. So, I just wanted to give you a picture of, "Okay, this
is something we'd like to be able to do and show that to appropriate people, but without
them being able to dig into more details about individual data." So, just to
set up the terminology, I'd like to say who the players are here. So, there are a bunch
of users of the system, and each user may use a number of clients. So, they may have
a desktop computer, a laptop or two, a bunch of mobile devices, and they can all run MyUnity.
Okay. There's a non-trusted third party server that can store data for us, probably encrypted,
and process things. We trust the server to carry out the calculations we ask, but we don't
trust the server with our data. We're concerned that the server might be accessed by somebody
who's curious about what's going on in the organization, and we
don't want to reveal that information. The analysts may be users of the system who want
to look at their own behavior or group behavior, or they may be separate, for example, researchers
looking at how the system's being used. So, what we wanted to do with our secured histories
approach was to give users a lot of control over who sees their data. So, we want them
to maintain control of their own data but still be able, if they want, to contribute
their data to group statistics. So, current practice is that analysts are generally given
access to all of the data in order to compute statistics. So, the researchers are trusted
with all the user data. This is true in our setting, in medical settings, and the like. And
more and more often, researchers are using a third party to at least store the data and
sometimes even process it. So either the data is not encrypted, in which case the third party
has access to it, or the data is encrypted and the third party can't help
out with the processing. So what we want to do is provide mechanisms to support what we're
calling, "need to know" security. This is where an individual has full access to her
own data, the third party can help her process it, but can learn nothing about the data values
in the process. And the analyst learns only the desired statistics, not the individual
values. So what we want to do is store the presence states. We're not going to store the
raw sensor values, though that's an interesting question in and of itself. So we're going
to store one bit for each of the five positive presence states like in office or with visitor.
And it's time series data. At each time, say each minute, a user contributes a presence
state. So, five bits of information. And then we want to support data averaging. So, we
want to be able to support arbitrary sums over a single user's data. So, a user can ask
any question she likes about her data and have the third party provide sums that will
help with that computation. And analysts can obtain arbitrary sums as long as it's pooled
over all the users. And just as an aside, while averaging's very useful and our current
protocols just support sums, we do have extensions that are fairly easy to do that also compute
variance and higher moments so we can get a little bit more sense of the distributions.
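To make the data model concrete, here is a small sketch of the per-minute presence bits and the sums and averages built from them, shown in plain, unencrypted form; the sample values are made up for illustration:

```python
# Sketch of the stored data model: at each minute, a user contributes one
# bit per presence state. Averages are just sums divided by the number of
# samples. The sample data below is invented for illustration.
STATES = ["in office", "with visitor", "off-site", "in building", "on phone"]

# Hypothetical stretch of per-minute samples: each entry is a 5-bit tuple,
# here 300 minutes "in office" followed by 100 minutes "off-site".
samples = [(1, 0, 0, 0, 0)] * 300 + [(0, 0, 1, 0, 0)] * 100

def state_sum(samples, state_index):
    """Arbitrary sum over one presence-state channel."""
    return sum(s[state_index] for s in samples)

def state_average(samples, state_index):
    """Fraction of sampled minutes spent in the given state."""
    return state_sum(samples, state_index) / len(samples)

print(state_average(samples, 0))  # fraction of minutes "in office" -> 0.75
```

Note that for 0/1 data, sums of squares equal the sums themselves, which is why the variance extensions mentioned in the talk come almost for free once sums are supported.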
So, we had some design criteria when we started this. So, since the presence states are computed
frequently, the input needs to remain quick. Okay. And on the other hand, the analysis
of stored data doesn't occur as frequently and is generally not needed on a moment-to-moment
basis. So, that can be a little slower. Furthermore, the computation over stored data should not
require involvement of the clients. That sounds like sort of an obvious requirement, but most
prior approaches to having someone learn the statistics without learning the
individual values require that the clients all participate in that computation. And so
for us, in some sense, this was the one that presents the greatest challenge. How can we
do this without asking the clients, without burdening them at this stage? And furthermore,
the bandwidth between the client and the third party and the analyst may be limited, especially
in the mobile cases. So we didn't want to be sending a whole lot of data back and
forth. So, here's the functionality we provide. All the data is stored in encrypted form on
a non-trusted, honest-but-curious third party. Each user can request encrypted sums
over arbitrary subsets of her own data from the third party. And the third party
doesn't learn anything about the data values because
everything remains encrypted. The individual can then decrypt, using her own key or keys,
the single summary statistic that's returned. So that's how we save a lot of bandwidth and
also processing. And then for the sums over
pooled data, a particularly interesting feature here is that even though each user has encrypted
under her own key, we can still pool the encryptions into sums that can be decrypted.
We have a variety of protocols that do certain things with different
keys, but the important things are that in doing this, the third party can compute without
needing access to any of the keys, without decrypting any of the data and without further
interaction with the individuals. So the third party does all this and, again,
doesn't learn anything; it doesn't learn anything about the pooled statistics either, but
it does process the information so that it can hand something off to an
analyst, who has some keys and can learn the resulting statistic. And finally, we
wanted the protocols to be fairly easy to implement and integrate with the MyUnity system.
And so, one feature of this is that we were able to put it together from basically off-the-shelf
components that you'll find in, say, the Java library or the .NET library, plus a few things that
were fairly easy for us to write on our own in order to implement this functionality.
So here are some of the ingredients. What we want to have is an additive homomorphic
encryption scheme. I'll explain what homomorphic encryption schemes are in just a little bit.
But the key property is that this is what enables a third party to compute the encrypted
sum. And we're going to use a symmetric key homomorphic encryption scheme. Most homomorphic
encryption schemes are public key encryption schemes, but the symmetric key gives particularly
compact encryption, which is good for the bandwidth, and also, it has this key feature
that it makes it really easy to compute sums over data encrypted under different keys.
So in the simplest protocol, the keys for all the values contributing to the sum are needed
to decrypt. If the analyst is able to decrypt the sum,
the analyst can also decrypt the individual values. That still provides some security
because the third party hasn't been able to learn anything in the process. So we'll first
present that protocol and then we'll show how we can do some variations on that to give
more security, where the analyst can decrypt the statistic but cannot decrypt any of the
individual values. So, in order to achieve that next step and get to the point of
full need-to-know security, where the analyst can only decrypt the group statistic,
not the individual values, what we do is extend Chaum's Dining Cryptographers
networks to obtain a more complex key structure. And the family of protocols
we'll show is secure against collusion by K users. So, in order to decrypt your value,
K other users would have to get together and pool their resources.
So, homomorphic encryption simply enables you to do computation on the ciphertext,
on what's been encrypted, and then decrypt afterwards. So, in an additive homomorphic encryption
scheme, you can encrypt a message and encrypt another message. And then if you add those
two ciphertexts together, what you get is an encryption of the sum of those two messages. Most encryption
schemes do not have this property. So it's a very special type of encryption. There
are also multiplicatively homomorphic encryption schemes. Again, if you encrypt a message and you encrypt
another message and then you multiply the two ciphertexts together, that gives you a
ciphertext that you can decrypt to get the product of those things. What's interesting
is that it's hard to do both. So, an algebraically homomorphic, or fully homomorphic, encryption scheme allows
you to do arbitrary combinations of both addition and multiplication. And that's highly desired
because that lets you do general computation. However, prior to 2009, no such scheme was
known. Okay. All homomorphic encryption schemes were only partially homomorphic. In fact,
almost all of them could only compute sums or could only compute products, but not both.
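Both partial homomorphic properties are easy to demonstrate in toy form. The sketch below uses a one-time-pad-style additive scheme and textbook (unpadded) RSA with tiny, insecure parameters; everything here is purely illustrative:

```python
# Toy demonstrations of the two partial homomorphic properties.
# Additive: E(m) = (m + r) mod n, where the sum of two ciphertexts decrypts
# (with the summed randomness) to the sum of the messages.
# Multiplicative: textbook unpadded RSA, where the product of two
# ciphertexts decrypts to the product of the messages.
# Parameters are toy-sized and completely insecure; illustration only.

n = 2**32
m1, m2, r1, r2 = 17, 25, 123456789, 987654321
c1, c2 = (m1 + r1) % n, (m2 + r2) % n
assert (c1 + c2 - r1 - r2) % n == m1 + m2   # additive homomorphism

# Textbook RSA with tiny primes p=61, q=53: n_rsa=3233, e=17, d=2753.
n_rsa, e, d = 3233, 17, 2753
enc = lambda m: pow(m, e, n_rsa)
dec = lambda c: pow(c, d, n_rsa)
assert dec(enc(6) * enc(7) % n_rsa) == 42   # multiplicative homomorphism
```

Each scheme supports only its one operation, which is exactly the "sums or products, but not both" limitation of partially homomorphic encryption.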
And there were a few recent advances that allowed you to compute arbitrary sums and
a few products, such as second-degree polynomials. So, last year's big result was
Craig Gentry's fully homomorphic encryption scheme. So, that was a very exciting result.
And from a complexity theory point of view, it doesn't look that bad, because it's
just a constant slowdown. However, for those of us interested in practical schemes, the
constant is a bit big. So if you want to do computation on the encrypted data, it's a
trillion times slower than if you wanted to do it on unencrypted data. It's a little tough
to use at present. So that was a big breakthrough. There have been some
follow-on papers and maybe, eventually, we will get there. But for the moment, we have
to stick with schemes that are a little less powerful but will do what we want.
So, most of the remaining schemes are based on public key cryptography which is quite
a bit less efficient than symmetric key encryption. So we were excited to see that Castelluccia
had recently developed a symmetric homomorphic encryption scheme for use in wireless sensor
networks. And a particularly nice feature of the scheme is that the basic idea behind it
is very simple. So in preparing this talk, I sort of debated how much I should go into
the technical details here and then I thought, "Well, this is one of the beautiful areas
where the technical details are actually relatively easy to communicate."
So, I want to give it a try. So, the idea behind the symmetric homomorphic encryption
scheme is this: Alice has a message M1, and suppose she generates a random value R1, okay? Then
she encrypts by adding those two values together. And because R1 is random, the ciphertext is
randomly distributed. So someone can decrypt if and only if they know this piece of randomness,
this R1. Similarly, Bob has a message and a piece of randomness and encrypts in an analogous
way. Then they can add these two things together, and basically, what you have is the
two messages added together plus the two pieces of randomness added together. So, anyone who
knows those two pieces of randomness can decrypt; otherwise, it appears random. So, the biggest
problem with that simple view of the scheme--oh yes, question?
>> This is on [INDISTINCT] of power of two. Is the arithmetic [INDISTINCT] on
a power of two or over all integers? >> RIEFFEL: It's mod a large factor. Yes.
>> A large factor, but is it a power of two, a prime factor or some specific factor?
>> RIEFFEL: You can choose it. Just large, yes. So you want to choose it
large enough so that when you're adding these things up you don't overflow.
But other than that, you can choose whatever you like. So if a power of two is convenient,
choose a power of two. If it's not convenient, choose something else. Yes. Other questions?
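The simple version of the scheme, as just described, can be sketched in a few lines; the modulus and values here are illustrative assumptions:

```python
# Sketch of the basic additively homomorphic symmetric scheme: encryption
# adds a random pad modulo a large n, and sums of ciphertexts decrypt with
# the summed pads. Toy values; illustrative only.
import secrets

n = 2**64  # modulus chosen large enough that the sums never overflow

def encrypt(m, r):
    return (m + r) % n

# Alice and Bob each encrypt under their own piece of randomness.
r_alice, r_bob = secrets.randbelow(n), secrets.randbelow(n)
c = (encrypt(5, r_alice) + encrypt(9, r_bob)) % n

# Anyone who knows *both* pads can recover the sum; to everyone else the
# ciphertext is uniformly distributed.
assert (c - r_alice - r_bob) % n == 5 + 9
```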
So, the problem with that really simple view is that if Alice stores encrypted values this
way and she then wants to decrypt, she has to remember what all of those pieces of randomness
are. So, she probably has to store them because they're random things, right? You can't
remember a whole sequence of random things. Well, storing them means she's storing a whole bunch of
stuff and, also, that's not very secure: she's storing all these pads that will be used
to decrypt. So what we want is a predictable and efficiently computable source of something
that's like randomness. So pseudorandomness means that it's indistinguishable from random
unless you know a secret such as a key. What indistinguishable means here can
be formalized. So what pseudorandom function families do is provide the capability
Alice needs here. If she has a key, that's used as an index into the function
family, so she gets a function F sub KA, which is her pseudorandom function. She can
then encrypt by taking her message and adding to it her pseudorandom
function evaluated at a nonce, which is just a non-repeating or rarely repeating
value. So, for example, a good thing to take here would be the timestamp and the presence
type that she's encrypting. Then, when she wants to decrypt, she knows
what time and what type of presence state, so she can compute this pad and subtract
it from the ciphertext to get her value back. So as long as her pseudorandom
function is truly indistinguishable from random, her message is safe from anyone who doesn't
know the key. So we're getting to where we can talk about these protocols. The one on this slide
is one which enables partial need-to-know security, where the analyst
needs to know all the keys used to encrypt in order to decrypt. So, as usual, you need to choose a security
parameter, a large modulus. We use HMAC as a pseudorandom function family, and we have
a length-matching hash function H. And for key generation, each team member generates her
own key. And then to encrypt, that team member uses her pseudorandom function, indexed
by that key, as we just saw: by adding what we're calling the pad, this piece
of pseudorandomness. And to compute a sum, you simply add the ciphertexts. And to
decrypt, anyone who has access to all of the keys can decrypt by computing these pads for
the nonces, the timestamp and the presence type of the values contributing to the sum
and subtracting them off from the encrypted sum. So the issue here is that if
the analyst can decrypt the sum, the analyst can decrypt all the individual values. Another
issue is that decryption requires the computation of a lot of pads. The more complex key structure
that we'll present next completely solves the first problem and reduces the second.
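A minimal sketch of this first protocol, using HMAC as the pseudorandom function and a (timestamp, presence type) nonce as just described; the keys, modulus size, and nonce format are illustrative assumptions:

```python
# Sketch of the first protocol: each user derives a pad with HMAC (the
# pseudorandom function) keyed by her own key and indexed by a nonce such
# as (timestamp, presence type). The third party sums ciphertexts without
# any keys; an analyst holding all the keys recomputes the pads and
# subtracts them from the encrypted sum. Illustrative parameters only.
import hmac, hashlib

n = 2**64  # large modulus

def pad(key, timestamp, presence_type):
    digest = hmac.new(key, f"{timestamp}|{presence_type}".encode(),
                      hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % n  # length-matching hash

def encrypt(key, value, timestamp, presence_type):
    return (value + pad(key, timestamp, presence_type)) % n

keys = [b"key-user-1", b"key-user-2", b"key-user-3"]   # hypothetical keys
values = [1, 0, 1]                                     # presence bits
nonces = [(1000 + i, "in_office") for i in range(3)]

# Third party: adds the ciphertexts blindly, with no access to any key.
enc_sum = sum(encrypt(k, v, *nc)
              for k, v, nc in zip(keys, values, nonces)) % n

# Analyst with all the keys: recomputes and subtracts every pad.
total = (enc_sum - sum(pad(k, *nc) for k, nc in zip(keys, nonces))) % n
assert total == sum(values)
```

This makes the weakness concrete as well: an analyst who can run the last step can equally well subtract a single pad and recover one user's individual value.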
So it's nice that by making the protocol more secure, we also make it faster, but that doesn't
always happen. And, again, the idea behind this is not too hard. So suppose each user
had a set of her own random values, and all of those values, plus R zero, which we're
going to give to the analyst, add up to zero. Okay. Then the users encrypt using these
pieces of randomness, by adding them to their values. Then
when you sum all those ciphertexts, you're going to get the sum of their values plus the sum
of all these pieces of randomness. Now, if the analyst adds R zero to that, then because of
this property all of the pieces of randomness disappear and you're just left
with the sum of the values. And so in this way, the analyst, just by knowing R zero,
can decrypt the sum, but R zero does not enable the analyst to decrypt any of the individual
values. So that's the idea. Then trying to figure out how to actually implement something
like this was a little trickier. So the best thing I was able to come up with early on
was that the users come up with keys and share them in a certain way that gives a similar
property. So the analyst generates a key K zero. Each user generates her own key, and
then the analyst and the user share keys as follows. So, a user shares her key with her
neighbor with the index one higher, okay? So, the analyst shares her key with the first
user. Similarly, the last user will share her key with the analyst. There's sort of
a ring here, okay? And then to encrypt, again, we're just adding pads but it's a little bit
more complicated. As before, we add the pad corresponding--a user adds the pad corresponding
to her own key and then she subtracts off the pad corresponding to the key which she
received from her neighbor. And the point of doing this is that when you sum all those
ciphertexts, almost all of the pads cancel out. And the two that don't cancel out are
the first and last ones, which are the ones whose keys the analyst has. So, in the end, you get that the sum of all the ciphertexts is the sum of all the messages plus the first pad
minus the last pad. And since the analyst has these two keys by how we did this sharing
in the first place, the analyst can compute those pads and can obtain the sum of all the
statistics. However, the analyst cannot decrypt any of the values making up that sum because
she does not have any of the keys for the individual values. So, that's our first fully need to know protocol. And then, I'm not going to go through this in
detail, but the same idea can be generalized to get a little bit more security. So, if
you look at--if you look at this protocol, when a user encrypts her own value, she's
using a key that she has shared with one neighbor and a key that she has received from the other
neighbor. So if the two neighbors on either side collude, they can decrypt her value.
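The ring construction just described might be sketched like this, again with illustrative HMAC-based pads and made-up keys; `k[0]` plays the role of the analyst's key K zero:

```python
import hmac
import hashlib

M = 2 ** 32  # illustrative modulus

def pad(key: bytes, nonce: bytes) -> int:
    return int.from_bytes(hmac.new(key, nonce, hashlib.sha256).digest(), "big") % M

def encrypt(value: int, own_key: bytes, received_key: bytes, nonce: bytes) -> int:
    # add the pad for your own key, subtract the pad for the key
    # received from the neighbor with index one lower
    return (value + pad(own_key, nonce) - pad(received_key, nonce)) % M

def decrypt_sum(agg: int, k_first: bytes, k_last: bytes, nonce: bytes) -> int:
    # in the telescoping sum every pad cancels except the first and the last,
    # and those are exactly the two keys the analyst holds
    return (agg - pad(k_last, nonce) + pad(k_first, nonce)) % M

# k[0] is the analyst's key; users 1..3 hold k[1..3], user i received k[i-1],
# and the last user shared k[3] with the analyst, closing the ring
k = [hashlib.sha256(bytes([i])).digest() for i in range(4)]
nonce = b"2011-06-01T10:00|in-office"
values = [1, 0, 1]
cts = [encrypt(v, k[i + 1], k[i], nonce) for i, v in enumerate(values)]
agg = sum(cts) % M
print(decrypt_sum(agg, k[0], k[3], nonce))  # 2
```

Summing the ciphertexts gives the sum of the messages plus pad(k[3]) minus pad(k[0]); the analyst can remove those two pads but has no key for any individual ciphertext, which is the need to know property.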
So we want to make it a little bit more secure. And so, we were also able to come up with
a whole family of protocols where you needed to have K users collude instead of just two.
And basically, if you have a graph structure in which the analyst and all the team members
have K plus one neighbors, you can do a similar key sharing scheme and a similar encryption
where you add some of the pads, subtract off others of the pads and then when you add them
all together, the pads cancel except for the ones that the analyst has the--has the keys
for. And so in this way, we get a family of protocols that are secure against K colluding users. Okay. One comment is that this key structure that we came up with is a little
bit more complicated than this original idea of finding random numbers--finding a source
of random numbers that add up to zero. And I didn't see how to do that, so I posed it as an open question. And Elaine Shi, at PARC, thought about it for a while and came up with a way to get a series of random numbers that have this property at each time step.
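The zero-sum idea itself is easy to demonstrate in a toy form where a trusted dealer simply hands out shares that sum to zero; the real constructions, including the one in the joint paper, avoid any such dealer:

```python
import random

M = 2 ** 32  # illustrative modulus

def deal_shares(n: int):
    # r[0..n-1] go to the users; r0 goes to the analyst, chosen so that
    # r0 + r[0] + ... + r[n-1] == 0 (mod M)
    r = [random.randrange(M) for _ in range(n)]
    r0 = -sum(r) % M
    return r0, r

values = [1, 0, 1, 1]
r0, r = deal_shares(len(values))
cts = [(v + ri) % M for v, ri in zip(values, r)]
# adding R zero cancels every user's randomness, leaving only the sum
print((sum(cts) + r0) % M)  # 3
```

R zero decrypts the sum but, on its own, reveals nothing about any single share, hence nothing about any single value.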
And so we have a joint paper coming out on that. It has some advantages and some
disadvantages. It's secure against any amount of collusion--I mean, nothing can be secure against everybody else colluding, because the sum is available: if everybody else says what their value is, your value is not secure. But beyond that, it's very elegant in this respect and doesn't have the K collusion issue. On the other hand, because the decryption's much harder, it's restricted to a small plaintext space. So, anyway, I believe
there's a lot more work that can be done both on need to know protocols that do other things
than just sums, for example, or other ways of obtaining these statistics. I did want to mention that, as I said before, most related approaches to this problem use multiparty computation, where the clients are involved in each stage. Another piece of related work
that you'll hear about in this space is differential privacy. It addresses an orthogonal question: if you obtain a statistic, what can you learn about the individual values just from that statistic? And sometimes the answer is quite a bit. And
so one really interesting question is can you combine our need to know protocols with
differential privacy? And in a standard differential privacy setup, there's an assumption of a
trusted third party who calculates the statistics and then adds noise in a way that makes it
hard to figure out what's going on with the individual values. But that third party saw everything.
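For comparison, the trusted-third-party setup amounts to something like the following sketch, where a curator who sees all the raw values releases only a Laplace-noised sum. The inverse-CDF sampler and the parameter choices are mine, not the talk's:

```python
import math
import random

def laplace_sample(scale: float) -> float:
    # standard inverse-CDF sampling: U uniform on (-1/2, 1/2)
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_sum(values, epsilon: float, sensitivity: float = 1.0) -> float:
    # trusted-curator model: the curator sees every raw value...
    true_sum = sum(values)
    # ...and publishes only the sum plus Laplace(sensitivity / epsilon) noise
    return true_sum + laplace_sample(sensitivity / epsilon)

print(dp_sum([1, 0, 1, 1], epsilon=0.5))  # a noisy value near 3
```

The combination described in the talk removes this curator; roughly speaking, the noise has to be introduced on the client side before encryption so that no single party ever sees the exact sum.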
And what we're able to do is combine differential privacy with the need to know
protocols to obtain both the need to know property and differential privacy without
the need for a trusted third party. There was a previous paper by Rastogi and Nath that
used secure multiparty computation techniques to achieve differentially private aggregation
without a third party. But, again, that requires clients participating at each point. Finally,
there's a certain amount of work in the wireless sensor network setting that builds on Castelluccia's
work as well. But other than that, we believe we're the first application of their symmetric
homomorphic encryption scheme outside of wireless sensor networks. So, to wrap up, this is a
really exciting area of how do you support privacy and utility with all this data collection.
And we hear very frequently the cynical view that privacy is dead and that there's nothing
we can do about it except to mourn its death. And that may be fun to write about, but we
think it's actually more fun, and certainly much more challenging, to ask "Well, what
can we do here that's useful?" And one area to look at is just having a better understanding
of what user preferences are with respect to what do they want to learn from the data
and what are their concerns about the data. And then there's all sorts of interesting
questions about finding mechanisms to encourage benefits and reduce harm. And these can be
cryptographic, like the one I presented today, or they can be economic or social. Admittedly,
the challenges are big. That's probably why the "privacy is dead" slogan gets passed around.
On the other hand, that means that there's lots of room for many people to work towards
interesting solutions. To wrap up, we'd like to thank particularly Bill van Melle and Adam
Lee who worked with us on the secured history part of this work and the entire MyUnity Team;
Bill, Thea Turner, Pernilla Qvarfordt, and Tony Dunnigan and others. And we'd particularly
like to thank all the MyUnity users and study participants for their help with all this
work. And if you're interested in learning more, please talk with us. And here are also
some references to look up, some of the work we've discussed. So, thank you again for having
us here. Really enjoyed it. >> BIEHL: Does anyone have more questions?
>> Could you talk a little bit more about the HMAC, how you use it? Because for every series of readings you need to generate the random numbers that will add up to zero. So I guess the input would be something like a sequence number or a timestamp, or how do you use that?
>> RIEFFEL: So let me see if I understood your question well enough. But we use it,
HMAC, as a keyed pseudorandom function... >> Yes.
>> RIEFFEL: ...family. So, I make up my own key and that gives me a pseudorandom function
family, basically, a version of HMAC. >> Yes.
>> RIEFFEL: And then, in our particular case, we're interested in encrypting each of the five bits of a presence state at a given time. So the nonce that that version of HMAC is evaluated on is the timestamp together with the presence type.
>> Okay. >> RIEFFEL: Okay?
>> So then the timestamp is public knowledge, right?
>> RIEFFEL: Yes.
>> So then you could see it, along with the ciphertext, right?
>> RIEFFEL: Yes. Yes. So that's all public
knowledge. The only secrets are the keys. >> Yes.
>> RIEFFEL: So those have to be guarded very carefully. But--yes?
>> Also, can these be computed on the fly?
>> RIEFFEL: Absolutely.
>> Okay. This is my understanding. If you aggregate, instead of over your whole dataset, just over two people, one of whose keys you already know because of your key sharing mechanism, right?
>> RIEFFEL: Right.
>> And then if you add or remove the K zero pad or what have you, which the analyst has, so that everything ends up at zero, this should be about the same as if you aggregated over all users where every user's data was always zero, right?
last part. Sorry.
>> Okay. So, the--K zero is the...
>> RIEFFEL: Okay.
>> ...key the analyst has, right?
>> RIEFFEL: Yes.
>> Okay. It just basically encodes the sum of all the keys that the other users have, right?
>> RIEFFEL: No. >> To some extent.
>> RIEFFEL: It's the index into a function that...
>> But you then generate that number from it, right?
>> RIEFFEL: Right. Yes.
>> Yes. That's the important part. And this number is used to--okay, it's composed, to some extent, of the sum of all the numbers used by the other participants, right? Because these algebraic relationships add up to zero.
>> RIEFFEL: It has that algebraic relationship.
You don't use that algebraic relationship to get it. But, yes.
>> Yes. Yes. >> RIEFFEL: Yes. Yes. Yes.
>> But it has, up to a certain extent? >> RIEFFEL: Yes. Yes.
>> Oh, perfect. So any ciphertext would be one part of this algebraic expression, plus the number--the secret [INDISTINCT], right?
>> RIEFFEL: Right.
>> So, basically, K zero plugged in to this algebraic expression gives the same value as the aggregated ciphertext if all the plaintexts were zeros, right?
value in--the function--the pseudorandom function... >> Yes.
>> RIEFFEL: ...evaluated at that time... >> Yes.
>> RIEFFEL: ...for the--for that key. >> Yes, exactly.
>> RIEFFEL: Yes. Yes.
>> So there's a key for the encryption, and you derive that from the HMAC...
>> RIEFFEL: Right.
>> ...function.
>> Okay. I get it. So what...
>> RIEFFEL: We call those pads, by the way.
>> Pads. Okay. Yes.
>> RIEFFEL: If that helps. Yes.
>> Okay. It does. So, basically, the analyst's pad at any given timestamp is exactly the same as aggregating all...
>> RIEFFEL: The other ones.
>> ...zeros encrypted [INDISTINCT] with each user's pad, right?
>> RIEFFEL: Correct.
>> And the analyst already has one of the--of the end users' secret pieces, right?
>> RIEFFEL: Right.
>> So you can derive one of the end user's [INDISTINCT]?
>> RIEFFEL: One of the pads, but not one of the values, because two pads were used to encrypt each.
>> Yes.
>> RIEFFEL: Yes. Okay.
>> Okay. So what happens if the analyst aggregates just the ciphertexts from one of the users for which he knows the key, right, and some other user?
>> RIEFFEL: Yes.
>> And then uses his pad, which is K zero or something like that, to decrypt--to just add, say, four [INDISTINCT] or something like that. Then that operation would be the pad he already knows minus the secret number of the user he's aggregated with...
>> RIEFFEL: Yes.
>> ...minus all the other users with their plaintexts set to zero, right?
>> RIEFFEL: Okay.
>> With the plaintexts set to zero. Wouldn't that provide information about the user he's aggregating with?
>> RIEFFEL: So, no, because there's still--there's still a random number--the pseudorandom number. So, each user is encrypting in this way, right?
>> Yes.
>> RIEFFEL: So, say, it's the first user, and the first user is encrypting with, as you say, K0 and K1.
>> Yes.
>> RIEFFEL: So, as you say, the analyst may know K0 and can compute this pad, right?
>> Yes.
>> RIEFFEL: But the analyst can't get any information about this pad.
>> But this pad--we'll have to think about all this, since we have time constraints.
>> Yes. We might have to go to a blackboard, but.
>> Okay. I think probably someone has some more questions.
>> RIEFFEL: Okay. Anyone else? Okay. Well, thanks.