>>
RIEFFEL: We're going to talk today about our work in supporting privacy within workplace
awareness applications. And there's a very general question which is that massive amounts
of information are being collected about each of us and that will only increase as sensors
become cheaper and more powerful. Analysis of all this data can support many really good
things from advances in medicine, to improved software and services, to more efficient economic
mechanisms. But there's also potential for misuse of such data. And so a really challenging
long term research question is how best to support beneficial uses while inhibiting misuse.
We're going to look at that question within awareness systems. Awareness systems are an
emerging class of technologies that enhance communication and connections between people
in both business and social settings. They share information about a user's location,
what they're doing at least to the extent of whether they're available and what communication
channels might be best to reach them. So you've seen some lightweight examples. So, for example,
just whether you're online or offline in an IM client is a type of awareness application.
And a little more sophisticated are things like Google Latitude, Foursquare or Facebook
Places that make it easy for you to share your location with your friends. We designed
and deployed MyUnity, a prototype workplace awareness system. So that's what we will concentrate
our talk on and we will look at how we handle some of the privacy issues that come up there.
But we hope that our work in this setting generalizes to other awareness systems and
maybe even to some other settings like the medical or user studies or things like that.
So within awareness systems there are two main concerns: one is the information
that's being shared right at this moment, and there are privacy issues connected with that.
There are also issues with what data is stored longer term. We've looked at both issues, and we'll talk
some about addressing the moment to moment data collection and in particular how MyUnity
provides user control over sensor channels. But the bulk of the talk, we'll talk about
the question of storing data. Initially, we did not store any data because we were concerned
about misuse. But not only did the researchers want to analyze what was going on with the
system, but the users started to want to analyze their own behaviors, their colleagues' behaviors,
things like that. But they continue to have concerns about misuse. And so this talk will
concentrate on our secured histories approach. So first, we'll give an overview of the MyUnity
awareness system. And I'll be handing that part off to Jake who's led that part of the
project. And he'll talk in particular about the controls that are given to users and a
few results from the user studies and also an amusing issue we've had to deal with within
the last week which is some popular press articles which--well, we're glad for the attention.
Some of it is a little bit misleading. So I'll let him discuss that. Then I'll come
back up and discuss the secured histories part and particularly our approach that lets
users have more control and provides what we're calling need-to-know security where
each individual has full access to her own data. A third party can help her process the
data but learns nothing in the process. And an analyst can have the third party help out
but will only learn pooled statistics. And we'll then present a family of protocols to
achieve need-to-know security in our setting. And then we'll finish with a summary and open
questions. So I'll hand it off to Jake. >> BIEHL: I'll just start speaking until I get
to the microphone. So thanks, Eleanor. What we're going to concentrate on here with Unity
is giving you sort of the $1 tour. There's a lot more to this system than what we're
able to present in this time period. But in order to understand the privacy implications
and innovations that we've done, we really need to have a baseline understanding of what
Unity is, why we built it and what we've learned from using it so far. So our work with Unity
has been primarily motivated by some fundamental changes we see going on in the world. The
first is that the workplace is becoming much more fragmented. In fact, the United States
President's Council of Economic Advisers put out an article, actually a full report, in
March of this year saying that one out of two U.S. companies have established
formal flex work policies. And what this means is that they're breaking away or allowing
their employees to break away from the Monday through Friday, 9:00 to 5:00 rigid schedule.
And the report also indicates that about a quarter of employees are so far exercising
these policies to work at least one day out of the office. Similarly, in our own research
at FXPAL, we've looked at how the use of communication tools has been evolving over time. And we've
actually been collecting data on our employees for about three years in how they're using
different communication tools and how they're choosing which tools to use. And what we found
over the progression of time is that it's no longer just phone and e-mail. Our
toolbox of tools that we use is increasing. We're not dropping tools and we're also becoming
more sophisticated in choosing which tools to use for what type of communication or with
which individuals. At the same time, we're also seeing, at least within our company and
what we're exposed to, an increased globalization effort. In fact, the United Nations now says
globalization is an irreversible trend. It's going to be a part of our economy for times
to come. At the same time, we also see still huge dependence on interpersonal communication
to complete tasks. We're not working more independently now than we were before. We're still
working together and we need to communicate in order to get tasks accomplished. In fact,
recent studies have shown that we have about 50 to 60 interactions per day per worker when
we're at work. What this equates to is an interruption or an interaction with
a colleague about every eight minutes. So that's a lot of interaction that we're having
at work and most of these interactions are unscheduled and impromptu. So when we're distant
and not working co-located, it makes these things very difficult. So all this boils down,
for us, to a central realization when we look at the future of communication tools:
in the office of the future, we can no longer take for granted a worker's location, availability
and preferred communication methods and channels. And so, as we move forward into this new era,
we're really going to need new tools and services that will address those challenges that I
mentioned on the previous slide. And at FXPAL and our research, we believe that presence
and awareness will really be a critical building block in these tools of tomorrow. So that's
really motivated what we've done with Unity. At a very high level, Unity is
a prototype system that's composed of three components. There's a data and sensor collection
component where we have a bank of sensors that are collecting various information about
users and their activities. We have some services that are in the cloud that are analyzing this
data and forming higher level descriptions of people's presence states. And then we have
some interfaces that allow users to access this information on their desktops, laptops
and mobile devices. And the system that we have has been evolving over a period of about
a year and a half at FXPAL. And we've had it in continuous deployment for about a year
with most of our employees and we're now beginning deployment with our peers at our parent company
in Japan. So, what is it? Here's a diagram that sort of illustrates the overall system
architecture. Over here on the left hand side you'll see that we have various sensors that
are feeding information into the Unity system. These are showing what we have currently in
our system although we are continuing to expand it. We have Bluetooth sensors that can detect
which building people are in, what part of the building they're in. We have cameras scattered
throughout the building, including in people's offices. These cameras look for motion to
determine if people are in their offices, in their office with visitors, et cetera.
We also have various external data coming in to the central server. These are things like
IM status, internal and external calendar information as well as some software that
we have running on our end user clients as well as in the networks to determine where
people are connected from and their accessibility. All of this information goes into
our central server, and we use some algorithms to fuse this information into higher level
presence states, which are then portrayed on these dashboards that run on end user clients.
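As a rough illustration of this kind of sensor fusion, here's a hypothetical rule-based sketch. The function name, parameters, and rules are assumptions for illustration only, not the actual MyUnity fusion algorithm:

```python
# Hypothetical sketch of fusing raw sensor readings into one of the five
# presence states described in the talk. Rules and names are illustrative.

def fuse(camera_count, bluetooth_zone, im_online, on_phone):
    """Map sensor readings to a presence state (and its tile color)."""
    if camera_count >= 2:
        return "in office with visitor"   # purple tile
    if camera_count == 1:
        return "in office"                # green tile
    if bluetooth_zone is not None:
        return "in building"              # yellow tile: on campus, not in office
    if on_phone:
        return "out, reachable by phone"  # orange tile
    if im_online:
        return "working off-site"         # blue tile: at home, at a cafe, etc.
    return "unavailable"

print(fuse(camera_count=1, bluetooth_zone=None, im_online=True, on_phone=False))
# -> "in office"
```

The ordering of the rules encodes a priority: direct camera evidence of the office wins over location beacons, which win over network and phone signals.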
And what you're seeing here is the dashboard that we have for the Windows desktop as well
as the Android smartphone. And what you'll notice about each of the interfaces is a tile-based
display which we call the dashboard, which resembles sort of a business card metaphor
where you have people's names, their photos, the presence state, and then
color coding to reinforce those presence states. So in this figure that I'm showing
here, the green people represent people that are physically in their offices. There is
one person that's red down here. Oh, not red, I'm sorry, it's purple. If you're purple,
it means that you're in your office with a visitor. Blue indicates that you're working
off site, so you may be at home, you may be at a café working, but you're not physically
on campus. A yellow color indicates that you're somewhere in the building but not in your
office and orange indicates that you're out and about but are making yourself available
on your phone if your colleagues need to contact you. You also see that there are additional
icons and decorations on the tiles that indicate planned appointments. So either vacation time
for certain individuals or appointments like giving a talk at Google. So at the very beginning
when we started building Unity, we had users' privacy concerns in mind and we really focused
on the data collection side of privacy initially. So, built in to Unity is a bunch of controls
that allow an end user to control not how the data is being shared but what data is
being collected about them. So in these dialogues, which I'm showing from the desktop app, users
can go in and turn on and off these different sensors. And what this will do is it will
prevent the system from ever collecting information from that particular source. And what I want
to highlight here is the camera control. Although you're seeing a live camera feed of my office
here at FXPAL, this view is only for me. And what it allows me to do is designate different
regions of my office that belong to either areas that I inhabit as the office owner or
areas that are inhabited by visitors when they come into my office. So we're exploiting
the sort of static nature of how people use their offices. What comes out of this system
is just a zero or one or a two plus, which is used at the higher level of understanding
people's state. So like I've said, we've been studying this very carefully in our deployment
at FXPAL and we've found some really useful and interesting results. One is that
we found an increase in the opportunities for effective collaboration. We've
performed one study where we actually had people carry around diaries, and every time
they had a communication event they would log on their diary the form of communication
and the medium that they used and who with. And over that four-week study, compared
to before Unity was in use, we saw a 17% increase in face-to-face communication
and a 21% decrease in email conversations. And this was an important and positive
result for us because, before we deployed Unity, we asked users what was their preferred
method of communication. And overwhelmingly for a research lab, it's face to face interaction.
And so by allowing people to use their preferred method of communication and offset those methods
that are less preferred, we've increased, we believe, the overall effectiveness of
collaboration. At the same time, we've also seen a general sense of improved awareness. So,
when we asked qualitatively how people are using the system, it wasn't just for finding
somebody to interact with, but also for getting an overall sense of what the behavior
of the entire organization is, or feeling a sense of community with one's peers. And we've
got a lot of great quotes from our study on this, but the one that really hits home for
me is that one of our users said that it feels like everybody's office is right next door.
So they feel much more connected with their peers at work. A study that we did over the
summer where we were looking at how the different clients were being used compared to the desktop
and mobile client, we found a very interesting result in that 83% of our users were now consulting
Unity when they're initiating contact with peers. So it's become a real staple of their
day to day interactions. And what's important to note here is that for people that also
have the tool on their mobile client, the percentage was much higher, so they become
more and more dependent as the accessibility and the availability of the tool becomes higher.
Now, we've had a lot of success with Unity and we've attracted some popular press on
the system. We had an MIT Technology Review article published last week on the work, and
the article is actually very fair and very informative, but the author of the article
did not choose the headline, which is "Someone's Watching You," which is obviously
a part of the work that we don't really want to emphasize but it is certainly a concern.
Well, this, of course, has been picked up in meta-articles, and each time it's reposted
or re-reported, the description of the system gets a little bit more extreme. And the best
one lately is one talking about a system that can
tell if you're slacking off at work. If you read into the details of the article, it actually
says that "We're able to tell if you're using Facebook and talking with your friends or
if you have slackers in your office, not visitors." So, it's certainly sensationalizing some of
the concerns about tools like this. So the question here is what's really sensationalism
and what's genuine concern? And there are obviously some privacy concerns with this tool. Now, the
big letdown for these journalists is that we've been thinking about this from day one
and we've actually had a pretty aggressive research agenda to figure out what are
the ways that we can maximize the utility of a tool like Unity while protecting the
tool from being used for purposes we didn't intend, or for misuse. And one of the things
that we have been doing, as Eleanor introduced at the beginning, is a way to secure
the history so that nobody can tell what someone was doing last week
on Tuesday at 5:00 PM, but we can get a general sense of their behaviors and activities. And
I'll hand it over to her. >> RIEFFEL: As I said earlier, MyUnity didn't
store any data. But researchers like to analyze what goes on with the systems they deploy.
And even more important, the users of MyUnity started expressing interest in seeing personal
trends for themselves, analyzing their own behavior or activity patterns of co-workers
or long term data pooled across groups of users, for example, when is the best time
to contact somebody in the support group. So our users asked for this, and we had interest
ourselves in doing the analysis. Actually, this was prior to my joining
the project; this is sort of what got Jake talking to me. So, he said, "Okay. We want
to support this. How can we support this?" While users wanted these things,
they continued to be concerned about the misuse of stored data. So the research challenge
was how to support the users' desires while respecting their concerns. So just to give
you an example of the sort of feature we wanted to support, here's a graphic that gives
summary data for a user, averaged over a month, for the five presence states. Colors
are similar to what we had seen on the little tiles. And you can also do similar
summaries over a group of users. So, I just wanted to give you a picture of, "Okay, this
is something we'd like to be able to do and show that to appropriate people, but without
them being able to dig into more details about individual data." So, just to
set up the terminology, I'd like to say who the players are here. So, there are a bunch
of users of the system, and each user may use a number of clients. So, they may have
a desktop computer, a laptop or two, a bunch of mobile devices, and they can all run MyUnity.
Okay. There's a non-trusted third party server that can store data for us, probably encrypted,
and process things. We trust the server to carry out the calculations we ask, but we don't
trust the server with our data. We're concerned that the server might be accessed by somebody
who's curious about what's going on in the organization, and we
don't want to reveal that information. The analysts may be users of the system who want
to look at their own behavior or group behavior, or they may be separate, for example, researchers
looking at how the system's being used. So, what we wanted to do with our secured histories
approach was to give users a lot of control over who sees their data. So, we want them
to maintain control of their own data but still be able, if they want, to contribute
their data to group statistics. So, current practice is that analysts are generally given
access to all of the data in order to compute statistics. So, the researchers are trusted
with all the user data. This is true in our setting, in medical settings, and the like. And
more and more often, researchers are using a third party to at least store the data and
sometimes even process it. So either the data is not encrypted, in which case the third party
has access to it, or the data is encrypted and the third party can't help
out with the processing. So what we want to do is provide mechanisms to support what we're
calling, "need to know" security. This is where an individual has full access to her
own data, the third party can help her process it, but can learn nothing about the data values
in the process. And the analyst learns only the desired statistics, not the individual
values. So what we want to do is store the presence states. We're not going to store the
raw sensor values, though that's an interesting question in and of itself. So we're going
to store one bit for each of the five positive presence states like in office or with visitor.
And it's time series data. At each time, say each minute, a user contributes a presence
state. So, five bits of information. And then we want to support data averaging. So, we
want to be able to support arbitrary sums over a single user's data. So, a user can ask
any question she likes about her data and have the third party provide sums that will
help with that computation. And analysts can obtain arbitrary sums as long as it's pooled
over all the users. And just as an aside, while averaging's very useful and our current
protocols just support sums, we do have extensions that are fairly easy to do that also compute
variance and higher moments so we can get a little bit more sense of the distributions.
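To make the data model concrete, here is a small sketch of the per-minute presence bits and the sums and averages built from them, shown in plain, unencrypted form; the sample values are made up for illustration:

```python
# Sketch of the stored data model: at each minute, a user contributes one
# bit per presence state. Averages are just sums divided by the number of
# samples. The sample data below is invented for illustration.
STATES = ["in office", "with visitor", "off-site", "in building", "on phone"]

# Hypothetical stretch of per-minute samples: each entry is a 5-bit tuple,
# here 300 minutes "in office" followed by 100 minutes "off-site".
samples = [(1, 0, 0, 0, 0)] * 300 + [(0, 0, 1, 0, 0)] * 100

def state_sum(samples, state_index):
    """Arbitrary sum over one presence-state channel."""
    return sum(s[state_index] for s in samples)

def state_average(samples, state_index):
    """Fraction of sampled minutes spent in the given state."""
    return state_sum(samples, state_index) / len(samples)

print(state_average(samples, 0))  # fraction of minutes "in office" -> 0.75
```

Note that for 0/1 data, sums of squares equal the sums themselves, which is why the variance extensions mentioned in the talk come almost for free once sums are supported.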
So, we had some design criteria when we started this. So, since the presence states are computed
frequently, the input needs to remain quick. Okay. And on the other hand, the analysis
of stored data doesn't occur as frequently and is generally not needed on a moment-to-moment
basis. So, that can be a little slower. Furthermore, the computation over stored data should not
require involvement of the clients. That sounds like sort of an obvious requirement, but most
prior approaches to having someone learn the statistics without learning the
individual values require that the clients all participate in that computation. And so
for us, in some sense, this was the one that presents the greatest challenge. How can we
do this without asking the clients, without burdening them at this stage? And furthermore,
the bandwidth between the client and the third party and the analyst may be limited, especially
in the mobile cases. So we didn't want to be sending a whole lot of data back and
forth. So, here's the functionality we provide. All the data is stored in encrypted form on
a non-trusted, honest-but-curious third party. Each user can request encrypted sums
over arbitrary subsets of her own data from the third party. And the third party
doesn't learn anything about the data values because
everything remains encrypted. The individual can then decrypt, using her own key or keys,
the single summary statistic that's returned. So that's how we save a lot of bandwidth and
also processing. And then for the sums over
pooled data, a particularly interesting feature here is that even though each user has encrypted
under her own key, we can still pool the encryptions into sums that can be decrypted.
We have a variety of protocols that do certain things with different
keys, but the important things are that in doing this, the third party can compute without
needing access to any of the keys, without decrypting any of the data and without further
interaction with the individuals. So the third party does all this and, again,
doesn't learn anything; it doesn't learn anything about the pooled statistics either, but
it does process the information so that it can hand something off to an
analyst, who has some keys and can learn the resulting statistic. And finally, we
wanted the protocols to be fairly easy to implement and integrate with the MyUnity system.
And so, one feature of this is that we were able to put it together from basically off-the-shelf
components that you'll find in, say, the Java library or the .NET library, plus a few things that
were fairly easy for us to write on our own in order to implement this functionality.
So here are some of the ingredients. What we want to have is an additive homomorphic
encryption scheme. I'll explain what homomorphic encryption schemes are in just a little bit.
But the key property is that this is what enables a third party to compute the encrypted
sum. And we're going to use a symmetric key homomorphic encryption scheme. Most homomorphic
encryption schemes are public key encryption schemes, but the symmetric key gives particularly
compact encryption, which is good for the bandwidth, and also, it has this key feature
that it makes it really easy to compute sums over data encrypted under different keys.
So in the simplest protocol, the keys for all the values contributing to the sum are needed
to decrypt. If the analyst is able to decrypt the sum,
the analyst can also decrypt the individual values. That still provides some security
because the third party hasn't been able to learn anything in the process. So we'll first
present that protocol and then we'll show how we can do some variations on that to give
more security, where the analyst can decrypt the statistic but cannot decrypt any of the
individual values. So, in order to achieve that next step and get to the point of
full need-to-know security, where the analyst can only decrypt the group statistic,
not the individual values, what we do is extend Chaum's Dining Cryptographers
networks to obtain a more complex key structure. And the family of protocols
we'll show is secure against collusion by K users. So, in order to decrypt your value,
K other users would have to get together and pool their resources.
So, homomorphic encryption simply enables you to do computation on the ciphertext,
on what's been encrypted, and then decrypt afterwards. So, in an additive homomorphic encryption
scheme, you can encrypt a message and encrypt another message. And then if you add those
two ciphertexts together, what you get is an encryption of the sum of those two messages. Most encryption
schemes do not have this property. So it's a very special type of encryption. There
are also multiplicatively homomorphic encryption schemes. Again, if you encrypt a message and you encrypt
another message and then you multiply the two ciphertexts together, that gives you a
ciphertext that you can decrypt to get the product of those things. What's interesting
is that it's hard to do both. So, an algebraically homomorphic, or fully homomorphic, encryption scheme allows
you to do arbitrary combinations of both addition and multiplication. And that's highly desired
because that lets you do general computation. However, prior to 2009, no such scheme was
known. Okay. All homomorphic encryption schemes were only partially homomorphic. In fact,
almost all of them could only compute sums or could only compute products, but not both.
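Both partial homomorphic properties are easy to demonstrate in toy form. The sketch below uses a one-time-pad-style additive scheme and textbook (unpadded) RSA with tiny, insecure parameters; everything here is purely illustrative:

```python
# Toy demonstrations of the two partial homomorphic properties.
# Additive: E(m) = (m + r) mod n, where the sum of two ciphertexts decrypts
# (with the summed randomness) to the sum of the messages.
# Multiplicative: textbook unpadded RSA, where the product of two
# ciphertexts decrypts to the product of the messages.
# Parameters are toy-sized and completely insecure; illustration only.

n = 2**32
m1, m2, r1, r2 = 17, 25, 123456789, 987654321
c1, c2 = (m1 + r1) % n, (m2 + r2) % n
assert (c1 + c2 - r1 - r2) % n == m1 + m2   # additive homomorphism

# Textbook RSA with tiny primes p=61, q=53: n_rsa=3233, e=17, d=2753.
n_rsa, e, d = 3233, 17, 2753
enc = lambda m: pow(m, e, n_rsa)
dec = lambda c: pow(c, d, n_rsa)
assert dec(enc(6) * enc(7) % n_rsa) == 42   # multiplicative homomorphism
```

Each scheme supports only its one operation, which is exactly the "sums or products, but not both" limitation of partially homomorphic encryption.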
And there were a few recent advances that allowed you to compute arbitrary sums and
a few products, such as second-degree polynomials. So, last year's big result was
Craig Gentry's fully homomorphic encryption scheme. So, that was a very exciting result.
And from a complexity theory point of view, it doesn't look that bad, because it's
just a constant slowdown. However, for those of us interested in practical schemes, the
constant is a bit big. So if you want to do computation on the encrypted data, it's a
trillion times slower than if you wanted to do it on unencrypted data. It's a little tough
to use at present. So that was a big breakthrough. There have been some
follow-on papers and maybe, eventually, we will get there. But for the moment, we have
to stick with schemes that are a little less powerful but will do what we want.
So, most of the remaining schemes are based on public key cryptography which is quite
a bit less efficient than symmetric key encryption. So we were excited to see that Castelluccia
had recently developed a symmetric homomorphic encryption scheme for use in wireless sensor
networks. And a particularly nice feature of the scheme is that the basic idea behind it
is very simple. So in preparing this talk, I sort of debated how much I should go into
the technical details here and then I thought, "Well, this is one of the beautiful areas
where the technical details are actually relatively easy to communicate."
So, I want to give it a try. So, the idea behind the symmetric homomorphic encryption
scheme is this: Alice has a message M1, and suppose she generates a random value R1, okay? Then
she encrypts by adding those two values together. And because R1 is random, the ciphertext is
randomly distributed. So someone can decrypt if and only if they know this piece of randomness,
this R1. Similarly, Bob has a message and a piece of randomness and encrypts in an analogous
way. Then they can add these two things together, and basically, what you have is the
two messages added together plus the two pieces of randomness added together. So, anyone who
knows those two pieces of randomness can decrypt; otherwise, it appears random. So, the biggest
problem with that simple view of the scheme--oh yes, question?
>> This is on [INDISTINCT] of power of two. Is the arithmetic [INDISTINCT] on
a power of two or over all integers? >> RIEFFEL: It's mod a large factor. Yes.
>> A large factor, but is it a power of two, a prime factor or some specific factor?
>> RIEFFEL: You can choose it. Just large, yes. So you want to choose it
large enough so that when you're adding these things up you don't overflow.
But other than that, you can choose whatever you like. So if a power of two is convenient,
choose a power of two. If it's not convenient, choose something else. Yes. Other questions?
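The simple version of the scheme, as just described, can be sketched in a few lines; the modulus and values here are illustrative assumptions:

```python
# Sketch of the basic additively homomorphic symmetric scheme: encryption
# adds a random pad modulo a large n, and sums of ciphertexts decrypt with
# the summed pads. Toy values; illustrative only.
import secrets

n = 2**64  # modulus chosen large enough that the sums never overflow

def encrypt(m, r):
    return (m + r) % n

# Alice and Bob each encrypt under their own piece of randomness.
r_alice, r_bob = secrets.randbelow(n), secrets.randbelow(n)
c = (encrypt(5, r_alice) + encrypt(9, r_bob)) % n

# Anyone who knows *both* pads can recover the sum; to everyone else the
# ciphertext is uniformly distributed.
assert (c - r_alice - r_bob) % n == 5 + 9
```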
So, the problem with that really simple view is that if Alice stores encrypted values this
way and she then wants to decrypt, she has to remember what all of those pieces of randomness
are. So, she probably has to store them because they're random things, right? You can't
remember a whole sequence of random things. Well, storing them means she's storing a whole bunch of
stuff and, also, that's not very secure: she's storing all these pads that will be used
to decrypt. So what we want is a predictable and efficiently computable source of something
that's like randomness. So pseudorandomness means that it's indistinguishable from random
unless you know a secret such as a key. What indistinguishable means here can
be formalized. So what pseudorandom function families do is provide the capability
Alice needs here. If she has a key, that's used as an index into the function
family, so she gets a function F sub KA, which is her pseudorandom function. She can
then encrypt by taking her message and adding to it her pseudorandom
function evaluated at a nonce, which is just a non-repeating or rarely repeating
value. So, for example, a good thing to take here would be the timestamp and the presence
type that she's encrypting. Then, when she wants to decrypt, she knows
what time and what type of presence state, so she can compute this pad and subtract
it from the ciphertext to get her value back. So as long as her pseudorandom
function is truly indistinguishable from random, her message is safe from anyone who doesn't
know the key. So we're getting to where we can talk about these protocols. The one on this slide
is one which enables partial need-to-know security, where the analyst
needs to know all the keys used to encrypt in order to decrypt. So, as usual, you need to choose a security
parameter, a large modulus. We use HMAC as a pseudorandom function family, and we have
a length-matching hash function H. And for key generation, each team member generates her
own key. And then to encrypt, that team member uses her pseudorandom function, indexed
by that key, as we just saw: by adding what we're calling the pad, this piece
of pseudorandomness. And to compute a sum, you simply add the ciphertexts. And to
decrypt, anyone who has access to all of the keys can decrypt by computing these pads for
the nonces, the timestamp and the presence type of the values contributing to the sum
and subtracting them off from the encrypted sum. So the issue here is that if
the analyst can decrypt the sum, the analyst can decrypt all the individual values. Another
issue is that decryption requires the computation of a lot of pads. The more complex key structure
that we'll present next completely solves the first problem and reduces the second.
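A minimal sketch of this first protocol, using HMAC as the pseudorandom function and a (timestamp, presence type) nonce as just described; the keys, modulus size, and nonce format are illustrative assumptions:

```python
# Sketch of the first protocol: each user derives a pad with HMAC (the
# pseudorandom function) keyed by her own key and indexed by a nonce such
# as (timestamp, presence type). The third party sums ciphertexts without
# any keys; an analyst holding all the keys recomputes the pads and
# subtracts them from the encrypted sum. Illustrative parameters only.
import hmac, hashlib

n = 2**64  # large modulus

def pad(key, timestamp, presence_type):
    digest = hmac.new(key, f"{timestamp}|{presence_type}".encode(),
                      hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % n  # length-matching hash

def encrypt(key, value, timestamp, presence_type):
    return (value + pad(key, timestamp, presence_type)) % n

keys = [b"key-user-1", b"key-user-2", b"key-user-3"]   # hypothetical keys
values = [1, 0, 1]                                     # presence bits
nonces = [(1000 + i, "in_office") for i in range(3)]

# Third party: adds the ciphertexts blindly, with no access to any key.
enc_sum = sum(encrypt(k, v, *nc)
              for k, v, nc in zip(keys, values, nonces)) % n

# Analyst with all the keys: recomputes and subtracts every pad.
total = (enc_sum - sum(pad(k, *nc) for k, nc in zip(keys, nonces))) % n
assert total == sum(values)
```

This makes the weakness concrete as well: an analyst who can run the last step can equally well subtract a single pad and recover one user's individual value.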
So it's nice that by making the protocol more secure, we also make it faster, but that doesn't
always happen. And, again, the idea behind this is not too hard. So suppose each user
had a set of her own random values, and all of those values, plus R zero, which we're
going to give to the analyst, add up to zero. Okay. Then the users encrypt using these
pieces of randomness, by adding them to their values. Then
when you sum all those ciphertexts, you're going to get the sum of their values plus the sum
of all these pieces of randomness. Now, if the analyst adds R zero to that, then because of
this property all of the pieces of randomness disappear and you're just left
with the sum of the values. And so in this way, the analyst, just by knowing R zero,
can decrypt the sum, but R zero does not enable the analyst to decrypt any of the individual
values. So that's the idea. Then trying to figure out how to actually implement something
like this was a little trickier. So the best thing I was able to come up with early on
was that the users come up with keys and share them in a certain way that gives a similar
property. So the analyst generates a key K zero. Each user generates her own key, and
then the analyst and the user share keys as follows. So, a user shares her key with her
neighbor with the index one higher, okay? So, the analyst shares her key with the first
user. Similarly, the last user will share her key with the analyst. There's sort of
a ring here, okay? And then to encrypt, again, we're just adding pads but it's a little bit
more complicated. As before, we add the pad corresponding--a user adds the pad corresponding
to her own key and then she subtracts off the pad corresponding to the key which she
received from her neighbor. And the point of doing this is that when you sum all those
ciphertexts, almost all of the pads cancel out. And the two that don't cancel out are
the first and last ones, which are the ones whose keys the analyst has. So, in the end, you get that the sum of all the ciphertexts is the sum of all the messages plus the first pad
minus the last pad. And since the analyst has these two keys by how we did this sharing
in the first place, the analyst can compute those pads and can obtain the sum of all the
statistics. However, the analyst cannot decrypt any of the values making up that sum because
she does not have any of the keys for the individual values. So, that's our first fully need to know protocol. And then, I'm not going to go through this in
detail, but the same idea can be generalized to get a little bit more security. So, if
you look at--if you look at this protocol, when a user encrypts her own value, she's
using a key that she has shared with one neighbor and a key that she has received from the other
neighbor. So if the two neighbors on either side collude, they can decrypt her value.
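The ring construction just described might be sketched like this, again with illustrative HMAC-based pads and made-up keys; `k[0]` plays the role of the analyst's key K zero:

```python
import hmac
import hashlib

M = 2 ** 32  # illustrative modulus

def pad(key: bytes, nonce: bytes) -> int:
    return int.from_bytes(hmac.new(key, nonce, hashlib.sha256).digest(), "big") % M

def encrypt(value: int, own_key: bytes, received_key: bytes, nonce: bytes) -> int:
    # add the pad for your own key, subtract the pad for the key
    # received from the neighbor with index one lower
    return (value + pad(own_key, nonce) - pad(received_key, nonce)) % M

def decrypt_sum(agg: int, k_first: bytes, k_last: bytes, nonce: bytes) -> int:
    # in the telescoping sum every pad cancels except the first and the last,
    # and those are exactly the two keys the analyst holds
    return (agg - pad(k_last, nonce) + pad(k_first, nonce)) % M

# k[0] is the analyst's key; users 1..3 hold k[1..3], user i received k[i-1],
# and the last user shared k[3] with the analyst, closing the ring
k = [hashlib.sha256(bytes([i])).digest() for i in range(4)]
nonce = b"2011-06-01T10:00|in-office"
values = [1, 0, 1]
cts = [encrypt(v, k[i + 1], k[i], nonce) for i, v in enumerate(values)]
agg = sum(cts) % M
print(decrypt_sum(agg, k[0], k[3], nonce))  # 2
```

Summing the ciphertexts gives the sum of the messages plus pad(k[3]) minus pad(k[0]); the analyst can remove those two pads but has no key for any individual ciphertext, which is the need to know property.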
So we want to make it a little bit more secure. And so, we were also able to come up with
a whole family of protocols where you needed to have K users collude instead of just two.
And basically, if you have a graph structure in which the analyst and all the team members
have K plus one neighbors, you can do a similar key sharing scheme and a similar encryption
where you add some of the pads, subtract off others of the pads and then when you add them
all together, the pads cancel except for the ones that the analyst has the--has the keys
for. And so in this way, we get a family of protocols that are secure against K colluding users. Okay. One comment is that this key structure that we came up with is a little
bit more complicated than this original idea of finding random numbers--finding a source
of random numbers that add up to zero. And I didn't see how to do that, so I posed it as an open question. And Elaine Shi, at PARC, thought about it for a while and came up with a way to get a series of random numbers that have this property at each time step.
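The zero-sum idea itself is easy to demonstrate in a toy form where a trusted dealer simply hands out shares that sum to zero; the real constructions, including the one in the joint paper, avoid any such dealer:

```python
import random

M = 2 ** 32  # illustrative modulus

def deal_shares(n: int):
    # r[0..n-1] go to the users; r0 goes to the analyst, chosen so that
    # r0 + r[0] + ... + r[n-1] == 0 (mod M)
    r = [random.randrange(M) for _ in range(n)]
    r0 = -sum(r) % M
    return r0, r

values = [1, 0, 1, 1]
r0, r = deal_shares(len(values))
cts = [(v + ri) % M for v, ri in zip(values, r)]
# adding R zero cancels every user's randomness, leaving only the sum
print((sum(cts) + r0) % M)  # 3
```

R zero decrypts the sum but, on its own, reveals nothing about any single share, hence nothing about any single value.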
And so we have a joint paper coming out on that. It has some advantages and some
disadvantages. It's secure against any amount of collusion--I mean, nothing can be secure against everybody else colluding, because the sum is available: if everybody else says what their value is, your value is not secure. But beyond that, it's very elegant in this respect and doesn't have the K collusion issue. On the other hand, because the decryption's much harder, it's restricted to a small plaintext space. So, anyway, I believe
there's a lot more work that can be done both on need to know protocols that do other things
than just sums, for example, or other ways of obtaining these statistics. I did want to mention that, as I said before, most related approaches to this problem use multiparty computation, where the clients are involved in each stage. Another piece of related work
that you'll hear about in this space is differential privacy. It addresses an orthogonal question: if you obtain a statistic, what can you learn about the individual values just from that statistic? And sometimes the answer is quite a bit. And
so one really interesting question is can you combine our need to know protocols with
differential privacy? And in a standard differential privacy setup, there's an assumption of a
trusted third party who calculates the statistics and then adds noise in a way that makes it
hard to figure out what's going on with the individual values. But that third party saw everything.
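For comparison, the trusted-third-party setup amounts to something like the following sketch, where a curator who sees all the raw values releases only a Laplace-noised sum. The inverse-CDF sampler and the parameter choices are mine, not the talk's:

```python
import math
import random

def laplace_sample(scale: float) -> float:
    # standard inverse-CDF sampling: U uniform on (-1/2, 1/2)
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_sum(values, epsilon: float, sensitivity: float = 1.0) -> float:
    # trusted-curator model: the curator sees every raw value...
    true_sum = sum(values)
    # ...and publishes only the sum plus Laplace(sensitivity / epsilon) noise
    return true_sum + laplace_sample(sensitivity / epsilon)

print(dp_sum([1, 0, 1, 1], epsilon=0.5))  # a noisy value near 3
```

The combination described in the talk removes this curator; roughly speaking, the noise has to be introduced on the client side before encryption so that no single party ever sees the exact sum.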
And what we're able to do is combine differential privacy with the need to know
protocols to obtain both the need to know property and differential privacy without
the need for a trusted third party. There was a previous paper by Rastogi and Nath that
used secure multiparty computation techniques to achieve differentially private aggregation
without a third party. But, again, that requires clients participating at each point. Finally,
there's a certain amount of work in the wireless sensor network setting that builds on Castelluccia's
work as well. But other than that, we believe we're the first application of their symmetric
homomorphic encryption scheme outside of wireless sensor networks. So, to wrap up, this is a
really exciting area of how do you support privacy and utility with all this data collection.
And we hear very frequently the cynical view that privacy is dead and that there's nothing
we can do about it except to mourn its death. And that may be fun to write about, but we
think it's actually more fun, and certainly much more challenging, to ask "Well, what
can we do here that's useful?" And one area to look at is just having a better understanding
of what user preferences are with respect to what do they want to learn from the data
and what are their concerns about the data. And then there's all sorts of interesting
questions about finding mechanisms to encourage benefits and reduce harm. And these can be
cryptographic, like the one I presented today, or they can be economic or social. Admittedly,
the challenges are big. That's probably why the "privacy is dead" slogan gets passed around.
On the other hand, that means that there's lots of room for many people to work towards
interesting solutions. To wrap up, we'd like to thank particularly Bill van Melle and Adam
Lee who worked with us on the secured history part of this work and the entire MyUnity Team;
Bill, Thea Turner, Pernilla Qvarfordt, and Tony Dunnigan and others. And we'd particularly
like to thank all the MyUnity users and study participants for their help with all this
work. And if you're interested in learning more, please talk with us. And here are also
some references to look up, some of the work we've discussed. So, thank you again for having
us here. Really enjoyed it. >> BIEHL: Does anyone have more questions?
>> Could you talk a little bit more about the HMAC, how you use it? Because for every series of readings you need to generate the random numbers that will add up to zero. So I guess the input would be something like a sequence number or a timestamp, or how do you use that?
>> RIEFFEL: So let me see if I understood your question well enough. But we use it,
HMAC, as a keyed pseudorandom function... >> Yes.
>> RIEFFEL: ...family. So, I make up my own key and that gives me a pseudorandom function
family, basically, a version of HMAC. >> Yes.
>> RIEFFEL: And then, in our particular case, we're interested in encrypting each of the five bits of a presence state at a given time. So the nonce that that version of HMAC is evaluated on is the timestamp together with the presence type.
>> Okay. >> RIEFFEL: Okay?
>> So then the timestamp is public knowledge, right?
>> RIEFFEL: Yes.
>> So then you could see it, along with the ciphertext, right?
>> RIEFFEL: Yes. Yes. So that's all public
knowledge. The only secrets are the keys. >> Yes.
>> RIEFFEL: So those have to be guarded very carefully. But--yes?
>> Also, can these be computed on the fly?
>> RIEFFEL: Absolutely.
>> Okay. This is my understanding. If you aggregate, instead of over your whole dataset, just over two people, one of whose keys you already know because of your key sharing mechanism, right?
>> RIEFFEL: Right.
>> And then if you add or remove the K zero pad or what have you, which the analyst has, so that everything ends up at zero, this should be about the same as if you aggregated over all users where every user's data was always zero, right?
last part. Sorry.
>> Okay. So, the--K zero is the...
>> RIEFFEL: Okay.
>> ...key the analyst has, right?
>> RIEFFEL: Yes.
>> Okay. It just basically encodes the sum of all the keys that the other users have, right?
>> RIEFFEL: No. >> To some extent.
>> RIEFFEL: It's the index into a function that...
>> But you then generate that number from it, right?
>> RIEFFEL: Right. Yes.
>> Yes. That's the important part. And this number is used to--okay, it's composed, to some extent, of the sum of all the numbers used by the other participants, right? Because these algebraic relationships add up to zero.
>> RIEFFEL: It has that algebraic relationship.
You don't use that algebraic relationship to get it. But, yes.
>> Yes. Yes. >> RIEFFEL: Yes. Yes. Yes.
>> But it has, up to a certain extent? >> RIEFFEL: Yes. Yes.
>> Oh, perfect. So any ciphertext would be one part of this algebraic expression, plus the number--the secret [INDISTINCT], right?
>> RIEFFEL: Right.
>> So, basically, K zero plugged in to this algebraic expression gives the same value as the aggregated ciphertext if all the plaintexts were zeros, right?
value in--the function--the pseudorandom function... >> Yes.
>> RIEFFEL: ...evaluated at that time... >> Yes.
>> RIEFFEL: ...for the--for that key. >> Yes, exactly.
>> RIEFFEL: Yes. Yes.
>> So there's a key for the encryption, and you derive that from the HMAC...
>> RIEFFEL: Right.
>> ...function.
>> Okay. I get it. So what...
>> RIEFFEL: We call those pads, by the way.
>> Pads. Okay. Yes.
>> RIEFFEL: If that helps. Yes.
>> Okay. It does. So, basically, the analyst's pad at any given timestamp is exactly the same as aggregating all...
>> RIEFFEL: The other ones.
>> ...zeros encrypted [INDISTINCT] with each user's pad, right?
>> RIEFFEL: Correct.
>> And the analyst already has one of the--of the end users' secret pieces, right?
>> RIEFFEL: Right.
>> So you can derive one of the end user's [INDISTINCT]?
>> RIEFFEL: One of the pads, but not one of the values, because two pads were used to encrypt each.
>> Yes.
>> RIEFFEL: Yes. Okay.
>> Okay. So what happens if the analyst aggregates just the ciphertexts from one of the users for which he knows the key, right, and some other user?
>> RIEFFEL: Yes.
>> And then uses his pad, which is K zero or something like that, to decrypt--to just add, say, four [INDISTINCT] or something like that. Then that operation would be the pad he already knows minus the secret number of the user he's aggregated with...
>> RIEFFEL: Yes.
>> ...minus all the other users with their plaintexts set to zero, right?
>> RIEFFEL: Okay.
>> With the plaintexts set to zero. Wouldn't that provide information about the user he's aggregating with?
>> RIEFFEL: So, no, because there's still--there's still a random number--the pseudorandom number. So, each user is encrypting in this way, right?
>> Yes.
>> RIEFFEL: So, say, it's the first user, and the first user is encrypting with, as you say, K0 and K1.
>> Yes.
>> RIEFFEL: So, as you say, the analyst may know K0 and can compute this pad, right?
>> Yes.
>> RIEFFEL: But the analyst can't get any information about this pad.
>> But this pad--we'll have to think about all this, since we have time constraints.
>> Yes. We might have to go to a blackboard, but.
>> Okay. I think probably someone has some more questions.
>> RIEFFEL: Okay. Anyone else? Okay. Well, thanks.