>> TOM RITTER: Hello and welcome. My name is Tom Ritter and I work for iSEC Partners. If you don't know who Zax and Dizum are, you will know by the end of the talk. This is an anonymity talk. This book, which many of you call the Bible, had not even come out yet -- only the first edition had. And while you could export the book itself, the U.S. government had determined you could not export the floppy disk that the code had come on. In fact, the U.S. was actively investigating Phil Zimmermann for violating the Arms Export Control Act for making the first few versions of PGP available. One group went on the offensive, taking the U.S. government to court and suing over the export controls on crypto. Another group of people ultimately printed out the source code for PGP, exported the book to Europe, scanned it in and OCRed it, releasing a version that bypassed the export controls. Alt.Anonymous.Messages was forged in the heyday of the cypherpunks and has changed very little in the decade since it was last shaped in any major way.
But in that decade, what we have seen is a monumental focus of the nation's spy agencies not on what was thought to be the most critical piece of information to encrypt, the content itself, but rather on metadata. The people who know won't talk, and the people who talk don't know. But leaked court orders require Verizon to turn over call records, local and abroad. Now, I'm talking here, so I don't know anything and I'm just speculating. But the most straightforward thing to do with this data is to build communication graphs: analyze the metadata, look for patterns, identify people of interest and figure out who they talk to. And the metadata around an encrypted channel tells volumes.
So SSL is the most widely used encrypted channel on the Internet today. And even ignoring the numerous attacks we've seen in the past few years, and ignoring how it is broken in almost every cryptographic implementation, there are numerous things you can learn. There are protocol-level leaks in SSL itself: it says something about the type of client you're using and the version, and it includes what you think the local time is. So here's hoping your clocks are synced.
But from an information theoretic perspective, an adversary can see that you are sending packets and communicating. That seems obvious -- of course they know you are communicating -- but it is important to bear in mind for the future. Ideally, the adversary wouldn't even know that you are communicating. Secondly, SSL makes no attempt at hiding who you are talking to. So the fact that you are talking to Facebook is straightforward to see. Similarly, the adversary knows when you are on Facebook, when you are sending data and when you are receiving data, and the resolution on this goes down to the microsecond. So they know exactly when, but they also know exactly how much data you receive. SSL doesn't have any real padding, and I don't know of any Web site that adds variable-length padding to frustrate length analysis. So how many of you stayed through Runa's talk?
A few. Thank you. Let's talk about Tor. Tor is an implementation of onion routing where
each node peels off a layer of encryption until an exit node talks to the destination,
the destination responds, and it is routed back.
Onion routing specifically aims to disguise who is talking. An adversary observing can't
see that you are talking to a Web site or a service. An adversary observing that Web
site or service can't see who is talking to it. But it doesn't stop an adversary from
knowing you're talking to someone, knowing when you're talking, and how much you're saying.
Tor doesn't really do padding. What little it does is not intended to be a security feature.
Tor explicitly leaves out length padding. And if you stayed through Runa's talk you
know Tor cannot protect you if an adversary can see the entire path of a circuit. Let's
say hypothetically speaking that New Zealand, Australia, the U.S., Canada and the U.K. were
to, say, conspire on some sort of spy program. (laughter).
Well, if your circuit went through these countries, Tor can't help you, at least not information
theoretically. The adversary can track the traffic and find out who you are talking to.
I'm not saying this is actively happening. I'm saying we've proved in papers that it's
possible and that it's explicitly outside of Tor's threat model.
A slightly more difficult version of that attack is if the adversary can see you and then see the last leg of your path later on -- like, say, you're in China visiting a Chinese Web site -- they can do a similar track and track you down. It requires a little bit more math, a little bit more correlation, but again we've proved that it's possible and it is, again, outside of Tor's threat model. This is particularly concerning, seeing as I, like probably most of you, happen to live in the U.S., and so much of what we do is hosted in Amazon EC2 in Virginia. The adversary can tell who you are talking to, so we are back at SSL. I think it is worthwhile to show a couple of attacks on *** data. There was a traffic analysis tool that looks at your SSL session with Google and figures out what part of Google Maps you are actually looking at, all based off the sizes of the tiles that you're downloading over SSL. It is worthwhile to note that this is an attack on a client, on someone browsing Google Maps at that moment.
Let me show an ultimate example. You are sitting on Facebook with Facebook chat enabled, all over SSL, or heck, all over Tor. Facebook chat turns you into a server: you are able to receive messages from people and they will be pushed down to you. The attacker, not you, determines when you will receive a message. That's a pretty powerful capability, and it can lead to time-based correlation attacks. An adversary sends you a message and looks at all the people connected to Facebook or Tor and sees who's receiving a message right after that. And since Facebook chats tend not to be huge, it can lead to size-based attacks. If I send you a huge Facebook chat, with only a couple of trials you can be pretty confident that the user whose Internet connection you are monitoring is the same anonymous Syrian dissident you are monitoring on Facebook. A similar attack was used to deanonymize Jeremy Hammond, who is awaiting trial for dumping mail spools. The police staked out his home, watched him enter, saw some Tor traffic, and the username they thought was him popped onto IRC. A classic confirmation attack. I have gotten some comments that they cut his Internet connection and saw him drop off. I haven't been able to confirm that in the police logs; haven't had time. If that's true, that's another type of traffic confirmation attack on a low-latency connection.
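The core of such a confirmation attack is trivial to express. Here is a toy sketch, with invented client names and timestamps, of intersecting probe times against observed arrivals:

```python
def correlate(probe_times, observed, window=2.0):
    """Given times we pushed probe messages to a target, and a log of
    (client_id, arrival_time) events on the monitored network, return the
    clients whose arrivals line up with every probe within `window` seconds."""
    candidates = None
    for t in probe_times:
        hits = {c for c, at in observed if t <= at <= t + window}
        candidates = hits if candidates is None else candidates & hits
    return candidates

# Toy data: client 'alice' receives right after each probe; others are noise.
probes = [100.0, 250.0, 400.0]
events = [("alice", 100.5), ("bob", 101.7), ("alice", 250.4),
          ("carol", 252.9), ("alice", 400.9), ("bob", 420.0)]
print(correlate(probes, events))  # only 'alice' survives all three probes
```

A handful of probes is usually enough: each one cuts the candidate set down by intersection.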
Now, the good news is that even if the adversary can see the start and end nodes, or even the entire path, there is a way to disguise who you are talking to, and that's mix networks. Mix networks introduce a delay while they collect messages into a pool and then fire them all out. Collecting messages prevents an adversary who is observing the mix from knowing which message went where. It introduces uncertainty. I really like mix networks and I want to encourage research and adoption. I want to take a quick moment to demonstrate
live on stage. So right now I'm going to be a Tor node, or an onion routing node, or a low-latency anonymity network: I will receive a packet and then send it right out. Now, I'm going to play a mix node or a remailer node, and I'm going to collect a packet, stick it in my bag; collect another packet, stick it in my bag; and collect another packet and stick it in my bag. I will shuffle them up and peel off the outer layer of encryption, and now I will send them out all at once.
So you, the global passive adversary who can observe my computer and see all the traffic
I send and receive, you saw that I received three messages and you saw that I sent out
three messages. But you don't know which message went where. That's the uncertainty.
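What I just acted out is a threshold mix, and it fits in a few lines. A minimal sketch -- the `peel` step is a stand-in for removing a real layer of encryption:

```python
import random

class MixNode:
    """Threshold mix: buffer messages, then flush a shuffled batch, so an
    observer can't link an incoming message to an outgoing one."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.pool = []

    def receive(self, msg):
        self.pool.append(msg)
        if len(self.pool) >= self.threshold:
            return self.flush()
        return []

    def flush(self):
        batch, self.pool = self.pool, []
        random.shuffle(batch)            # break any link to arrival order
        return [peel(m) for m in batch]  # strip our layer of "encryption"

def peel(msg):
    # Stand-in for peeling one onion layer; here we just strip a wrapper tag.
    return msg.removeprefix("mix1(").removesuffix(")")

node = MixNode()
node.receive("mix1(a)")
node.receive("mix1(b)")
out = node.receive("mix1(c)")   # the third message triggers the flush
print(sorted(out))              # ['a', 'b', 'c'], emitted in shuffled order
```

The observer sees three in and three out, but the shuffle destroys the input-to-output mapping.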
So with mix networks, as demonstrated, we gain a certain amount of protection against figuring out who is communicating with whom. Given enough time, or a low enough traffic volume, an adversary can perform the same types of attacks I just described against Tor, correlating messages, but it takes a lot more observation. The easiest thing to learn, which takes no time or analysis, is the fact that I'm communicating. We don't disguise the "if." We also don't disguise the "when," and we also don't disguise how large it is. So enter the shared inbox, exemplified by Alt.Anonymous.Messages. That's a bit of a mouthful, so I will abbreviate it to AAM. Imagine an email account where everyone in the room has the username and
password but it is read‑only access. You can't delete messages. You can't send them.
All the messages are encrypted so what you do is you download them all as one of the
people with access to this inbox. And then you try and decrypt each one of them with
your private key. And the ones that you can decrypt are to you
and the ones that you can't decrypt aren't. And you don't know who they're to. Well, someone
watching this encrypted connection, watching you accessing this mailbox and downloading
all the messages, they can see that you are accessing the mailbox. That's certain. And
they know you downloaded all the messages. But they don't know if you're able to decrypt
any of them. And because of that, they don't know when
you received a message, who it was from, or how large it was. All they know is that you're
checking the mailbox, not that you're actually getting mail.
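The trial-decryption loop the recipient runs can be sketched like this, with a hypothetical stand-in for real OpenPGP decryption:

```python
def check_mailbox(all_messages, my_key):
    """Everyone downloads every message; only trial decryption -- done
    locally, invisible to any observer -- reveals which ones are yours."""
    mine = []
    for blob in all_messages:
        plaintext = try_decrypt(blob, my_key)
        if plaintext is not None:
            mine.append(plaintext)
    return mine

def try_decrypt(blob, key):
    # Stand-in for a real OpenPGP decryption attempt. In this toy scheme a
    # message "to" key k is just the pair (k, text); real messages would be
    # ciphertext that only the right private key can open.
    recipient, text = blob
    return text if recipient == key else None

inbox = [("key_a", "hi alice"), ("key_b", "hi bob"), ("key_a", "lunch?")]
print(check_mailbox(inbox, "key_a"))  # ['hi alice', 'lunch?']
```

Crucially, the filtering happens after download, so the server and the wire never learn which messages decrypted.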
At the cost of a lot of bandwidth, receiving messages via a shared inbox provides an awful lot of security, comparatively. Now, shared mailboxes are an awesome anonymity tool, but the difference between an awesome anonymity tool and an anonymity tool that's actually used is the answer to the question: Can I interact with the rest of the world?
Tor is wildly successful compared to any other anonymity system because you can browse the
actual Internet with it. It's not a closed system where you only interact with hidden
services. So for a shared mailbox to actually be used, it needs to interact with normal
email, and that's where nymservs come in. The newest and easiest to use receives a message at a domain name and then just posts it immediately to Alt.Anonymous.Messages. This is a nymserv written by Zax, and it is on GitHub. And the much more complicated type 1, or Ghio, nymservs can forward the mail to another email address, or directly to Alt.Anonymous.Messages, or they can even route it through a remailer network to eventually wind up in one of those two places. I will talk more about this kind of nymserv later on.
So if we add nymservs to shared mailboxes, shared mailboxes also gain anonymity for the recipient. When you send a message to a nym that uses a shared mailbox, you are ideally using an onion router or a mix network, although you don't have to, and thus you would have those security properties: an adversary can see that you are sending, when you send it and how large it is. Now that I have walked through the security properties of the different types of anonymity networks, let's actually dive into AAM. It should really have strong security; after all, it is the most theoretically secure.
But if you have never looked at it before, this is what it looks like, at least in Google
Groups. It is Usenet. How many people are old enough to have used Usenet? Good, good.
There is a whole bunch -- this is what it looks like today: a whole bunch of hexadecimal subjects, all posted by Anonymous or Nobody, each message a PGP message that may or may not have a version string. Today there are about 190 messages posted per day. But what's interesting is that while the average has certainly decreased over the last decade, it has held somewhat steady over the last five years. So the dataset that I worked off was about 1.1 million messages from the last ten years.
Now, we can really see some shortcomings here already. Over half of the messages in my dataset go through two people. The network diversity is horrible. If you stayed through Runa's talk, you know that's important. If either one of these folks got subpoenaed, or shut down, or just retired, the whole network would be thrown into disarray. And to the person who asked about directory authorities in Tor: Dizum is one of the directory authorities in Tor, and he is not affiliated with the Tor Project. He is just someone they trust. Now, this looks pretty bad. It is way worse.
That 53.5% statistic was over the entire dataset. Today Zax and Dizum make up virtually all of the messages posted to AAM. I don't mean that they are sending them all; I mean they are the exit node for all the messages posted to AAM. And that weird dip? That was 7,800 messages sent through Frow, which operates a remailer and a news gateway. It had a unique subject. It didn't have any unique headers. I couldn't get a whole lot out of it aside from correlating those 7,800 messages uniquely.
So with network diversity pretty clearly abolished, let's take a look at the data and see what
type of analysis we can actually do. I don't think I could say anything as ironic as this
quote. That's from 1994. So here we are just shy of 20 years later.
And the first thing to do is break it up by PGP versus not PGP. It is overwhelmingly PGP messages, but what are the non-PGP messages? Quickly: I was trying to come up with a nice way to say "crackpots." I'm not sure if I succeeded. There are several people who have posted, and continue to post, just random rants about -- I'm not even really sure. Some of them are definitely about the lizard people. And there are actually FAQs that have sprung up in response to these guys, because people are just getting flat-out confused by them. And besides those, there are some other non-PGP messages. I think the most interesting are 10,000 messages with the subject "operation satanic." What's interesting is that they are clearly cipher text, but it is alphabetic. If you look at a single message, you might think it is a Caesar cipher or a Vigenère. If you look at them as a whole, you see it is a perfectly even distribution over a 16-letter alphabet. In other words, I think it is a substitution cipher into hexadecimal, and it is actually cipher text. There are other clumps that are similar to this. If you are into this type of analysis, have at it.
And the next thing to look at is what percentage of messages were delivered to AAM via a nymserv or via a remailer. These numbers will be a little bit off, since some of the PGP or remailer messages are to nyms and some are through remailers I don't know about. But it is something. We can see that a large portion of our messages are to nyms, which is important when I can tell you how many nymservs are still running. Somewhat interesting statistics aside, let's start diving into all of those hundreds of thousands of encrypted messages. OpenPGP messages consist of packets, and each packet type does something slightly different. There is a packet type for a message encrypted to a public key, and a packet type for a message encrypted to a password. So what are these packet types? These graphs show the popularity of each of the different packet sequences -- for example, packet type 1 followed by packet type 9. And the top five, the ones on the bottom, are the ones you would expect to see. Packet type 1 is a message encrypted to a public key. Packet type 3 is a message encrypted to a passphrase. The actual cipher text of a message is packet type 9, or 18 for new style instead of old style. And I separated out the messages to a single public key versus messages to multiple public keys. Now, there are two that are just kind of weird.
These are the packet types you expect to see after you have decrypted a message. These are plain text packets. There are actually a small number of messages that look like OpenPGP data -- they have got the whole BEGIN PGP MESSAGE banner and they are base64'd -- but they are actually just plain text sitting in plain sight.
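If you want to do this kind of census yourself, the packet framing is easy to walk. A minimal sketch of an RFC 4880 packet-tag parser (partial body lengths and indeterminate old-format lengths are left out):

```python
def packet_tags(data: bytes):
    """Walk the OpenPGP packets in `data` and return their tag numbers."""
    i = 0
    tags = []
    while i < len(data):
        ctb = data[i]
        if ctb & 0x40:                        # new-format header
            tag = ctb & 0x3F
            first = data[i + 1]
            if first < 192:                   # one-octet length
                length, hdr = first, 2
            elif first < 224:                 # two-octet length
                length = ((first - 192) << 8) + data[i + 2] + 192
                hdr = 3
            elif first == 255:                # five-octet length
                length = int.from_bytes(data[i + 2:i + 6], "big")
                hdr = 6
            else:
                raise NotImplementedError("partial body length")
        else:                                 # old-format header
            tag = (ctb >> 2) & 0x0F
            nbytes = {0: 1, 1: 2, 2: 4}[ctb & 0x03]  # 3 = indeterminate
            length = int.from_bytes(data[i + 1:i + 1 + nbytes], "big")
            hdr = 1 + nbytes
        tags.append(tag)
        i += hdr + length
    return tags

# Old-format tag 1 (public-key encrypted session key) then tag 9 (symmetrically
# encrypted data), each with a one-octet length and dummy bodies:
msg = bytes([0x84, 3, 0, 0, 0, 0xA4, 2, 0, 0])
print(packet_tags(msg))  # [1, 9]
```

Run this over each message and you get exactly the packet-sequence histograms described above.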
If we look at packet type 8, this is what we get. It really is just compressed plain text data. Unfortunately, it is also nonsense. I don't know if there is a code there or not. I didn't spend a whole lot of time on it after I looked at "Iran organizing bizarre Sabbatical." It probably came out of some Markov generator somewhere, so I kind of moved on.
What I moved on to were messages sent to public keys. Now, it is super obvious to do analysis based on the public key that's in the message. I promise you, it gets a little bit more complicated later. But let's look at the key IDs. Obviously they are a pretty powerful segmenting tool, but I want to illustrate examples where public keys can tell us more. There is one key ID -- I have anonymized most of the specific data in this, because de-anonymizing people isn't cool -- there is one key ID that messaged very reliably through a nymserv, except for two messages sent through Easynews. If you track down a very unique gateway and user agent, that person sent another message to a different key ID, and we can make inferences across multiple types of metadata. I separated out the information sent to multiple keys. If a message was sent to a single key, we don't know too much about it, because senders can throw away the key ID so it is all zeros. If a message is sent to more than one key, then we can draw communication graphs. Now, it's not a strict communication graph in the sense that Alice sent to Bob; it is that Alice and Bob received the same message.
In most situations, people will encrypt a message to themselves so they can read their
own sent mail. I started drawing these pictures about the
same time as the PRISM scandal started breaking. I was feeling really uncomfortable that this is probably what the NSA is doing to me and my friends. But nonetheless, a quick reference: green means that I was able to get the public key off a key server. A circle means that a key received messages to it individually, as well as to it and multiple other people. And the size of the circle and the width of the line is how many messages they received. So there is this very nice symmetrical five-person graph. We've got these much larger communication networks here, a real big one here, and a couple of interesting graphs with central hubs. You can infer from that what you want.
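Building these graphs from the recipient key IDs is straightforward. A sketch with invented key IDs:

```python
from collections import Counter
from itertools import combinations

WILD = "0000000000000000"   # "thrown" key ID: the recipient is hidden

def corecipient_edges(messages):
    """Count how often each pair of key IDs received the same message.
    Not a strict 'Alice wrote to Bob' graph -- just co-recipiency -- but
    since people usually encrypt to themselves too, it comes close."""
    edges = Counter()
    for keyids in messages:
        known = sorted(set(keyids) - {WILD})   # drop hidden recipients
        for a, b in combinations(known, 2):
            edges[(a, b)] += 1
    return edges

msgs = [["AAAA", "BBBB"], ["AAAA", "BBBB"], ["AAAA", "CCCC"],
        [WILD]]             # a single hidden recipient tells us nothing
print(corecipient_edges(msgs))
# Counter({('AAAA', 'BBBB'): 2, ('AAAA', 'CCCC'): 1})
```

Edge weights map directly to the line widths in the pictures: heavier edges, thicker lines.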
And then we've got a couple more interesting networks. I think these are interesting because they imply that not everybody knows everybody else. This graph and the next one may really be a model of actual email use, where people email people in a complex, interconnected, but not fully connected way. This is a fairly low-volume network, and this one has quite a few higher-volume folks participating. And then there's the rest, the simple two-person communications going on. But let's talk about brute forcing cipher text. You saw packet type 9 was by far the most common packet type found; there are over 700,000 of them. Now, this packet type is really interesting, so let's dive a little bit into the OpenPGP spec.
This packet is the actual cipher text of the message. It is only the encrypted data: it doesn't say what algorithm it is, and it doesn't explain how to get the key. So where is the key? The key is in another packet -- packet type 1 for public keys, or packet type 3 for passphrases. But if you recall from that graph, there are messages with no packets preceding packet type 9. We've got a disconnect between what the spec says and the data that we actually see, until we find this: the IDEA algorithm is used, with the session key calculated as the MD5 hash of the password. Yeah -- MD5 of the password. This is absolutely legacy, and we have had better ways of doing this in OpenPGP since the late '90s. So while in the very beginning of AAM this might have been excusable, the fact that my dataset is from 2003 onward makes this a pretty horrible situation.
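To be concrete about how weak this legacy scheme is, here is that key derivation, together with the sort of cheap plaintext check a brute forcer wants. The IDEA decryption step itself is omitted, and the passphrase is invented:

```python
import hashlib

def legacy_session_key(passphrase: bytes) -> bytes:
    """The legacy 'simple' S2K seen in these messages: the 128-bit IDEA
    session key is just MD5(passphrase) -- no salt, no iteration count --
    which is exactly what makes dictionary attacks on this traffic feasible."""
    return hashlib.md5(passphrase).digest()

def looks_like_plaintext(decrypted: bytes) -> bool:
    """Cheap sanity check to use instead of slow randomness tests: a correct
    guess should decrypt to mostly printable ASCII. (A real cracker might
    check OpenPGP packet framing instead; this is just the fast filter.)"""
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in decrypted)
    return printable / max(len(decrypted), 1) > 0.9

key = legacy_session_key(b"hunter2")   # invented candidate passphrase
print(key.hex(), len(key))             # a 16-byte IDEA key
```

The brute-force loop is then: hash a candidate, IDEA-decrypt the packet body, run the cheap check, and only bother with a full OpenPGP parse on the survivors.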
And we know how to do MD5s really, really fast. But that's only half of it. We also have to do an IDEA decryption, and then we have to detect whether what we decrypted was actual plain text or just random data. While you can run randomness tests, they are slow, and we are brute forcing, so we want to go as fast as possible. This is all my way of trying to justify that I spent a lot of time writing GPU code and running it for months, killing my home desktop. But I did get results out of all this GPU cracking. In fact, one of the first few dozen messages we cracked was this one, which did not -- (laughter).
(applause).
>> TOM RITTER: Which did not make me feel terribly good about myself.
(laughter).
But I kept going. And I got some HTML pages. I got some SMTP logs. I got a lot of partial remailer messages. But overwhelmingly, what I got after I decrypted a message was another encrypted message: recursively encrypted PGP messages. And, in fact, here's a breakdown of how many recursions I hit. I got about 10,000 decryptions into a public-key message and another 2,200 that went into another password-protected message. So I went to crack those, and I got about 49 messages that were two layers deep, and then I had to crack some more of those, and I went four layers deep, and then there is this one bloody message that was four layers deep that I still couldn't crack. So it's pretty damn recursive.
For the number of messages I was trying to brute force, the fact that I only got about 10,000 cracked is not really great. Password crackers would consider that a failure. I'm not the best cracker; I'm sure people can do better. What I want to defend myself with is that I'm not trying to crack passwords, but passphrases, chosen by the most paranoid people on the Internet. I think I did decent. I haven't explained why there are so many recursively encrypted messages. What the hell? To explain that, I have to talk about remailers. How many have used a remailer? About two dozen. So the tools that you have probably used, Mixmaster and Mixminion, are dubbed type 2 and type 3 remailers. That means there must be a type 1 remailer somewhere, right? They're basically dead, but the protocol itself lives on in Mixmaster. And boy, what a protocol. This is a manual of how to use most, but not even all, of the options supported by type 1 remailers.
Now, some of the directives are on the left. What's the difference between Remailer-To, Remix-To, Anon-To and Encrypt-To? I don't remember, and I studied this stuff for a while. To use type 1, you actually have to type all of these out yourself; it is not like a GUI where you click a check box. I talked in the beginning about type 1 nymservs. Type 1 nymservs are the main recipients of these directives. You string together a chain of directives encrypted to different nodes -- you type that all out yourself, by the way -- and that would be your reply block. And when someone emails your nym, it would execute the reply block, ultimately coming out to your real email address or to Alt.Anonymous.Messages. And we're still seeing these messages posted.
But there are only two type 1 nymservs operating. One is Zax's. The other is Paranoici, run by Italian hackers in Milan. They run it as part of what you can think of as an Italian version of Riseup, if you have ever heard of Riseup. So in conclusion, what are those nested PGP messages? They're type 1 nymserv messages, where the key ID is the ultimate nym owner's. There is another layer of encryption I haven't cracked yet. When you download type 1 nymserv messages as the nym owner, you know all of the passwords: you peel them off one by one, and finally you use your private key. And these are all the recipients with more than five messages. It's pretty top-heavy towards just a few nyms.
So communication graphs and brute forcing are really just the first quarter, I would say, of the analysis I did on AAM. The majority of my time was spent doing correlation. Even if I don't know who a message is to or what it says, it is valuable to know that it is to the same person as another message, or that it was sent by the same sender. And why is that valuable? Well, let's go back to this slide. You can't tell if someone has even received a message in a shared mailbox. But if I can correlate one message with another, then I can start determining that some unknown person has received a message. And once I know that two messages are related, well, then I can start paying attention to their timestamps and to their lengths. And this goes even further, because people tend to respond to messages that they receive. And since I know if someone has sent a message, it might just be that they are replying to a message that they just received.
So let's talk more about correlation and some more analysis of what's going on in AAM. First off, it's obvious that you can correlate messages that use a single constant subject. And there are a lot of messages like these: nearly half of all the messages posted to AAM have a constant, English subject. They don't use that hexadecimal stuff. They do tend to be the older messages, and they have tapered off recently, which makes sense. But you can look at these numbers: 22,000 messages in a cluster, 18,000 messages in a cluster. But let's talk about those random hexadecimal subjects. Now, there are two algorithms to generate these subjects. They're called encrypted subjects, or e-subs, and hashed subjects, or h-subs. The point of these is to quickly identify which messages are for you and which messages you should ignore. For the folks who used Usenet: you can download just the headers and not the bodies. We could probably cut this stuff out by now, but it is still there, so let's break it.
E-subs have two secrets: a subject and a password. H-subs have a single secret, a password. It is considerably more difficult to brute force the e-subs, and I ran out of time, so I just focused on the h-subs. H-subs were created by Zax, and as his services are used more and more, they make up an increasing percentage of the subjects. Now, h-subs have a random piece of them you can think of as an initialization vector, or a salt. While I could try to shoehorn these into existing hash crackers, it would be painful: you have to truncate the output. So I wrote my own GPU cracker, and I cracked about 3,500 h-subs. Better than the percentage of messages I brute forced, but not a great percentage. Again, these are the passwords of the most paranoid people on the Internet. "Danger Will Robinson" was used by some, but not all, of the messages that were sent to a couple of particular key IDs. I cracked all the h-subs of another key ID with the passwords "testicular" and "***". And if you don't know what *** is, don't Urban Dictionary it.
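For reference, the h-sub construction works roughly like this -- treat the exact IV and truncation lengths here as my assumption rather than the spec:

```python
import hashlib
import os

HSUB_LEN = 48   # hex characters kept in the subject (assumed default)

def make_hsub(passphrase, iv=None):
    """h-sub sketch: a random IV (the 'salt') followed by a truncated
    SHA-256 of IV + passphrase, all hex encoded into the Subject line."""
    iv = iv if iv is not None else os.urandom(8)
    digest = hashlib.sha256(iv + passphrase.encode()).hexdigest()
    return (iv.hex() + digest)[:HSUB_LEN]

def check_hsub(subject, passphrase):
    """A nym owner -- or a cracker -- re-derives the hash from the embedded
    IV and compares; only the right passphrase matches."""
    iv = bytes.fromhex(subject[:16])
    return make_hsub(passphrase, iv) == subject

s = make_hsub("danger will robinson")
print(check_hsub(s, "danger will robinson"))  # True
print(check_hsub(s, "wrong guess"))           # False
```

The per-message random IV is exactly what makes off-the-shelf hash crackers awkward here: every message needs its own salt extracted and its own truncated comparison, hence the custom GPU code.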
If h-subs and e-subs are used to let a nym owner identify their own messages, can we do something similar? Let's say we want to target the nym Bob. What we can do is send a particularly large message to Bob, full of nonsense, and then wait for a large message to pop out in AAM. Zax's nymserv is near instantaneous. Type 1 nymservs are not necessarily instantaneous -- a little bit more difficult, but not too difficult. You can do it a couple of times. And this works, and it works pretty easily and effectively. What we get is a specific message that we know is to a particular nym. At that point we can target it for h-sub cracking.
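A sketch of that tagging attack, with invented message sizes:

```python
def tag_by_size(posts, probe_size, tolerance=256):
    """After mailing an unusually large nonsense message to the target nym,
    flag the AAM posts whose size matches the probe, allowing some slack
    for encryption and encoding overhead."""
    return [p for p in posts if abs(p["size"] - probe_size) <= tolerance]

# Toy feed of anonymous posts; we probed the nym with ~50 KB of nonsense.
feed = [{"id": "m1", "size": 2048},
        {"id": "m2", "size": 51300},   # suspiciously probe-sized
        {"id": "m3", "size": 900}]
print(tag_by_size(feed, probe_size=51200))  # [{'id': 'm2', 'size': 51300}]
```

Repeat with a couple of differently sized probes and the intersection pins down the nym's messages with high confidence.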
So I'm not done. But unlike everything I presented before, what I'm going to talk about now is probability-based attacks. That is, I come up with a hypothesis that I can correlate messages, with probability better than random, if I look at property X, whatever X is. Well, how many of you like the scientific method? I don't really have controls. So what I'm doing is coming up with a hypothesis, running it across the dataset, looking at the clusters of messages that pop out, and then seeing if I can figure out something else that correlates them. And if I can see something else that correlates them, I call it a success. That's how I kind of simulate controls.
So let's say I think that if a message has a header value of X, that's a unique sender: one sender is sending that value of X. I run that analysis and I get clusters of messages encrypted to a single public key. Well, if there was no correlation at all, I would probably get a distribution that looks more random -- it would be encrypted to random public keys. But with such a nicely segmented public key, I think this worked. Even if I could have found that cluster by just looking at the public keys, the data implies that I could use that trick, that hypothesis, to find a cluster of data when there is no other distinguishing characteristic. So that's how I try to preserve some semblance of the scientific method.
My first example is message headers. That's a big one, so let's look at these. There are a few headers that are in every message, but some tails that are only in a few. These mostly unique message headers are not necessarily the goldmine that you might think they are. That's because headers can be added at the client, at the exit remailer, at the mail-to-news gateway, or by the Usenet peer. What we have to do to really go after the distinguishing headers is subtract out the headers that were added by all the other parts of the path, which we can do by just clustering by the exit remailer, seeing which headers are on all of those messages, and subtracting those out. Here are some great examples of headers that were specified by the client: User-Agent, obviously; X-Post-Type-ID; X-No-Archive. If you use Usenet, you know X-No-Archive is a client preference. These particular strange headers all formed a distinct clump of messages with the unique subject of "weed will save the planet." And that's an easy example of how the idea of unique message headers can correlate messages.
Now, X-No-Archive: this means don't save it in Usenet. It is a client request that most Usenet servers will obey. It is also not the word that I have on the screen: this is a misspelling of the header. And there is one person, or at least I'm claiming one person, who has messed this up and completely distinguishes their messages from everyone else's. All 17,300 of them. So this is what you want, right? No. Capitalization matters, and this is not the correct capitalization. What's interesting about this one is that it shows up on several long-running threads on AAM comprising nearly 28,000 messages. And initially I thought each of these threads was relatively independent of the others. But after finding this little bit of information, I'm starting to seriously doubt that.
This one isn't right either. (laughter).
There are 1,500 messages posted with this header, including some test messages that were posted with someone's real name. This is actually the correct version, and there are about 135,000 messages that have it, a little more than 10%, which makes it distinguishing in and of itself. So, just out of curiosity, another show of hands: has anyone ever used a type 1 nymserv? I don't see any hands. Okay. So Encrypt-Subject is a directive for type 1 remailers that should be processed by the remailer. It should never make its way onto Usenet.
make its way into Usenet. This is a bug. This is a client. This is an
user messing up. But I can't really blame them because type 1 is so horribly difficult.
There are over 10,000 messages like this. And when you reuse the subject like these,
you make messages without the encrypt subject stand out. That's the one on the far right.
Or even worse, mess it up once and then figure out how to do it but keep using that same
subject and password. So this let me identify 52 e‑sub messages that were otherwise security
but they messed up once and sent it through in plain text. And then there is encrypt key.
Another header that should never make it into Usenet but does because type 1 remailers are
so hard to use. There are over 10,000 of these messages.
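To make that concrete, here is a small sketch of my own (not from the talk) of how you might tally header variants across an archive; the message dicts and header spellings below are hypothetical examples:

```python
from collections import Counter

def header_variants(messages, prefix="x-no-arch"):
    """Tally every casing/spelling variant of a header family.

    In an archive like AAM, any variant other than the common form
    places its sender in a small, easily distinguished set.
    """
    variants = Counter()
    for headers in messages:
        for name in headers:
            if name.lower().startswith(prefix):
                variants[name] += 1
    return variants

msgs = [
    {"X-No-Archive": "Yes"},   # the correct form
    {"X-No-Archive": "Yes"},
    {"X-no-archive": "Yes"},   # wrong capitalization: distinguishing
    {"X-No-Archve": "Yes"},    # misspelling: extremely distinguishing
]
print(header_variants(msgs))
```

Every variant with a small count is, by itself, a fingerprint.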
And let's look at another header: Newsgroups. Just like mailing lists, you can post a
message to more than one newsgroup. If you do, you're wildly in the minority, and that
segments you. Like this newsgroup: there are 34 messages posted with this newsgroup,
and thank you so much to Comcast for making your users extremely distinguishable.
And what about this value? AAM with four commas at the end. I thought this was a correlation,
but after tracking it down, it was actually a bug caused by the remailer remailer.org.uk
for one week in January 2006. Just some random trivia I pulled out.
How about this one, with a duplicate value in the Newsgroups header. These were sent through a large variety
of remailers and have no obvious correlation besides this value and that they have English
subjects. So the English subjects was another example of the control that I used to confirm
that using a unique newsgroup is a bad idea. And humans are creatures of habit, and as flaky
as remailers have been, a lot of people find a configuration that works for them and then
they stick with it. Well, if I partition people by the remailer and the news gateway that
they use (that's what the colored squares are), what was previously an anonymous discussion
thread suddenly becomes one where it is very easy to pick out who is saying what, and who is agreeing
with themselves. And it's even easier if I add in the header signature on the far right.
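That partitioning is easy to sketch; this is my own illustration with made-up field names, not the talk's actual tooling:

```python
from collections import defaultdict

def partition_by_config(messages):
    """Bucket messages by the (remailer, news gateway) pair they were
    posted through; habitual configurations turn an anonymous thread
    into per-author buckets."""
    buckets = defaultdict(list)
    for msg in messages:
        buckets[(msg["remailer"], msg["gateway"])].append(msg["id"])
    return dict(buckets)

thread = [
    {"id": 1, "remailer": "remailer-a", "gateway": "gateway-x"},
    {"id": 2, "remailer": "remailer-b", "gateway": "gateway-y"},
    {"id": 3, "remailer": "remailer-a", "gateway": "gateway-x"},  # same habits as #1
]
print(partition_by_config(thread))
```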
And then here's a really interesting pattern that I observed. There are a host of messages
that have subjects with a 1 or a 2 in them, like soggy, soggy 2. Well, I looked at these
and found they are being posted together, really close together. And then I realized
one of the options in type 1 remailers is to duplicate a message for redundancy. Send
the message down two different remailer chains just in case one becomes unavailable. And
while that gains you some measure of availability and redundancy, it is distinguishing. You
can target a nym with huge messages. If you see two huge messages appear, well, you know
that nym's reply block duplicates the messages. Then look for all the possible duplicate candidates
and you have a candidate list of messages to that nym, even if you are unsuccessful
doing an eSub or hSub attack. A similar pattern is this one. Look at each pair
of messages that are in the slightly different backgrounds. The second message comes out
of dizum five or six hours after the first. It is distinguishing. The subject for all
of these was, again: weed will save the planet. Also, messages from Frow were mixed in with
no obvious correlation to other messages. So there were a number of hypotheses I tried
that did not turn up interesting data. But there are more queries that can be run across
this dataset but I need to start wrapping up. It all comes down to metadata.
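As one example of such a query, the duplicate-message pattern above can be sketched as a size-and-time pairing; the six-hour window and the fields here are my assumptions, not exact values from the dataset:

```python
def duplicate_candidates(messages, window=6 * 3600):
    """Pair messages that plausibly came from a Type I remailer's
    redundant-copy option: identical size, posted within `window`
    seconds of each other."""
    msgs = sorted(messages, key=lambda m: m["time"])
    pairs = []
    for i, a in enumerate(msgs):
        for b in msgs[i + 1:]:
            if b["time"] - a["time"] > window:
                break  # list is sorted by time, so no later match either
            if a["size"] == b["size"]:
                pairs.append((a["id"], b["id"]))
    return pairs

posts = [
    {"id": "soggy",   "time": 0,     "size": 9000},
    {"id": "soggy 2", "time": 18000, "size": 9000},  # 5 hours later, same size
    {"id": "other",   "time": 4000,  "size": 1200},
]
print(duplicate_candidates(posts))
```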
What we saw in AAM are the obvious mistakes we kind of expected. It suffers a bit because
we haven't taken into account the lessons that we've learned in the 10 to 15 years
since it was developed. That's a lifetime in anonymity technology. But I do think there's
some traffic analysis lessons that we haven't codified as best practice that we should.
So what does the future hold for AAM? The security of a well‑posted message is good
with a lot of caveats. If you use uncrackable passphrases, only use servers that pad output
packets, post through remailers with no distinguishing characteristics, and you are willing to be
in a very small anonymity set, go for it. I don't know how many people are using AAM
today but I don't think it is a lot. What that means is if the government asks for a
list of everyone who uses it, they could probably get a really short list of names to dig fairly
deeply into each of their lives. And AAM crucially relies on remailers and
news gateways. And these services are dying. Remember that two people, Zax and dizum, post
more than 98% of the traffic to AAM, and it is also text based. Very limited bandwidth.
And the nymservs themselves are pretty crappy, architecturally speaking. We give
one-hop proxies like VPNs and UltraSurf a lot of *** because their architecture is
not nearly as strong as Tor's. But nymservs are in the same category: trust this guy
not to roll over on you. I feel compelled to mention that the alternative
is to use Tor, which you do trust, to send email via throwaway accounts on a service you do not
trust. While this is a practice that everyone in this room has probably used or at least
thought of, it's also a really *** architecture. Now, the good news is we have something better.
We have a very strongly architected nymserv design: the Pynchon Gate uses private information
retrieval instead of a shared mailbox. It exposes less metadata, and resists flooding or size-based
correlation attacks. However, it's not built. It's been started but it's got a very long
way to go. And it also requires a remailer network to operate. And we don't really have
a remailer network. What we've got is Mixmaster and Mixminion. Mixminion is a bit
better than Mixmaster, which uses old crypto with no chance of upgrading. Both of these
services suffer from the fact we don't have a good solution to remailer spam or abuse.
We don't have good documentation about them. And they both have horrible network diversity.
Under 25 people are running Mixmaster; under five people are running Mixminion.
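As an aside, the private information retrieval idea can be illustrated with a toy two-server XOR scheme of my own (a textbook construction, not any deployed system's actual protocol): the client queries two non-colluding servers with random-looking index sets, and neither server alone learns which block was fetched.

```python
import secrets

def xor_answer(db, indices):
    """A server's reply: the XOR of the requested blocks."""
    out = 0
    for j in indices:
        out ^= db[j]
    return out

def pir_fetch(db, i):
    """Fetch db[i] without either (non-colluding) server learning i."""
    n = len(db)
    s1 = {j for j in range(n) if secrets.randbelow(2)}  # uniformly random subset
    s2 = s1 ^ {i}                  # the same subset with index i toggled
    a1 = xor_answer(db, s1)        # query sent to server 1
    a2 = xor_answer(db, s2)        # query sent to server 2
    return a1 ^ a2                 # everything cancels except db[i]
```

Each server sees a uniformly random subset of indices, so its view is independent of i; only XORing the two answers reveals the chosen block.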
So if we like the Pynchon Gate, the path forward also involves fixing Mixminion, and Mixminion
needs love. It is currently unmaintained, but we have a to-do list that includes the items
I have got here. Some of them are extremely complicated, like moving to a new packet format;
others are straightforward, like improving the TLS settings. The others give you practice
writing crypto, or writing a complete stand-alone pinger in any language or style that you want.
So if you are interested, there are a lot of cool opportunities here.
But what I keep coming back to is the fact that we have no anonymity network that is
high bandwidth, high latency. We have no anonymity network that would have let someone securely
share the Collateral Murder video without WikiLeaks being their proxy. You can't take a video
of corruption or police brutality and post it anonymously.
Now, I hear you arguing with me in your heads. Use Tor and upload it to YouTube. No. YouTube
will take it down. Use Tor and upload it to Mega or some site that will fight fraudulent
take‑down notices. Okay, but now you are trusting ‑‑ you are relying on the good
graces of a third party. A third party that is known to host the video and can be sued.
WikiLeaks was the last organization that was willing to take on that legal fight and now
they are no longer in the business of hosting content for ordinary people.
And you can say hidden services and I will point to size‑based traffic analysis and
confirmation attacks that come with a low-latency network, never mind Weinmann's paper that
pretty much killed hidden services. We can go on and on like this. I hope you will at
least concede the point that what you're coming up with are workarounds for a problem that
we lack a good solution to. So if I have been able to entertain you, I'm
glad. If I have been able to inspire you to work on anonymity systems, I'm overjoyed.
If you want a place to start, I will point you there. Thank you.
(applause)