>>
Thank
you everyone for coming. I know we're normally in a different [INDISTINCT] room but we're
kind of, so [INDISTINCT]. Today, we're pleased to welcome *** Kemmerer from UC Santa Barbara
to talk about his research into botnets. I think it's a [INDISTINCT] botnet?
>> KEMMERER: Yup. >> Right. And there's [INDISTINCT]. Anyways,
please give him a round of applause. >> KEMMERER: Okay. Thank you for having me.
I love your conference room. It's just really state-of-the-art. I'd expect nothing better
for Google. If you want to change the slide, so--I'm not at the slide changer, so I'm going
to, you know, regularly be saying, "Please change the slide." So, I thought first, just
so we're all on the same page, I'd just go over some terminology real quick. I assume most
of you know all of this. But at least we're using the same names. A bot's an application
that performs some actions or set of actions on behalf of a remote controller. It gets
installed on a victim's machine. In the old days, we used to call these zombies. So, once
you have a compromised machine that someone is working--using remotely, that's what we
refer to as a bot. It's modular in a sense that you can plug in different kinds of malware
for doing different exploits and so forth. And then, today, we'll be talking about Torpig
malware. When you have a collection of these infected machines that are controlled by,
usually, a single individual, it's referred to as a botnet. The way of controlling it
is by means of a control channel, which is required to send commands to the bot and also
to receive data back from the bot. Originally, most of these control channels were IRC channels.
Today, we are seeing more of them being HTTP and HTTPS, and some more are actually peer-to-peer.
Of course, with peer-to-peer you don't have the central control that you do in the
others. The person in charge of this is usually referred to as a bot herder or the botmaster
or the controller. He owns or controls the channel, sends the commands to each of the
bots and collects the data from the botnet army. Usually the motivation is power or money.
The motivation used to be, put another notch in your belt. Today, it's very, very much
money driven. Yes, I--how about if I put a red dot on you when I want you to change it.
Okay, next please. Okay, so let's talk a little bit specifically about Torpig. Before I--before
I do that, I want to just say, Torpig is actually delivered by another malware called Mebroot.
But I'm only going to tell you about those two lines right there about Mebroot. So anyway,
Torpig is distributed via Mebroot. Mebroot's really a malware platform. It provides you,
with kind of like a middleware. So, you can install different applications and so forth.
For Torpig, it injects itself into 29 different applications...
>> [INDISTINCT] >> KEMMERER: It's what? Okay, sorry. It injects
itself into 29 different applications. These are all sorts of applications, like web browsers
and Skype and so forth. It steals sensitive information. It's one that has supposedly been
called one of the most dangerous botnets around. It steals, of course, passwords, HTTP POST data,
bank credentials, logins to all different kinds of accounts and so forth, and we'll
give you some data on that a little bit later. It uses HTTP injection for phishing, which
is actually man-in-the-browser phishing. So, it's not detectable by any of the current
phishing detection software. And it uses something called domain flux to locate the C&C server,
which is changing regularly, and I'll talk more specifically about those last
two or three in a little bit. Just to say something about Mebroot:
Mebroot is spread via drive-by downloads and it's basically a sophisticated rootkit that
overwrites the master boot record. And then, once you--after it overwrites the master boot
record, it waits a little while and then it reboots the system, and then it's doing everything
at boot time, and then, again, it's not detected. Okay, oh yeah, this one is going to be fun.
One more time--okay, so, basically this is a pretty sophisticated group of people. I
don't know that they're sophisticated, but their software's rather sophisticated. The architecture
of the system is rather sophisticated. The one thing--two things on this slide that are not
things that the Torpig or Mebroot folks did is up here where we're talking about hacked
web servers. So, these are just the normal legitimate servers that are out there. Of
course they're vulnerable, and so the Mebroot folks have hacked into them and put software
on them so that they can get the whole process started, which I'll tell you about in a little
while. I shouldn't say all of these are innocent because a large number of them are actually
the *** sites and the belief is that the *** sites are probably getting paid to have
this software put on there. And there's the innocent victim here. Other than that, we've
got the drive-by download server, which initially puts the Mebroot software on your system to
try and see whether or not you're vulnerable. We've got the Mebroot C&C which controls the
Mebroot itself. We have the Torpig command and control which controls Torpig and then
a separate injection server. So, let's see how this works. We start off with the innocent
victim connecting to one of these legitimate sites, one more please. When it connects to
that site the iframe that comes back down has a dTag on it that causes the innocent
victim to go over to the drive-by download server, okay? So, it goes over to the drive-by
download server. The drive-by download server goes in--downloads the Mebroot software. The
Mebroot software looks for a number of vulnerabilities in the system, if it--if any of those vulnerabilities
are there, then the innocent victim now becomes a bot. They just love those devils, huh? Okay,
well, I should say, this slide was actually--there's a figure we have in the paper that you guys
had a reference to. And this slide was done by an FBI guy, Forison, and then--well, I changed
it some, because he had UCSB looking like angels and we didn't feel like we wanted
that knowledge. But I wanted to acknowledge the FBI for doing the animation here. Okay,
so now that we have the bot here, the first thing it does is connect to the
Meb--oh, yeah, go ahead. It's much easier when I'm doing the finger thing; then I know.
Okay, so, basically, as soon as it's infected, it connects to the Mebroot C&C.
Now, as I said Mebroot is a--can be used for other malware. In this case, it's being used
for delivering the Torpig malware and it basically downloads three Torpig modules on to the innocent
victim's machine. These modules connect to the Torpig command and control every 20 minutes.
And one of the other things that--so once it connects to the Torpig command and control,
the first thing the Torpig command and control does is download the configuration file. This configuration
file--I'm about eight slides ahead, but it will all come back, you know, if you hear
it twice then you'll remember it, okay? The configuration file has about 300 financial
institutions on it. Such that when you visit those financial institutions, it's going to
do a phishing attack on you. The other thing that happens is that every 20 minutes when
the victim connects to Torpig, it gives it any new stolen data that it has. Okay, so
I was telling you about those financial institutions, whenever the victim goes to one of those financial
institutions, Torpig connects to the injection server--oh, okay, I was at two. I didn't know
if that was a two-way arrow or one-way arrow--connects to the Torpig injection server. The injection
server--through a process which I'll detail later--gives it a particular customized phishing
page for that particular financial institution. And that--it looks exactly like all their
other pages do. Like the one for PayPal looks like a piece of crap. And I shouldn't say
that since you're going to distribute this. But yes, it does. The other ones look kind
of nice like they were prepared by professional people. And I'll show you examples of that.
So, this is basically how things are working when Torpig is owned by the criminals. Every
20 minutes it's connecting here, giving it any new stolen data. If the center--command
and control has any new commands, it downloads those. Every two hours, it connects to Mebroot
to see whether Mebroot has anything to do, okay? So, what did we do? So, we're the UCSB
Gauchos. That's actually the--what do you call it? The mascot for UCSB, which I thought
was kind of nice for us, although it's a black hat, not a gray hat. So, what we did is we
put a vulnerable machine out on the net, and it got infected by Mebroot and Torpig. And of
course we were running it in VMware and sampling everything it was doing. We reverse engineered
the software and we broke the encryption that was going on. And again, I'll give you all
the details of this later. And basically, we took over, so that--what was happening
is the innocent victim was going to start delivering to us and getting commands from
us. And the--and more importantly, it was not sending the stuff to the criminals. And
so, basically for 10 days, this is the position we were in. We had over 180,000 unique
machines connecting to us from over 1.2 million unique IPs, and we'll talk about why
that is different in a little while too. And so, we have all the information. Remember
I said that there is a connection here every two hours to Mebroot command and control.
And we fully expected that they would download some new information that would kick us off.
This didn't happen for 10 days anyway. And I'll tell you why it did happen after 10 days.
Also, we thought it would maybe come out as some other ways, I think that's the end of
this--oh, so that goes away and it goes back to them. And, let's go to the next slide.
So, let me talk a little bit about how the phishing page works. So, the configuration
file that is downloaded from Torpig can get updated regularly if you want; well, it
turns out it doesn't get updated that often. But it has domains of interest. In our case,
there were approximately 300 financial institutions. And, you know, you tell me what financial
institution you were with, it was probably on there. And so, countries all over the world,
not--I don't know if we had any in China, but we had, you know, Australia, New Zealand,
the UK, Europe, and all over the US and so forth. Okay. So, what happens when one of
these domains of interest is visited: the Torpig software issues a request to the injection
server. The injection server specifies a particular trigger page on that domain, and
it also gives a URL to the Torpig software. When the user of that particular
bot gets to that particular trigger page, Torpig invokes the URL.
What the URL does is downloads a page... >> [INDISTINCT]
>> KEMMERER: Yeah. >> [INDISTINCT]
>> KEMMERER: Yeah, it's a problem from--I'd like to step away, away from them like, but,
I don't know if that's okay with other people. >> [INDISTINCT]
>> KEMMERER: Okay, so--so basically when it gets to the trigger page, the Torpig invokes
the URL, gets a phishing page back from the injection server. And what it does through
a man in the browser attack, it goes in and displays that page for the user to look at.
Okay? Let's see, what I want to say, so that phishing page reproduces the look and style
of the target website. So, here's an example page for Wells Fargo, part of the page anyway.
Where you can see it looks just like Wells Fargo. If you bank at Wells Fargo, that looks
like what your bank pages normally look like. In fact, you know, one of the things
you're told to do is look up here to see whether it says Wells Fargo and then,
at the end, it says dot, you know, dot-hacker at a bad domain or something.
But it has the right domain there. It's doing SSL, it all looks legitimate, and it's not
caught by any of the phishing detectors. Of course, it asks for things like first name,
last name, date of birth, social security number, mother's maiden name. If you go to
the next one, just so we don't--this is one for Bank of America and it has some more questions
there which I can't read from here and I'm sure you can't either. It says, oh, your father's
middle name and what city your mother was born in, all those sorts of things. Once you
fill these out, you're basically, you know, ripe for identity theft, okay?
And people would fill them out. The interesting thing was, they would fill them out, some
of them, we saw, they would fill them out and then they would send a message to the PayPal
security guy, and say, "Why are you asking for this stuff? I thought I gave all this to
you before." But, they did it after they filled it out. So, it's, you know, kind of interesting.
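
The trigger-page flow described above, before the example pages, can be written down as a small model. This is only a sketch for following the sequence of events: the class names, the domains `bank.example` and `news.example`, the path `/transfer`, and the injection URL are all made up for illustration, and nothing here touches a network or a browser.

```python
class InjectionServer:
    """Toy stand-in: maps a domain of interest to (trigger_page, url)."""

    def __init__(self, rules):
        self.rules = rules

    def lookup(self, domain):
        return self.rules.get(domain)


class InfectedBrowser:
    """Models only the decision logic described in the talk (hypothetical names)."""

    def __init__(self, domains_of_interest, server):
        self.domains_of_interest = set(domains_of_interest)
        self.server = server
        self.rules = {}  # domain -> (trigger_page, url), filled on first visit

    def visit(self, domain, path):
        # Domains not in the configuration file are left alone.
        if domain not in self.domains_of_interest:
            return "original page"
        # First visit to a domain of interest: ask the injection server.
        if domain not in self.rules:
            rule = self.server.lookup(domain)
            if rule is None:
                return "original page"
            self.rules[domain] = rule
        trigger_page, url = self.rules[domain]
        # Only the trigger page is replaced; other pages pass through.
        if path == trigger_page:
            return "phishing page from " + url
        return "original page"


server = InjectionServer(
    {"bank.example": ("/transfer", "https://injection.example/form")}
)
browser = InfectedBrowser(["bank.example"], server)
events = [
    browser.visit("news.example", "/"),
    browser.visit("bank.example", "/login"),
    browser.visit("bank.example", "/transfer"),
]
```

In this model only the third visit returns the injected page, which matches the point made in the talk: the address bar still shows the real domain, because the substitution happens inside the browser rather than by redirecting to a look-alike site.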
Okay, next. Okay, so let's talk about domain flux. Domain flux is the way that you find
out what command and control server to connect to. So first, let me give you a little bit
of history, when--if you're looking to take down a botnet, first thing you want to do
is either get one of your bots, one of your machines to be infected like we did, or you
find infected machines out there. And if you find the infected machines you can take them
off. Okay, you can cleanse them and fix all the vulnerabilities. But if you take one machine
out or 10 or 15 or 200 out of 180,000, you don't have much effect. So what you really
want to do is you want to go for the command and control servers, okay? And so, if you
use--if the command and control server uses a static IP address, then you can block or
remove that host, so you can't go to that domain, or if you're law enforcement, you
can go there and physically capture the machine. And they actually had done that in the past
for one of the Torpig machines where they actually captured one and got a lot of, like,
a lot of data on it from that and found out who was compromised. Okay? So, but, remember
this is an arms race. You know, it's the hackers against the good guys. And it's constant: you
start thwarting, you know, the common way that they're doing things and they raise the
bar. Okay? And so, the first thing they did to raise the bar was something called fast
flux. And with fast flux, you have the same domain name all the time, but where it goes
to changes, okay? And in this case, you just block the domain name, all right? What they
did in Torpig, and they actually do this in Conficker also, and there were some earlier
bots that did this, is domain flux. The idea is that the bots periodically generate a new command and control
domain to go to. So, the software that's downloaded on the bot has a domain generation
algorithm on it. In the case of Torpig, it changes once a week. For these
domains, as I say, they change often, and they use the local date and system time as input. And
the botmaster then just needs to register this domain. When it changes,
all of the information starts coming to the botmaster as it rolls over, okay? What happens
is if you want to defend against this, you have to go and figure out all the possible
domains that they could be registering, register them before them, before the botmaster does
and then you can take it over. And you'll see that that's what we did, but it's not
as easy as it sounds. Next please. Okay, so let's talk explicitly about domain flux for
Torpig. As I said, each bot has a domain generation algorithm on it. We reverse engineered this
domain generation algorithm. It also has three fixed domains that it goes to if everything
else fails. Now, for the domain generation algorithm, it does two things. One is, it
generates a weekly domain name, which I'll refer to as WD. It also, at a daily rate if
it needs to, generates a daily domain name. And every 20 minutes, the bot attempts
to connect, in order, to the weekly domain name wd.com, and if that fails,
and what I mean by if it fails, if that domain's not available, it's not out there, or if it
connects to that domain, and that domain doesn't respond in an appropriate way, it doesn't
have the right protocol, then it will go to wd.net. If that fails, it will go to--roll
over to wd.biz. If all three of those fail, then what it does is it tries a daily domain
and it generates a daily domain. It tries dd.com, if that fails it goes to dd.net, if
that fails it goes to dd.biz, if all of those fail, then it tries three fixed domains that
are hard-coded in the software, okay? So in the past while we were watching Torpig, it
turns out that the criminals normally registered wd.com and sometimes wd.net. Okay? So,
they're interested in efficiency: the first one it's going to try, they get. And so--if you go
on--so what we wanted was to get the bots to come to us instead of going to the criminals,
okay? And so, first thing we did is reverse engineered the name generation algorithm and
figured out what the command and control protocol was. And as is always the case, one of our
grad students noticed that the criminals hadn't registered wd.com or wd.net or any of those
for about three weeks out from what the current date was. So, we went and registered those
domain names, okay? So, we registered wd.com and wd.net, okay? The next, it was either
later that day or the next--and we registered them for the next, for three weeks out, from
January 25th to the--whatever three weeks out was, fifth--22nd or something like that,
February 22nd. Oh, here, I can read what's on my slide then I'll know it, okay? So, we
registered these domains ourselves, wd.com, wd.net. Either the same day or the next day,
the criminals registered wd.biz, okay? So, we figured what they were going to do was
a denial of service on our two names, so it would roll over to wd.biz, you know,
and then download new software and we'd be out of business, okay? It turns out that
didn't happen for 10 days anyway. What did happen is, on February 4th, through Mebroot,
which it connects to every two hours, they pushed a new Torpig binary which had a new
domain generation algorithm. That domain generation algorithm was one that was dynamic,
so you couldn't calculate it out weeks ahead of time like we did. The new domain generation
algorithm was based on the highest hit on Twitter for that particular day. And
so you, you know, had to be kind of on top of it. And you could believe we were
talking to the Twitter folks to go and see if we could get that knowledge first, and we actually
did later on, but that's another talk. Okay, so basically we controlled the botnet for 10
days. We got just short of 9 gigabytes of Apache logs, we got 69 gigabytes of pcap
data, and I'll tell you more about those in a minute. So, let me just say a little bit
about how we did our sinkholing. We purchased hosting from two different hosting providers,
and these are hosting providers that are known to be non-responsive. I mean, you
might ask, "Hey, if you know the command and control is here, why don't you just call the
hosting provider and say, hey, take that guy down, he's a bad guy?" Well, these guys get
people to host with them and so forth because they are unresponsive.
And those are well-known. So we went to two different ones, because we wanted redundancy
here. We also registered wd.com and wd.net with two different registrars, and this was
actually very beneficial because on the 7th day that we owned it--evidently on the 6th day,
some Spanish bank reported to the registrar that we were doing bad stuff. And they tried
to call us, but of course we'd given them a bad phone number. I mean, we were buried
through about four levels, because we didn't want to get our kneecaps shot off or anything,
you know, when this came down. And so we found out too late, and they suspended us on January
31st, but we had the backup ones, so we didn't lose any data at all. So that redundancy really
worked out. We set up Apache web servers to receive the bot requests and recorded all of the
network traffic. The other thing we did is we automatically downloaded and removed
the data from our hosting providers. We encrypted it with AES and downloaded it, so that if
anybody took over the physical machines or actually broke into the machines, that data
wouldn't be available to them. Next please. Oh, an interesting thing is that, as I said,
we were set to take over on January 25th. Well, a week earlier, we enabled our software
on the host, our command and control stuff, to collect this, and immediately
359 infected machines connected to us. So, these guys had their clocks set at least a
week ahead and thought that they were already supposed to be reporting to that particular
domain. That was kind of nice because we knew at least it would--you know, it would probably
work. We didn't know how well it would work, however. Next please. Okay,
so before I talk more about the kinds of data we got and so forth. Let me just say, you
know, when you're collecting this kind of information or you have the potential to collect
this information, you've got to be very careful about what you do with it. And so we basically
had two principles that we set for ourselves in terms of the data collection. And the first
one said the sinkholed botnet should be operated so that any harm and/or damage to victims
and targets of attacks would be minimized. So, you know, we've got somebody who's
compromised already. You know, and we don't want to make things worse for them, okay?
So, what we did is, first, remember I said, when they connect to
the command and control, you have to do the right protocol back to them. And what we did
was--we did what we call the okn message. The okn message is like a null message. It
just basically says, "Yup, I'm the guy you want to be talking to and I don't have anything
new for you." We've had people since our paper was released on the web say, "Well, why didn't
you put down a new blank configuration file?" Because remember, the configuration file contains
the names of the financial institutions that they were going to do the phishing
attack for, okay? Well, we don't know what the side effects of that would be, okay? And
we wanted to, you know, we didn't do this stuff just because we wanted to be good people,
we also didn't want to go to jail. You know, if you start sending things down to their
machine telling it to do different--do things. I mean, we were just collecting. We were a
passive collector except for the okn message. But, we were really concerned of--suppose
the compromised machine is something like a bank. And when you send down a blank configuration
file, it has a side effect that, you know, turns off all of the life-support things or
something or, you know. Probably, it won't be that bad, but one never knows. And so,
we didn't do that. As I mentioned before, we removed the data from the servers regularly and
we stored the data offline in encrypted form. In fact, one of the things I did the first
Sunday after we took it over: we sent someone out to Costco to buy the biggest offline data
storage device that we could, so we could, you know, store this stuff in there and lock
it in a safe, so we would be good citizens, okay? Principle number two, says, the sinkholed
botnet should collect enough information to enable notification and remediation of affected
parties. And so, we worked with law enforcement, in particular with the
FBI and Department of Defense cybercrime units. I'm going to say more about that later. Trying
to make an initial contact with law enforcement isn't as easy as one might think it is. You
don't dial 911, okay? And so, I'll tell you more about that later. As a result of dealing
with the FBI in particular, we were put in contact with a bunch of bank security officers. I
know the chief security officers at banks that I never knew existed before, you know,
or how could Scotland have that many different banks, you know. And so it was kind of interesting,
and we also worked with the ISPs, where we also found, you know, trying to work
with an ISP where you're cold calling and saying, "Hey, you know, you really
should take this guy off or you should leave us on or whatever" isn't always the easiest
thing to do, okay? So, let's talk about the data. As I mentioned before, the bot connects
to the Torpig command and control every 20 minutes. It does this by an HTTP POST. It
sends a header. The header is encrypted. The header has in it a timestamp and that timestamp
is the time that it initially got infected. And it's little things like this that the Torpig
folks did for us that made it really easy to find out, is this a newly infected machine?
Or has this been around? So you can get an idea of that, you know. Because if you jump in
and take over on January 25th and it's sending stuff to you, well, all those machines have
been infected, one would think, for quite a while before that. So you should get a big
spike in the beginning, which we did. But, you know, what you want to know is how many
are getting infected each day, you know, of the 10 days we owned it. And that timestamp
was one way that that worked. The IP address, this is interesting. You'll see that we had,
we had one--we had one machine that had 700--it was either just short of 700 or just over
700 different IP addresses over the 10-day period. And if you see how many 20-minute
connections there are in that, it was almost like you had a different IP every 20 minutes,
okay. And so, this was something that was interesting which I'll talk about. Again,
we believe that many of the reports on the size of botnets that you've seen before
are inflated, you know, probably an order of magnitude more than what they really are,
and I'll tell you why we believe that. So, there's a timestamp, the IP address, proxy
ports, the operating system version, the locale, and the nid; this is a unique ID which I'll tell you more
about. And basically, it was that that we used to find out how many unique machines we
had as opposed to unique IP addresses. And then what was the Torpig build and what was
the version number. Okay? So, I said these guys were professional software developers.
They did all the good things you should do if you--you know--if you're developing software.
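
The header fields listed above can be pictured as a simple record. This is a sketch only: the field names, the types, and the sample values are assumptions made for illustration, not the actual wire format, and the dates are made up. It also shows how the infection timestamp lets a sinkhole separate machines infected before the takeover from ones infected during it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BotHeader:
    """Decrypted header fields as listed in the talk (names and types assumed)."""
    infected_at: datetime  # time the machine was initially infected
    ip: str
    proxy_ports: tuple     # assumed shape: a tuple of port numbers
    os_version: str
    locale: str
    nid: str               # unique ID, shown here as a hex string
    build: str
    version: str


def is_new_infection(header, takeover):
    """True if the machine was first infected at or after the takeover time."""
    return header.infected_at >= takeover


# Made-up values, for illustration only.
takeover = datetime(2000, 1, 10)
old = BotHeader(datetime(1999, 12, 1), "198.51.100.1", (1080, 8080),
                "5.1", "en", "00aa11bb22cc33dd", "b1", "1")
new = BotHeader(datetime(2000, 1, 12), "203.0.113.7", (1080, 8080),
                "5.1", "de", "ffee00112233aabb", "b1", "1")
new_count = sum(is_new_infection(h, takeover) for h in (old, new))
```

With these two sample headers, only the second counts as a new infection, which is the kind of split needed to see how many machines are being infected each day rather than how many were already infected when the sinkhole came up.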
So, let's talk about the network ID. It's an eight-byte value. It's used for encrypting
the header and the data. And it's derived from hard disk information or a volume serial
number. And so as it turns out, we can also tell whether or not--wait a minute, down here,
we could detect when there's a VMware machine because then it always has that same number
on all of them, okay? And so, you know, unless someone went to the effort to go and change
that. And so, this serves as a convenient, unique identifier. The body then--the optional
body, if it has any stolen information to send, has any newly compromised account, anything
that was posted. Everything that was posted was sent. Okay? So, if you're doing your email,
you know, your web email through posting, we got all of that. Okay. Okay, and so when
it sends this, the nid is in the clear, and that's what is used to encrypt the header. And so, once you
know the encryption algorithm, it's fairly easy to decrypt all the messages. And--I should
have mentioned that before. For Torpig, the encryption algorithm wasn't very sophisticated
at all. For Mebroot, they use a--we hope they use a very sophisticated encryption algorithm,
because no one's been able to break it yet. Okay, and obviously they're two different
ones, okay? Next please, okay, so, I was telling you about size estimation. This is the count
of number of infections. Normally, when you read about botnets, it's usually based on
the unique IP addresses. Okay? It turns out this is problematic for a number of reasons. If
you have DHCP connections, you know, those in theory could change every time that
you log on, or more often, you'll see. If you're like me, I found that when I was on Verizon
getting DHCP, I think I had the same IP address for the four years or whatever I was on Verizon,
you know. And I shut the machine down, you know, pretty regularly. And
so, you know, most people will say, "Well, why is that a big deal?" Well, for some
places it is, and in particular in Germany and Italy and BellSouth, where people have
a short time to live on it and it's changing almost every time you're connecting. Okay,
for our count, we based it on the header information. We based it basically on that
nid that I told you about, because we thought that should be unique. Okay, since
it was based on a SCSI drive and, if that wasn't there, on some other software
that should be unique. We found that we had a few--and when I say a few, I don't remember
the exact number but it was like in the 100s--where that value was repeated,
and we could tell from the location and so forth that it shouldn't be. And so we used
the nid, we used the country of origin, the locale and something else, and with those we're
convinced it was unique. So here, it shows you new Torpig IPs per hour. You can see--oh,
I don't know if I mentioned this here. Let me see. We saw 1.2 million unique IPs and
we saw 180,000 unique hosts. So, as I said, an order of magnitude difference there. Here's
the unique IPs. You can see we got this big spike. This is Saturday. It was midnight on
January 25th that we took it over. So, we got a little trickling of things beforehand.
Remember, when we went back a week before, we already had 300 and some, almost 400, connected
to us. But up here we already are above 14,000, something like 14,500 on the initial
connection. One thing to note about this is this is every two days here, but you'll see this
is diurnal in terms of people connecting, you know, eight o'clock in the morning. There's
a lot of these that are commercially-owned machines. And, you know, around 8 o'clock
you'll see a big influx of people logging on, it drops down at night and it goes up.
And so, this is something that David Dagon at Georgia Tech had noticed before too, this
diurnal effect and that definitely was here. Over on, on--yes?
>> [INDISTINCT] >> KEMMERER: What's that?
>> [INDISTINCT] >> KEMMERER: Oh, you could--well, you could
see things going across, but if you take Western Europe and the US and sort of collapse that
in, you know, the effect there is close enough that you'll still see it. Yeah. This
is on--these were based on where we were collecting the data. Okay, but you're right, I mean,
you know, it was fun, you know, because we didn't know how well this was going to work.
At midnight, we're sitting there watching soccer. And you can just watch it coming
across. And the other thing was--that was--remember, that was Saturday at midnight, and then we
had, you know, another spike which is probably maybe this one here, it was then on Monday
morning. Okay, because there were people who weren't connecting on Sunday, you know, and
they started then connecting Monday morning. But, yes, so to answer your question, that's
kind of mixed in there, but you still got it, even though you had the combination all,
okay? Because it was basically a--you know, a nine-hour worst case difference between
those. And it's moving itself out. And you know, not everybody gets started at 8, some
start at 10, you know, the whole thing, okay? So, here's the--using our host nids for it.
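
The gap between the two counts comes down to which key is used to deduplicate connections. The sketch below assumes each logged connection is a dictionary with `ip`, `nid`, `country`, and `locale` keys; those names and the sample records are made up for illustration, and the additional field the talk mentions but does not name is left out. A small helper also works out how many 20-minute check-ins fit in the 10-day window, which is why roughly 700 addresses for one machine means nearly a new address on every connection.

```python
def count_unique(connections):
    """Return (unique IP count, unique host count) for a list of connection records."""
    ips = set()
    hosts = set()
    for c in connections:
        ips.add(c["ip"])
        # The nid alone repeated for some machines, so it is combined with
        # country and locale; the talk mentions one more, unnamed, field.
        hosts.add((c["nid"], c["country"], c["locale"]))
    return len(ips), len(hosts)


def connections_in(days, interval_minutes=20):
    """Number of check-ins in a window, given one check-in per interval."""
    return days * 24 * 60 // interval_minutes


# Made-up sample: one host seen from three addresses, plus two machines
# that share a nid but sit in different countries.
sample = [
    {"ip": "198.51.100.1", "nid": "aa", "country": "DE", "locale": "de"},
    {"ip": "198.51.100.2", "nid": "aa", "country": "DE", "locale": "de"},
    {"ip": "198.51.100.3", "nid": "aa", "country": "DE", "locale": "de"},
    {"ip": "203.0.113.7", "nid": "bb", "country": "US", "locale": "en"},
    {"ip": "203.0.113.9", "nid": "bb", "country": "IT", "locale": "it"},
]
unique_ips, unique_hosts = count_unique(sample)  # 5 IPs, 3 hosts
checkins = connections_in(10)                    # 720
```

In this sample, counting by IP overstates the population (5 versus 3), and counting by nid alone would understate it (2 versus 3), which is the reason for the composite key.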
And again, you'll see there was a--you know, big spike in the beginning. But what you notice
here is that it--you know, that it drops off. And we don't get as many unique
ones. If you go to [INDISTINCT] so also, the average number of new IPs per--you know, I'm
trying to remember whether that was per hour. Well, this one's per hour. So, the average
number--yeah, that would be per hour. The average number of new IPs per hour was
4,690, whereas for the new unique hosts it was 7,500. Okay, let's go to the next one. The other
interesting thing here: if you look at the cumulative number of infections, it's linear
for unique IP addresses. So, you know, it just keeps going up linearly like
this. In the case where we're using the host ID, it turns out that we had 75% of the new IDs
in the first 48 hours. And then it--you know, then it kind of flattens out. It doesn't
completely flatten out because there are new infections going on all the time, but, you
know, not at this linear rate that you see here. Okay, next slide please. So, let me
talk a little bit about the different threats. The obvious threat is the theft of financial
data. But I also want to say a little bit about denial of service, proxy servers, and privacy
threats, which, I have to say, are all conjecture on our part, but we backed it up with, you
know, with real numbers, whereas for the theft of financial data, we have hard data
that it actually happened. Next one, please. Okay, so there were 8,310 unique accounts
from 410 different financial institutions. The top five were PayPal, Poste Italiane,
Capital One, E-Trade and Chase--I think in a paper I might list some more. Number
six was somebody who was doing me a favor, and I said I wouldn't put their name in the paper
or in the talks, so I cut it down from ten to five. But you think of whatever
bank you're at, you know, Wachovia, Chase, Bank of America, all of those, you're going to
see in here. And the other thing, which I don't think I've mentioned to you before, is something
that Torpig does: it goes to the password manager and steals passwords from there, okay?
And so, 38% of the credentials that were stolen were stolen from the password manager, all
right? And we know that because, again, these guys--well, the men and women that wrote Torpig--
did a good job. They labeled it a specific way to say this came from the password manager.
So, it makes it really nice to go and analyze these data. You know, we're kind of going
like "Yeah," you know. And there are things we didn't even think that
they would put in there that they did, like the build number that I was telling you about.
There are different build numbers. I don't remember exactly how many of them there are.
The number is in our latest paper, but let's say about 12 different ones. But
it turns out that the software with the different build numbers is exactly the same, and so
the thing that we and other folks we've been working with are conjecturing about that
is that they're actually selling their services, and they're selling them to, you know, different
ones that are distinguished by the different build numbers, okay? And there's some, you
know, other work that's been done that substantiates that, so that's a reasonable thing to guess.
Okay, we also got 1,660 credit cards. Top five were Visa, Mastercard, American Express,
Maestro and Discover. 49% of them were from the U.S., 12% from Italy and 8% from Spain,
and then, you know, it dribbles off after that. Typically, we had one credit card per person,
but there were exceptions--oh, I thought I had the exception in there. Somebody had 30
credit cards that were compromised, okay? And so we looked a little more into it to see
if we could get a little more information about this particular bot, and it turns
out that it was somebody who was providing a service for other folks and holding their
credit cards for them. So, you know, if you sleep well at night, you know, if
you have your wife, or whoever it is--you know, people doing your flight reservations or whatever--
beware, okay? So, one question is, well, what's the value of the financial information? Well,
Symantec in 2008 estimated that credit card value ranges from 10 cents to $25, okay? Oh,
and I should mention--how did we get into this business, you may be asking, okay?
Well, we have an NSF grant to study the underground economy, okay? And so, what we're
talking about today is sort of the first step of that: somebody who is collecting this up.
And it turns out that the people that collect it, they're never going to get caught using
a credit card because, you know, they're selling it to someone else, and when you're selling
them in bulk, that's when you get this 10-cent value. If somebody is out there selling an
individual credit card because they are a waiter in a restaurant and they don't have
anything to do with cyber security, they, you know, copied it off or, you know, used
their cell phone--you don't even have to write it down anymore--and that's closer to $25. So, anyway,
bank accounts range from $10 to a thousand dollars, and so what we did is we looked at
the new accounts and the new credit cards that we got, so we didn't count anything
twice. And this graph, which I'm sure no one other than the people sitting right under
it with the bad necks can see--this one here is the count of new credit cards and
new accounts, and this one is the max and this one is the min. And the value, if you
take both those minimum values, the value over the 10-day period is $83,000 at the
minimum and up to $8.3 million at the maximum, okay? Assuming that you've turned all of
those over, okay? Next. Okay, so, another threat I just want to say a little bit about
is the denial of service threat. There were more than 60,000 active hosts at any given
time, and so what we did is we used the ip2location database to determine network speeds, and we
found out that cable and DSL modems make up about 65% of the infected hosts, okay? And
we used the cable modem and DSL speeds in the U.S. because they're well known to be
the slowest in the world, and we used that upstream bandwidth, which is 435 kilobits per
second, and did the math on this, and this yields greater than 17 gigabits of information per
second from DSL and cable, okay? And with that, you can do a pretty good denial of service
attack on somebody if you want to aim it all at the same person. The other
thing is that the corporate networks made up about 22% of the infected hosts, and they
have even faster speeds, and so that number I said is very conservative. So, there's the
possibility of using the botnet for denial of service, which used to be the original use
for botnets--I remember the attack on CNN back in 2002 and so forth, that was just
to bring it down--and the site that was attacked recently, you know, it could have been
Twitter. It was Twitter? >> Yeah.
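
The two back-of-the-envelope figures quoted above, the dollar value of the stolen financial data and the aggregate upstream bandwidth of the cable/DSL hosts, can be reproduced with a few lines of arithmetic. The sketch below uses only numbers given in the talk (8,310 accounts, 1,660 cards, the quoted price ranges, 60,000 hosts, 65% cable/DSL, 435 kbit/s upstream); the function and variable names are made up for illustration. With exactly 60,000 hosts the bandwidth comes out just under 17 Gbit/s, so the "greater than 17" figure presumably reflects a host count somewhat above 60,000.

```python
# Back-of-the-envelope checks using only figures quoted in the talk.
# All names here are illustrative, not taken from the speakers' tooling.

def value_range(accounts, cards, account_price, card_price):
    """Return (min, max) dollar value given (low, high) unit price ranges."""
    low = accounts * account_price[0] + cards * card_price[0]
    high = accounts * account_price[1] + cards * card_price[1]
    return low, high

def aggregate_upstream_gbps(active_hosts, share, upstream_kbps):
    """Aggregate upstream bandwidth in Gbit/s for a fraction of the hosts."""
    return active_hosts * share * upstream_kbps / 1_000_000

low, high = value_range(
    accounts=8_310, cards=1_660,
    account_price=(10.0, 1_000.0),   # bank account: $10 to $1,000
    card_price=(0.10, 25.0),         # credit card: 10 cents to $25
)
gbps = aggregate_upstream_gbps(active_hosts=60_000, share=0.65, upstream_kbps=435)

print(f"value: ${low:,.0f} to ${high:,.0f}")
print(f"cable/DSL upstream: {gbps:.2f} Gbit/s")
```

The bank accounts dominate both ends of the value range, which is why the totals land so close to 8,310 times $10 and 8,310 times $1,000.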
>> KEMMERER: Okay. All right, so let's look at the next one. Oh, I told you that. Good.
Just in case I forget, you know. Okay, proxy servers. When a machine first
gets compromised, Torpig opens both a SOCKS and an HTTP proxy. 20% of the infected machines
that are in the botnet are publicly reachable, and we went to the Spamhaus blacklist and
it turns out that only 2.45% of those are on the blacklist, which means that the other
97% of the 20% that are publicly reachable are usable, and these could very easily
be used for spamming, okay? And, you know, I would venture to guess that most of the
spam that we all get, or a good portion of it, is sent by botnets, you know, illegal botnets.
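
Checking an address against a DNS blacklist such as Spamhaus is conventionally done with a DNS query: the IPv4 octets are reversed and prepended to the list's zone, and an answer means the address is listed. The talk does not say how the lookup was actually performed, so the following is only a sketch of that general convention; the zone name `zen.spamhaus.org` and the helper names are assumptions, and the example address comes from a documentation range.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)          # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone="zen.spamhaus.org"):
    """Return True if the DNSBL answers for this address (needs network/DNS).

    A production check should also inspect the returned 127.x.x.x code,
    since some answers signal a query problem rather than a listing.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

print(dnsbl_query_name("192.0.2.10"))
```

Running `is_listed` over the publicly reachable bot addresses and dividing the number of hits by the total would give a listed fraction of the kind quoted above.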
Okay, next. Okay, privacy. Remember, everything that's posted gets sent to the
command and control, so we collect it. That means web mail, web traffic, forum messages
and so forth, and so we decided we just wanted to look at this a little bit. I'll tell you
more later when I talk about ethics. You know, there is a real question of whether you should be looking
at people's mail, and in fact when I contacted the DOD cyber guy--cybercrime guy--he said,
"Yeah, I'll talk to you if you have DOD information," and I said, "We do," and then he wanted it.
But he said, "I don't want any of the email." He even said, "I don't want there to
be any chance that I even have it on my stuff." And so, you know, there are a lot of problems
with that. So, we tried to do a hands-off--hands-on, I don't know--approach to getting
some information about what kind of people have these compromised machines, and so we
focused on around six and a half thousand messages that were in English, because we didn't
want to have to bother trying to translate them, and that were 250 characters or longer.
And it turns out that about 14% of those were about jobs and resumes, looking for jobs and
resumes. 7% were discussing money--how we know is that we took keywords that would
be obvious keywords to search for in doing this sort of thing. 7% discussed money, 6% were
sports fans, 5% were preparing for exams, and 4% were seeking partners, sex, blah-blah-blah--I should
take that line out of there, I guess--online, okay. The interesting thing about it is that a
lot of them are concerned about online security. You know, we saw these messages, like one
of the interesting messages said, "Yeah, I'm back online now, I had some malware but I'm
cleaned up and everything is fine now." Well, okay, yeah, sure you are, yeah, right? And
10% of them specifically mentioned security and malware, and as I said, some of these folks,
right after filling out a phishing page, went to whatever the financial institution is,
you know, and sent this irate message off to them, which we also got because, you know,
they were using web mail for this, and it said, "Why am I doing this?" You know,
but they already did. Okay, next. Okay, so, one of the things we did, since we had so many
passwords that were collected--we had almost 300,000 unique credentials--what we did is ask, what
can we say about these things? And so we analyzed them and we found out that 28%
of the victims reused their passwords on multiple domains. Probably, you know, not such a big
surprise. If you could go to the next one. We said, "Well, can we get more information
about this?" We know they're reusing them, okay? And so, we said, "Well, how strong are
these passwords?" You know, if we weren't capturing them this way, you know, if you used
the usual password guessing, how strong would they be? Okay, so, we used John
the Ripper, and we had to take, you know, some of these passwords and put them in a UNIX
form in order to do that. Okay, but we used John the Ripper to assess the strength
of the passwords, and it turned out, when we did that, we had 173,000 unique passwords
after getting rid of all the duplicates. And running John the Ripper in the default mode, you know,
no dictionary or anything. >> [INDISTINCT].
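
The reuse measurement described here boils down to grouping captured credentials by victim and checking whether the same password shows up on more than one domain; the list handed to the password cracker is then just the set of distinct passwords. A minimal sketch of that bookkeeping follows. The record layout (victim, domain, password) and the sample data are invented for illustration; the talk does not describe the actual data format or scripts.

```python
from collections import defaultdict

def reuse_stats(credentials):
    """credentials: iterable of (victim_id, domain, password) tuples.

    Returns (fraction of victims reusing a password on 2+ domains,
             set of distinct passwords).
    """
    domains_by_victim_pw = defaultdict(set)   # (victim, password) -> domains
    victims = set()
    for victim, domain, password in credentials:
        victims.add(victim)
        domains_by_victim_pw[(victim, password)].add(domain)

    reusers = {v for (v, _pw), doms in domains_by_victim_pw.items() if len(doms) > 1}
    fraction = len(reusers) / len(victims) if victims else 0.0
    unique_passwords = {pw for (_v, pw) in domains_by_victim_pw}
    return fraction, unique_passwords

# Made-up sample records, not data from the study.
sample = [
    ("host-a", "mail.example", "hunter2"),
    ("host-a", "bank.example", "hunter2"),   # same password, second domain
    ("host-b", "mail.example", "s3cret"),
    ("host-b", "shop.example", "different"),
    ("host-c", "mail.example", "hunter2"),
    ("host-d", "forum.example", "qwerty"),
]
fraction, unique_passwords = reuse_stats(sample)
print(f"{fraction:.0%} of victims reuse a password; {len(unique_passwords)} unique passwords")
```

Note that two different victims sharing a password (host-a and host-c here) does not count as reuse; it only shrinks the deduplicated password list.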
>> KEMMERER: Can you hear? Did I turn it off? >> [INDISTINCT].
>> KEMMERER: Can you hear me? Hello? Should I sign? Oh, it blinks because it's bad? You
see, you can never tell: if it blinks, is this good or--no, no. Am I supposed to be done by one?
I thought we started seven minutes late, so we go off at seven. Okay. So, basically, we used
John the Ripper on a unique password file, 173,000 of them, and we found out that--and
this was--we ran it first just in the default mode, with no special dictionary or anything
like that, and we got 56,000 of them in just over an hour, in 65 minutes. We then used a
large wordlist and we got 14,000 more in the next 10 minutes. So, we're talking about, you know,
an hour and a quarter, and we got 70,000. So, 40% were cracked in less than 75 minutes. We then ran it for 24 hours
and got another 30,000. And if you go to the slide, which again, you can't see, this just
gives you that information. We got 40% of them here, and then the next bump there, and then
the 24 hours there. Okay, let's go to the next slide. Okay. So, what about--let me say a
little bit about criminal retribution, a little bit about law enforcement--these are kind
of--maybe they should have been "lessons learned" instead of "what about"--and repatriating
the data, and ethics. So, if we go to criminal retribution. So, the big concern:
on January 25th, the first day, it was midnight when we took over this thing. I wasn't at
the lab, but I had a grad student reporting, regularly sending out emails. He's like, "Now
we have this many accounts, we have this much data. Wah," you know, it's like the sky is
falling, okay? So, Sunday afternoon, we were in the lab talking about what to do, and my
biggest concern at that time was the criminals, because these guys were known to be bad guys.
Are they going to come to get us, shoot off our kneecaps, as I said? Can you do the next
one? More realistically--I mean, I did worry about that, and in fact I still worry about
this a little bit. That's why when you say, "Can we publicly post this?" You know,
okay, I've given a talk on this a couple of times already and they probably know who
I am. More realistically, we were concerned that they were going to DDoS us, because if you
remember, we only owned the first two domains: if they DDoS our hosts, the bots roll over
to dot-biz and then they can download their stuff, okay? Next. So, the biggest question
I have, and I still don't have a definite answer to this, is, "Why did it take them 10 days
to download a new domain generation algorithm?" Okay, because that's basically what they did.
You know, we had some thoughts, and one of my students said, "Well, maybe the guy in
charge of it was on vacation," and I said, "Okay, that could be possible." I think maybe
a more plausible one was that they wanted to find out who it was that took it from them. You
know, was it another criminal group, you know, the competition? Do you remember
the wars between the Chinese and the--I don't remember--the Russians, I think, a few years
ago, among the hackers? And so--why did it take 10 days? We know why after 10 days it was
taken down, and I will show that on the next slide. So, law enforcement. The other thing,
you know, I don't know if I was more worried about criminals or law enforcement, but with
law enforcement, I said, you know, we didn't get any permission to do this. You know, we're
cowboys from UCSB--we're called Gauchos, right, which is an Argentinean cowboy or something.
And we didn't know who to notify. More importantly, we didn't just want to notify somebody who
would say, "Oh, shut it down," because if we shut it down then wd.com won't work,
wd.net won't work, and then the Russ--the, I almost said their name. The criminals will,
you know, own it again, okay? And so we wanted to get somebody in law enforcement who knew
what they were doing. Well, I don't know, how many people in this room would know who to
contact? One. Are you the security officer here?
>> [INDISTINCT]. >> KEMMERER: Okay.
>> [INDISTINCT]. >> KEMMERER: Okay. Once you've done that,
like now, I could give you all kinds of names. But, you know, on Sunday afternoon, we're
there trying to figure out who to contact and, you know, I was drawing a blank. If
you go here, so then I said, "Okay, US-CERT, okay, you know, whether you think US-CERT
really does a good job or not, I figured that that might be a reasonable place to contact."
So, I went to the US-CERT site and I found, oh great, it had a pointer: you know,
if you've got a problem, you know, if you discovered some security something or
other, go here. And then they gave me a form to fill out, okay? "And we'll get back to you."
And I go, "Yeah, that's exactly what I want." You know, very disconcerting. Finally, I thought
about--there was a guy named David Dagon. I don't know if anyone knows him. He's from Georgia
Tech, and they have been doing botnet research, and I knew that he had dealt with the FBI
and Treasury, whoever you'd want to get hold of, before. And so, I asked, "Does
anybody have his home phone number?" Fortunately, someone had his cell number, and I called him
at home Sunday afternoon. I said, "Hey, David, you won't guess what we've been doing, you
know," and basically, he personally put us in touch with an FBI contact, and he not
only gave us the information, he sent an introductory email about
us and so forth, and then I felt better because I thought, "Okay, now I've reached out to
law enforcement. I've got a paper trail that says I'm trying to contact somebody." Well,
that was late Sunday afternoon. Monday morning, we still hadn't heard from them, and so I
sent an email directly to the FBI guy and said, "You know, we got the message
from David and we'd really like to hear from you. You know, do you have any guidance about
what we should do?" On Wednesday, we still hadn't heard back, and so, Wednesday, I knew
I had a friend at Citrix Online in town who I knew had dealt with Treasury for something
like this before, and so I called him and said, "Do you have a name of somebody?" And he said,
"Sorry, the guy I dealt with has his own consulting firm now," which tends to be the case with
a lot of these guys, as you can well imagine. And so, he checked with the privacy officer,
who wound up putting me in contact with this DOD Defense Criminal Investigative Service
guy, who I contacted, and he immediately got back to me, which was good because now
I felt even better. And then, Friday afternoon--we're talking six days later--Friday afternoon,
I get a message from the FBI guy that said, "Oh, I got your email, I'll get back to you."
I'm going, "Are you kidding me?" And I have to say, when I talked to David Dagon, he
said, "Don't be surprised if they are not as excited as you are about this because,
you know, there are a lot of botnets out there, you know." I expected a little more excitement
than that. Well, 15 minutes later, we got another message from that guy that says, "I
just read the content of your email, this is great, we've been wanting to do the same
thing but we can't get permission, okay?" You know, and, "When can we talk, you know,
can we have a conference call?" And it turns out they were really good,
and they knew about Torpig. You know, they had been studying
it and so forth. They put us in contact with a group, which is a group of different
security officers around the world, who have been very helpful for us in repatriating the
data, you know, getting it back to the appropriate banks and so forth. So, when we finally got
a hold of them, it was good, and now, if I have to get a hold of law enforcement, it's
really easy. But it's difficult because, you know, there's not an easy URL
to go to, okay. Okay, so I said already about the FBI: it turned out to be very good in
there. The next one please. Okay, so, as I said before, we had over 8,000 separate accounts
at 410 institutions, 1,600 credit cards, and to get this, I mean, we had to mine the data.
We had to figure out what their formats were and so forth to figure out
all of these. And then once you know it, like, you know, if I know I've got--how many PayPal
accounts? As I said, less than 2,000 but close to 2,000--what do I do? Just pick up the
phone and call PayPal? PayPal might be okay, but Bank of America, what do I do? You
have a Bank of America branch here, you have the phone number. I'll call them: "Hi, I'm ***
Kemmerer, I've got a whole bunch of compromised accounts for your bank." You know, and, you
know, they think this guy's crazy or whatever. So, it turns out that, you know, that's really
not that easy to do, and in particular when you then talk to a bank, if you say, "We may
have some of your credit cards too, can you give us your bank identification numbers,"
which are unique for every bank, and that's how you can search on these things--you know,
which is the first six numbers of the credit card--and if you say, "Can you send me your
BINs?" Well, they don't want to do that. You know, they're very tight with all of that.
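
The search described here works because the leading digits of a card number identify the issuer. As a sketch only: assuming a bank supplies its BINs as six-digit prefixes (the traditional BIN length), matching captured card numbers against them is a simple prefix lookup. The BIN values, bank names, and card numbers below are made up, and the function name is hypothetical.

```python
def group_cards_by_issuer(card_numbers, bins_by_bank, bin_length=6):
    """Map each bank to the card numbers whose leading digits match one of its BINs.

    bins_by_bank: dict of bank name -> iterable of BIN strings (bin_length digits each).
    Cards with no matching BIN are collected under the key None.
    """
    bank_for_bin = {b: bank for bank, bins in bins_by_bank.items() for b in bins}
    grouped = {}
    for number in card_numbers:
        digits = "".join(ch for ch in number if ch.isdigit())  # drop spaces/dashes
        bank = bank_for_bin.get(digits[:bin_length])
        grouped.setdefault(bank, []).append(digits)
    return grouped

# Hypothetical BINs and card numbers, for illustration only.
bins = {"Bank A": ["400000", "400001"], "Bank B": ["510000"]}
cards = ["4000 0012 3456 7899", "5100-0000-0000-0008", "340000000000009"]
result = group_cards_by_issuer(cards, bins)
print(result)
```

Without the bank's BIN list there is nothing to key the lookup on, which is why the banks' reluctance to share those numbers made the repatriation slow.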
So, one of the nice things that happened is these FBI guys are actually in Pittsburgh,
and there's this--this is the place that I was telling you about--the National Cyber-Forensics
and Training Alliance, which is a group of a whole bunch of security officers from financial
institutions, and the FBI guy put me in contact with them, and basically I go
to them and they'd say, "Okay, here's who you contact for this bank, here's who you
contact for that bank." And it was really nice: in some cases, we got the names of individuals
who could go and take care of everything for the whole country. You know, like, all of
Italy or all of Switzerland and so forth. They had one guy that was doing the repatriating. Okay,
let's go to the next slide. Okay, so, if you recall, principle one said, "The sinkholed
botnet should be operated so that any harm and/or damage to victims and targets of attacks
would be minimized." We collected sensitive data that potentially could threaten the privacy
of the victims and, you know, completely threaten them. One question, which I mentioned
before: "Should emails be viewed at all?" I mean, there are people that feel very strongly
you should not look at that at all, even in the data mining way which we are using for
those things. And then the next one. A lot of my academic colleagues, after they saw our
paper, asked this question: did you have--I wasn't familiar with it--IRB, Institutional Review
Board. They are the ones that--for us in academia, whenever we put in a grant proposal,
there is this thing you check and it says, "Are you dealing with any human subjects or
anything like that?" We go, "Nah, we're not doing that, no, right?" And then of course
on this stuff, we put "No"--yeah, I would say no, because if you say yes then they get out
this whole stack of paper that you have to fill out. Well, it turns out, we're not working
with human subjects and we didn't plan on getting these kinds of data, so we said "No."
But there's this other little catch in there: any data that can be used to identify the
individual needs IRB approval. Well, I now have IRB approval. You know, unfortunately,
I got it after the fact on this thing, but it's something that, you know, in particular for
people that are working on this kind of thing, you know, is a legitimate one. You have to
do it. I won't ever put anything in again where there's a chance I'll be getting
this data without getting IRB approval first, okay? And I still don't know about the emails
and stuff like that. I mean, we're kind of backing off from that as much as possible.
So, in conclusion--you know, other people have gotten data on botnets.
Generally, one of them--there's one on Conficker, which a former student of mine, who is
up now at SRI International, was doing. With Conficker, they collected a lot of data where they
posed as a command and control. But they didn't know the protocol to respond to it. So, these
places were connected once, and so, if you do these passive sort of things, you get a
partial view of it. We were fortunate. We got everybody on Torpig: you know, everybody during
that 10-day period connected to us. Oh, I thought of something I forgot. Can you go
back a slide? Another slide. Okay, it's going to take too long. I wanted to say something
about the FBI. One more. Okay, the FBI was very good to us. Okay. I'm going to tell you why
it was 10 days, okay? On Monday of the second week, we were having a conference call with
the FBI guy and he said, "Is there anything we can do for you?" You know, because, you
know, they were doing the usual thing, anything you want. And we said--like, Giovanni Vigna,
who is my colleague, said, "But we only own the domains for three weeks out, you know,
and a week and a half is almost gone, and the criminals own them for the three weeks
after that. If you could, you know, if we had a way of taking down their hosts, you
know, or their domain names, then, you know, maybe we could, you know, get in there, because
we were already dealing with that: after they signed up for dot-com and dot-net, we signed up for
dot-biz for those three weeks, so it would have rolled over to us. So, if there's any way you can
knock them out..." And he said, "We can probably do that," and we said, "But they're
registered, you know, with these--people that are well known for not responding," and
they said, "Everything comes back to the U.S." You know, very confidently he said that.
And so, Tuesday, at about 11:30, we got an email from him and he said their domains are
gone. Okay, and sure enough, we checked, and their domains were gone. And 45 minutes after
that, they downloaded a new Mebroot binary--a new Torpig binary--on all of the machines through
Mebroot. So, the FBI also kind of had a bad side effect for us, you know, on that.
And so, in trying to answer the question of, "Why did they wait 10 days?"
I think that sort of answers it, you know, if they thought, "Hey, this is our competition
and we want to know who they are and see what they do, and so, you know, we can decide how
to deal with it." They knew when it was taken down that probably that was done by law enforcement,
and they decided, you know, "We're done with our learning curve. We're
going to take it back," you know, and so that was the 10 days. Okay. So, conclusions. We had,
you know, we had lots of stuff to look at, and we still have--I mean, we obviously haven't
mined all of those data yet. The thing about distinct IPs: you know, counts based on them are probably overestimated
by an order of magnitude. Botnet victims are users with poorly maintained machines, because
they have the vulnerabilities available. They choose easily guessable passwords to protect
sensitive data. And then this last one was just, you know, what I told you about:
interacting with registrars, hosting facilities, victim institutions isn't all that easy. And,
you know, obviously, I didn't do this alone. Chris Kruegel and Giovanni Vigna are two faculty
members who are the security lab advisers, and these are the students who've worked
on it. Brett was the guy that noticed that it was not being used, you know, three weeks
out or whatever, and then the rest have participated in their ways. And questions? That's the place
I have to live. It's really awful. I hate it. And, you know, I wish we had traffic like
I've had coming over [INDISTINCT] past this morning. Okay. Yeah, questions?
>> Yeah, [INDISTINCT]. >> KEMMERER: They went to a new encryption
algorithm for the Torpig command and control, and it has been broken. So, one of the things that's
happened--okay, I shouldn't say what I was going to say. Anyway, as you may
believe, there's a group of researchers and some folks working particularly with
financial institutions and so forth that are interested in what's going on here, in sharing
data and so forth. It turns out that with Torpig, since we did this, they've changed their encryption
algorithm several times, and it's been broken every time. Yeah, fairly
quickly. Okay? Mebroot still hasn't been. Now, one might ask--well, if Mebroot hasn't been
broken yet--why, you know, when it's the same people? Well, this, again, may be part of the
underground cyber economy. It may be that the Mebroot guys are selling their services to the Torpig
folks, okay? And the Mebroot guys are, you know, not ready to sell the, you know, encryption
that they use. >> [INDISTINCT].
>> KEMMERER: You mean just to make them feel good?
>> [INDISTINCT]. >> KEMMERER: It could be. It could be. Yeah.
Yeah. Any other questions? >> [INDISTINCT].
>> KEMMERER: It's because those weren't new ones. They were new ones hitting us in a 10-day
period. >> [INDISTINCT].
>> KEMMERER: I think that drop was if you look at the data, it was February 4th, which
was when we lost it. So, it just went down like that. It wasn't...
>> [INDISTINCT]. >> KEMMERER: It's huge in the beginning
because they weren't truly new. They were new to us, okay? You know, that was just like--so,
if you've got people that were infected two weeks ago or a month ago or whatever who are
still connecting to command and control every 20 minutes, that's us. So, that was those. And
then the new ones were coming across, but when you saw the drop, like--like even the new
financial institutions. I don't know if you guys got the pointer to the paper on this. Did
you send that around or? Anyway, I knew (Oris) had it because he told me he was reading it,
but there's a paper and I can give you the pointer to it if you want. It's what's
going to be--and if you go to our site, you know, UCSB, the seclab, it will be there.
But those slides are there, but even on the new accounts, which was fairly flat
right from the thing. But then you see the drop because that was February 4th,
okay, when we lost the botnet, yeah, yeah. And so, it did drop off just like
that. I was interested to do it because, as I said, we're actually in the first
year of a four-year NSF grant to look at this stuff, and so, for me, it's an advantage
to know the bank people and so forth, and so I used that as a quid pro quo
to get to know those folks, but it gets old after a while. You know, and I think my grad
students are tired, like, "Oh, here's a new set of BIN numbers. Can you look them up? Here's
the banks of interest," you know, what domains you want. And you think, like, somebody like
the Bank of Scotland, we said--you know, they said, "Do you have anything?" And we
said, "Oh, yeah, we have some," because some, it's obvious, it's UBS or something like that,
United Bank in Scotland, and so we shipped that one, and then they said, "Oh, here are
some more domain names that we have for Scotland." It piles up. "These are the things that you're
looking for," and it's--you know, it doesn't take a lot of time, but everything, I mean,
it does take time. But it hasn't all been repatriated, but I gave all of it
to the FBI and said, "You know, you guys deal with it if you want more." Yes?
>> [INDISTINCT]. >> KEMMERER: Yes.
>> [INDISTINCT]. >> KEMMERER: Yes, we did some. But I don't
have anything specific to tell you about it. What I remember is there were two days--two
days that had a big spike, and our conjecture was that there was some new, very popular--one
or more very popular websites that had been recently compromised. Yeah, yeah. Well, it
turns out that some other research that we did--it's not in this paper--is we also, you
know, when you go to the legitimate site, it downloads the HTML tag that goes and runs
some JavaScript that connects you to the real drive-by download site. So, it turns out that that
site changes all the time, somewhat like Torpig, and we figured out that algorithm, and
we actually owned that site for a while. So, we knew which sites were infected, you know,
initially, which legitimate sites, which is nice information that we've tried to get back
to these folks also. But that's a game that was very hard to play. It was like
every day, daily. Anything else? Okay. Thank you.