How to Steal a Botnet and What Can Happen When You Do

>> Thank you everyone for coming. I know we're normally in a different [INDISTINCT] room but we're kind of, so [INDISTINCT]. Today, we're pleased to welcome *** Kemmerer from UC Santa Barbara to talk about his research into botnets. I think it's a [INDISTINCT] botnet? >> KEMMERER: Yup. >> Right. And there's [INDISTINCT]. Anyways, please give him a round of applause. >> KEMMERER: Okay. Thank you for having me. I love your conference room. It's just really state-of-the-art. I'd expect nothing better for Google. If you want to change the slide, so--I'm not at the slide changer, so I'm going to, you know, regularly be saying, "Please change the slide." So, I thought first, just so we're all on the same page, I'd go over some terminology real quick. I assume most of you know all of this. But at least we're using the same names. A bot is an application that performs some action or set of actions on behalf of a remote controller. It gets installed on a victim's machine. In the old days, we used to call these zombies. So, once you have a compromised machine that someone is using remotely, that's what we refer to as a bot. It's modular in the sense that you can plug in different kinds of malware for doing different exploits and so forth. And then, today, we'll be talking about the Torpig malware. When you have a collection of these infected machines that are controlled by, usually, a single individual, it's referred to as a botnet. The way of controlling it is by means of a control channel, which is required to send commands to the bot and also to receive data back from the bot. Originally, most of these control channels were IRC channels. Today, we are seeing more of them being HTTP and HTTPS, and some of them are actually peer-to-peer. Of course, with peer-to-peer you don't have the central control that you do in the others. The person in charge of this is usually referred to as a bot herder or the botmaster or the controller.
He owns or controls the channel, sends the commands to each of the bots and collects the data from the botnet army. Usually the motivation is power or money. The motivation used to be to put another notch in your belt. Today, it's very, very much money driven. Yes, I--how about if I put a red dot on you when I want you to change it. Okay, next please. Okay, so let's talk a little bit specifically about Torpig. Before I--before I do that, I want to just say, Torpig is actually delivered by another piece of malware called Mebroot. But I'm only going to tell you about those two lines right there about Mebroot. So anyway, Torpig is distributed via Mebroot. Mebroot's really a malware platform. It provides you with something kind of like a middleware. So, you can install different applications and so forth. For Torpig, it injects itself into 29 different applications... >> [INDISTINCT] >> KEMMERER: It's what? Okay, sorry. It injects itself into 29 different applications. These are all sorts of applications, like web browsers and Skype and so forth. It steals sensitive information. It's one that has supposedly been called one of the most dangerous botnets around. It steals, of course, passwords, HTTP POST data, bank credentials, logins to all different kinds of accounts and so forth, and we'll give you some data on that a little bit later. It uses HTML injection for phishing, which is actually man-in-the-browser phishing. So, it's not detectable by any of the current phishing detection software, and it uses something called domain flux to locate the C&C server, which is changing regularly, and I'll talk more specifically about those last two or three in a little bit. Just to say something about Mebroot: Mebroot is spread via drive-by downloads and it's basically a sophisticated rootkit that overwrites the master boot record.
And then, once you--after it overwrites the master boot record, it waits a little while and then it reboots the system, and then it's doing everything at boot time, and then, again, it's not detected. Okay, oh yeah, this one is going to be fun. One more time--okay, so, basically this is a pretty sophisticated group of people. I don't know that they're sophisticated, their software's rather sophisticated. The architecture, the system is rather sophisticated. The one thing--two things on this slide that are not things that the Torpig or Mebroot folks did is up here where we're talking about hacked web servers. So, these are just the normal legitimate servers that are out there, of course they're vulnerable and so, the Mebroot folks have hacked into them and put software on them so that we can get the whole process started, which I'll tell you about in a little while. I shouldn't say all of these are innocent because a large number of them are actually the *** sites and the belief is that the *** sites are probably getting paid to have this software put on, on there. There's the innocent victim here. Other than that, we've got the drive-by download server, which initially puts the Mebroot software on your system to try and see whether or not you're vulnerable. We've got the Mebroot C&C which controls the Mebroot itself. We have the Torpig command and control which controls Torpig and then a separate injection server. So, let's see how this works. We start of with the innocent victim connecting to one of these legitimate sites, one more please. When it connects to that site the iframe that comes back down has a dTag on it that causes the innocent victim to go over to the drive-by download server, okay? So, it goes over to drive-by download server. The drive-by download server goes in--downloads the Mebroot software. 
The Mebroot software looks for a number of vulnerabilities in the system, if it--if any of those vulnerabilities are there, then the innocent victim now becomes a bot. They just love those devils, huh? Okay, well, I should say, this slide was actually--there's a figure we have in the paper that you guys had a reference too. And thise slide was done by an FBI guy, Forison, and then--well I change it some, because he had UCSB looking like angels and we, we didn't feel like we wanted that knowledge. But I wanted to acknowledge the FBI for doing the animation here. Okay, so now that the--that we have the bot here, the first thing it does is connects to the meb--oh, yeah, go ahead. Is it much easier when I'm doing the finger things then I know. Okay, so, it basically connects the--as soon as it's infected, it connects to Mebroot C&C. Now, as I said Mebroot is a--can be used for other malware. In this case, it's being used for delivering the Torpig malware and it basically downloads three Torpig modules on to the innocent victim's machine. These modules connect to the Torpig command and control every 20 minutes. And one of the other things that--so once it connects the Torpig command and control, first thing Torpig command and control does is download the configuration file. This configuration file--I'm about eight slides ahead, but it will all come back, you know, if you hear it twice then you'll remember it, okay? The configuration file has about 300 financial institutions on it. Such that when you visit those financial institutions, it's going to do a phishing attack on you. The other thing that happens is that every 20 minutes when the victim connects to Torpig, it gives it any new stolen data that it has. Okay, so I was telling you about those financial institutions, whenever the victim goes to one of those financial institutions, Torpig connects to the injection server--oh, okay, I was at two. 
I didn't know if that was a two-way arrow or one-way arrow--connects to the Torpig injection server. The injection server--through a process which I'll detail later--gives it a particular customized phishing page for that particular financial institution. And that--it looks exactly like all their other pages do. Like the one for PayPal looks like a piece of crap. And I shouldn't say that since you're going to distribute this. But yes, it does. The other ones look kind of nice like they were prepared by professional people. And I'll show you examples of that. So, this is basically how things are working when Torpig is owned by the criminals. Every 20 minutes it's connecting here, giving it any new stolen data. If the center--command and control has any new commands, it downloads those. Every two hours, it connects to Mebroot to see whether Mebroot has anything to do, okay? So, what did we do? So, we're the UCSB Gauchos. That's actually the--what do you call it? The mascot for UCSB, which I thought was kind of nice for us, although it's a black cap not a gray hat. So, what we did is we put a vulnerable machine out on the net, it got infected by Mebroot and Torpig. And of course we were running it in VMware and sampling everything it was doing. Reverse engineered the software, we broke the encryption that was going on. And again, I'll give you all the details of this later. And basically, we took over, so that--what was happening is the innocent victim was going to start delivering to us and getting commands from us. And the--and more importantly, it was not sending the stuff to the criminals. And so, basically for 10 days, this is a position we were in, we had 100--over 180,000 unique machines connecting to us from over 1.2 million unique IPs which we'll talked about why is that different in a little while too. And so, we have all the information. Remember I said that there is a connection here every two hours to Mebroot command and control. 
And we fully expected that they would download some new information that would kick us off. This didn't happen for 10 days anyway. And I'll tell you why it did happen after 10 days. Also, we thought it would maybe come out in some other ways. I think that's the end of this--oh, so that goes away and it goes back to them. And, let's go to the next slide. So, let me talk a little bit about how the phishing page works. So, the configuration file that is downloaded from Torpig can get updated regularly if you want; well, it turns out it doesn't get updated that often. But, it has domains of interest. In our case, there were approximately 300 financial institutions. And, you know, you tell me what financial institution you were with, it was probably on there. And so, countries all over the world--I don't know if we had any in China, but we had, you know, Australia, New Zealand, the UK, Europe, and all over the US and so forth. Okay. So, what happens when one of these domains of interest is visited: the Torpig software issues a request to the injection server. The injection server specifies a particular trigger page on that domain, and the idea is that--and it also gives a URL to the Torpig software. When the user of that particular bot gets to that particular trigger page, the Torpig sends the UR--invokes the URL. What the URL does is downloads a page... >> [INDISTINCT] >> KEMMERER: Yeah. >> [INDISTINCT] >> KEMMERER: Yeah, it's a problem from--I'd like to step away, away from them like, but, I don't know if that's okay with other people. >> [INDISTINCT] >> KEMMERER: Okay, so--so basically when it gets to the trigger page, the Torpig invokes the URL, gets a phishing page back from the injection server. And what it does, through a man-in-the-browser attack, is it goes in and displays that page for the user to look at. Okay? Let's see, what I want to say, so that phishing page reproduces the look and style of the target website.
So, here's an example page for Wells Fargo, part of the page anyway. Where you can see it--it looks just like Wells Fargo. If you bank at Wells Fargo, that looks like what your bank pages normally look like. In fact, you know, one of the things you're told to do is look up here to see whether or not it says Wells Fargo and then, at the end, it says dot-dot, you know, dothacker at a bad domain or something. But it has the right domain there. It's doing SSL, it all looks legitimate and it's not caught by any of the phishing detectors. Of course, it asks for things like first name, last name, date of birth, social security number, mother's maiden name. If you go to the next one, just so we don't--this is one for Bank of America and it has some more questions there which I can't read from here and I'm sure you can't either. It says, oh, your father's middle name and what city was your mother born in, all those sorts of things. Once you fill these out, you're basically, you know, you're ripe for identity theft, okay? And people would fill them out. The interesting thing was, they would fill them out, some of them, we saw, they would fill them out and then they would send a message to the PayPal security guy, and say, "Why are you asking for this stuff? I thought I gave all this to you before." But, they did it after they filled it out. So, it's, you know, kind of interesting. Okay, next. Okay, so let's talk about domain flux. Domain flux is the way that you find out what command and control server to connect to. So first, let me give you a little bit of history. If you're looking to take down a botnet, the first thing you want to do is either get one of your bots, one of your machines, to be infected like we did, or you find infected machines out there. And if you find the infected machines you can take them off. Okay, you can cleanse them and fix all the vulnerabilities.
But if you take one machine out or 10 or 15 or 200 out of 180,000, you don't have much effect. So what you really want to do is you want to go for the command and control servers, okay? And so, if you use--if the command and control server uses a static IP address, then you can block or remove that host. So you can't go to that domain or if you're a law-enforcement, you can go there and physically capture the machine. And they actually had done that in the past for one of the Torpig machines where they actually captured one and got a lot of, like, a lot of data on it from that and found out who was compromised. Okay? So, but, remember this is an arms race. You know, it's the hackers against the good guys. And it's constantly--you've start thwarting, you know, the common way that they're doing things and they raise the bar. Okay? And so, the first thing they did to raise the bar was something called fast flux. And with fast flux, you have the same domain name all the time, but where it goes to changes, okay? And in this case, you just block the domain name, all right? What they did in Torpig, and they actually do this in conficker also. And there were some earlier bots that did this. The idea is that the bots periodically generate a new command and control to go to. So, whether the software that's downloaded on the bot has a domain generation algorithm downloaded on it. In the case of Torpig, it changes once a week. For these domains, as I say, they change often, they use a local date system time as input. And what you do as a botmaster, then just, needs to register this domain. When it changes, all of the information starts coming to the botmaster as it rolls over, okay? What happens is if you want to defend against this, you have to go and figure out all the possible domains that they could be registering, register them before them, before the botmaster does and then you can take it over. And you'll see that that's what we did, but it's not as easy as it sounds. 
Next please. Okay, so let's talk explicitly about domain flux for Torpig. As I said, each bot has a domain generation algorithm on it. We reverse engineered this domain generation algorithm. It also has three fixed domains that it goes to if everything else fails. Now, for the domain generation algorithm, it does two things. One is, it generates a weekly domain name, which I'll refer to as WD. It also at a daily rate if it needs to, it generates a daily domain name. And in every 20 minutes, the bot attempts to connect in Torpig--in order to the weekly domain name.com, the weekly--and if that fails, and what I mean by if it fails, if that domain's not available, it's not out there, or if it connects to that domain, and that domain doesn't respond in an appropriate way, it doesn't have the right protocol, then it will go to wd.net. If that fails, it will go to--roll over to wd.biz. If all three of those fail, then what it does is it tries a daily domain and it generates a daily domain. It tries dd.com, if that fails it goes to dd.net, if that fails it goes to dd.biz, if all of those fail, then it tries three fixed domains that are hard-coded in the software, okay? So in the past while we were watching Torpig, it turns out that the criminals normally registered wd.com and sometimes wd.net. Okay? So, which--so, they're interested in efficiency. First one, it's going to--they get. And so--if you go on--so what we did is we wanted to get them to come to us instead of going to the criminals, okay? And so, first thing we did is reverse engineered the name generation algorithm and figured out what the command and control protocol was. And as is always the case, one of our grad students noticed that the criminals hadn't registered wd.com or wd.net or any of those for about three weeks out from what the current date was. So, we went and registered those domain names, okay? So, we registered wd.com and wd.net, okay? 
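The fallback order just described can be sketched in a few lines of Python. This is only an illustration of the ordering (weekly name on .com, .net, .biz, then the daily name on the same three, then the hard-coded domains). The `toy_label` generator, the `FIXED_DOMAINS` placeholders, and the `speaks_protocol` callback are all assumptions made up for the sketch; the talk does not give Torpig's real algorithm or its fixed domains.

```python
from datetime import date
import hashlib

TLDS = ("com", "net", "biz")
# Placeholder names: the talk does not say what the three fixed domains are.
FIXED_DOMAINS = ("fixed-one.example", "fixed-two.example", "fixed-three.example")


def toy_label(seed: str, length: int = 8) -> str:
    """Stand-in generator: hash a date-derived seed into lowercase letters.

    This is NOT Torpig's algorithm, only a deterministic placeholder.
    """
    digest = hashlib.sha256(seed.encode("ascii")).digest()
    return "".join(chr(ord("a") + b % 26) for b in digest[:length])


def candidate_domains(today: date) -> list[str]:
    """Return domains in the order the talk says a bot tries them."""
    iso_year, iso_week, _ = today.isocalendar()
    weekly = toy_label(f"week-{iso_year}-{iso_week}")   # same all week
    daily = toy_label(f"day-{today.isoformat()}")       # changes every day
    ordered = [f"{weekly}.{tld}" for tld in TLDS]
    ordered += [f"{daily}.{tld}" for tld in TLDS]
    ordered += list(FIXED_DOMAINS)
    return ordered


def first_responsive(domains, speaks_protocol):
    """Pick the first domain for which the (hypothetical) check succeeds."""
    for name in domains:
        if speaks_protocol(name):
            return name
    return None
```

Because the inputs are just the date, anyone holding the generator can call `candidate_domains` for future dates, which is what makes registering names ahead of the botmaster possible in the first place.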
The next, it was either later that day or the next--and we registered them for the next, for three weeks out, from January 29th to the--whatever three weeks out was, fifth--22nd or something like that, February 22nd. Oh, here, I can read what's on my slide then I'll know it, okay? So, we registered these domains ourselves, wd.com, wd.net; either the same day or the next day the criminals registered wd.biz, okay? So, we figured what they were going to do was a denial of service on our two names. So, it would roll over to wd.biz, you know, and then download new software and we'd be out of business, okay? It turns out that didn't happen for 10 days anyway. What did happen is on February 4th, through Mebroot, which it connects to every two hours, they pushed a new Torpig binary which had a new domain generation algorithm. That domain generation algorithm was also one that was dynamic, so you couldn't calculate it out weeks ahead of time like we did. The new domain generation algorithm was based on the highest hit on Twitter for that particular day. And so you had to be kind of on top of it. And you could believe we were talking to the Twitter folks to go and see if we could get that knowledge first, and we actually did later on, but, that's another talk. Okay, so basically we controlled the botnet for 10 days. We got just short of 9 gigabytes of Apache logs, we got 69 gigabytes of pcap data and I'll tell you more about those in a minute. So, let me just say a little bit about how we did our sinkholing. We purchased hosting from two different hosting providers, and these are hosting providers that are known to be non-responsive. I mean, you might ask, "Hey, if you know the command and control is here, why don't you just call the hosting provider and say, hey, take that guy down, he's a bad guy?" Well, these guys get people to go and let them host them and so forth, because they are unresponsive.
And those are well-known. So we went to two different ones, because we wanted redundancy here. We also registered wd.com and wd.net with two different registrars, and this was actually very beneficial because the 7th day that we owned it, evidently on the 6th day, some Spanish bank reported to the registrar that we were doing bad stuff. And they tried to call us, but of course we'd given them a bad phone number. I mean, we were buried through about four levels, because we didn't want to get our kneecaps shot off or anything, you know, when this came down. And so we found out too late, but they suspended us on January 31st, but we had the backup ones, so we didn't lose any data at all. So that redundancy really worked out. We set up Apache web servers to receive the bot requests, recorded all of the network traffic, and the other thing we did is we automatically downloaded and removed the data from our hosting providers. We encrypted it with AES, downloaded it, so that if anybody took over the physical machines or actually broke into the machines, that data wouldn't be available to them. Next please. Oh, interesting thing is that, as I said, we were set to take over on January 29th. Well, a week earlier, we enabled our software on the host, our command and control stuff, to collect this, and immediately 359 infected machines connected to us. So, these guys had their clocks set at least a week ahead and thought that they were already supposed to be reporting to that particular domain. That was kind of nice because we knew at least it would--you know, it would probably work. We didn't know how well it would work, however. Next please. Okay, so before I talk more about the kinds of data we got and so forth, let me just say, you know, when you're collecting this kind of information or you have the potential to collect this information, you've got to be very careful about what you do with it.
And so we basically had two principles that we set for ourselves in terms of the data collection. And the first one said the sinkholed botnet should be operated so that any harm and/or damaged to victims and targets of attacks would be minimized. So we didn't, you know, we got somebody's compromised already. You know, and we don't want to make things worse for them, okay? So what we did if you hit--so, what we did is, first, remember I said, when they connected the command and control, you have to do the right protocol back to them. And what we did was--we did what we call the okn message. The okn message is like a null message. It's just basically says, "Yup, I'm the guy you want to be talking to and I don't have anything new for you." We've had people since our paper was released on the web say, "Well, why didn't you put a new blank configuration file?" Because remember, the configuration file contains the names of the financial institutions that whether they were going to do the new phishing attack for, okay? Well, we don't know what the side effects of that would be, okay? And we wanted to, you know, we didn't do this stuff just because we wanted to be good people, we also didn't want to go to jail. You know, if you start sending things down to their machine telling it to do different--do things. I mean, we were just collecting. We were a passive collector except for the okn message. But, we were really concerned of--suppose the compromised machine is something like a bank. And when you send down a blank configuration file, it has a side effect that, you know, turns off all of the life-support things or something or, you know. Probably, it won't be that bad, but one never knows. And so, we didn't do that. As I mentioned before we removed the data from the servers regularly, we stored the data offline in encrypted form. 
In fact, one of the things I did at first Sunday after we took it over, we send someone out to Costco to buy the biggest offline data storage device that we could and so we could, you know, store this stuff in there and lock it in a safe, so we would be good citizens, okay? Principle number two, says, the sinkholed botnet should collect enough information to enable notification and remediation of affected parties. And so, we've worked to with law enforcement in particular with the, with the FBI and Department of Defense cybercrime units. I'm going to say more about that later. Trying to get to do an initial contact with law enforcement isn't as easy as one might think it is. You don't dial 911, okay? And so, I'll tell you more about that later. As a result of dealing with the FBI in particular, we put in contact with a bunch of bank security officers. I know the chief security officers at banks that I never knew existed before, you know, or how could Scotland have that many different banks, you know. And so it was kind of interesting, and we also worked with the ISPs which we also found they're, you know, trying to work with an ISP where you're cold calling and have been saying, "Hey, you know, you really should take this guy off or you should leave us on or whatever" isn't always the easiest thing to do, okay? So, let's talk about the data. As I mentioned before, the bot connects to the Torpig command and control every 20 minutes. It does this by an HTTP POST. It sends a header. The header is encrypted. The header has in it a timestamp and that timestamp is a time that initially got infected. And it's little things like this that the Torpig folks did for us that made it really easy to find out, is this a newly infected machine? Or has this been around? So you can get an idea of, you know. Because if you jump in and take over on January 29th and it's sending stuff to you while all those machines have been infected, one would think for quite a while before that. 
So you should get a big spike in the beginning, which we did. But, you know, what you want to know is how many are getting infected each day, you know, of the 10 days we owned it. And that timestamp was one way that that worked. The IP address, this is interesting. You'll see that we had, we had one--we had one machine that had 700--it was either just short of 700 or just over 700 different IP addresses over the 10-day period. And if you see how many 20-minute connections there are in that, it was almost like you had a different IP every 20 minutes, okay. And so, this was something that was interesting which I'll talk about. Again, its--we believe that many of the reports on the size of botnets that you've seen before are inflated, you know, probably an order of magnitude more than what they really are and I'll tell you why we believe that. So, there's a time stamp, the IP address, proxy ports, the operating system version, the locale, this is a unique ID which I'll tell you more about. And basically, it was that that we use to find out how many unique machines we had as opposed to unique IP addresses. And then what was the Torpig build and what was the version number. Okay? So, I said these guys were professional software developers. They did all the good things you should do if you--you know--if you're developing software. So, let's talk about the network ID. It's an eight-byte value. It's used for encrypting the header and the data. And it's derived from hard disk information or a volume serial number. And so as it turns out, we can also tell whether or not--wait a minute, down here, we could detect when there's a VMware machine because then it always has that same number on all of them, okay? And so, you know, unless someone went to the effort to go and change that. And so, this serves as a convenient, unique identifier. The body then--the optional body, if it has any stolen information to send, has any newly compromised account, anything that was posted. 
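As a rough sketch of how that infection timestamp helps, here is one way to tally newly infected machines per day from decrypted headers. The `BotHeader` class and its attribute names are invented for illustration (the talk names the fields but not their format, and proxy ports are left out for brevity), and the sketch assumes the header has already been decrypted.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BotHeader:
    """Header fields named in the talk; attribute names are made up."""
    infected_at: datetime   # time the machine was first infected
    ip: str
    os_version: str
    locale: str
    nid: str                # per-machine identifier
    build: str
    version: str


def new_infections_per_day(headers, start: datetime, end: datetime) -> Counter:
    """Count machines whose infection timestamp falls inside [start, end)."""
    seen = set()
    per_day = Counter()
    for h in headers:
        if h.nid in seen:
            continue        # the same bot reports every 20 minutes
        seen.add(h.nid)
        if start <= h.infected_at < end:
            per_day[h.infected_at.date()] += 1
    return per_day
```

Machines infected before the takeover still connect, which is what produces the initial spike, but they fall outside the window and are not counted as new.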
Everything that was posted was sent. Okay? So, if you're doing your email, you know, your web email through posting, we got all of that. Okay. Okay, and so when it sends this, the nid is in the clear, and that's used to encrypt the header. And so, once you know the encryption algorithm, it's fairly easy to decrypt all the messages. And--I should have mentioned that before. For Torpig, the encryption algorithm wasn't very sophisticated at all. For Mebroot, they use a--we hope they use a very sophisticated encryption algorithm, because no one's been able to break it yet. Okay, and obviously they're two different ones, okay? Next please, okay, so, I was telling you about size estimation. This is the count of the number of infections. Normally, when you read about botnets, it's usually based on the unique IP addresses. Okay? It turns out this is problematic for a number of reasons. If you have DHCP connections, you know, those in theory could change every time that you log on, or more often, you'll see. If you're like me, I found that when I was on Verizon getting DHCP, I think I had the same IP address for the four years or whatever I was on Verizon, you know. And I shut the machine down, you know, pretty regularly. And so, you know, most people will say, "Well, why is that a big deal?" Well, for some places it is, and in particular Germany and Italy and BellSouth, where people have a short time to live on it, and it's changing almost every time you're connecting. Okay, for our count, we based it on the header information. We based it basically on that nid that I told you about, because we thought that should be unique. Okay, since it was based on a SCSI drive and, if that wasn't there, on some other information that should be unique.
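To make the difference between the two ways of counting concrete, here is a small sketch that tallies the same connection log both ways. It assumes each connection has already been reduced to an `(nid, ip)` pair and that the nid alone identifies a machine; the function and key names are invented for the example.

```python
from collections import defaultdict


def summarize(connections):
    """Compare IP-based and nid-based size estimates.

    `connections` is an iterable of (nid, ip) pairs, one per bot check-in.
    """
    ips_by_nid = defaultdict(set)
    for nid, ip in connections:
        ips_by_nid[nid].add(ip)
    all_ips = set().union(*ips_by_nid.values())
    return {
        "unique_ips": len(all_ips),
        "unique_hosts": len(ips_by_nid),
        "max_ips_for_one_host": max(
            (len(ips) for ips in ips_by_nid.values()), default=0
        ),
    }
```

A single machine on a short DHCP lease shows up as one host but many IPs, so an IP-based count grows with every lease change while the nid-based count does not.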
We found that we had a few--and when I say a few, I don't remember the exact number but it was like in the 100s that had repeated--that value was repeated, and we could tell by from the location and so forth that it shouldn't be. And so we used the nid, we used the country of origin, the locale and something else in those, we're convinced it was unique. So here, it shows you new Torpig IPs per hour. You can see--Oh, I don't know if I mentioned this here. Let me see. We saw 1.2 million unique IPs and we saw 180,000 unique hosts. So, as I said, in order of magnitude difference there. Here's the unique IPs. You can see we got this big spike. This is Saturday. It was midnight on January 25th that we took it over. So, we got a little trickling of things beforehand. Remember, if we went back a week before we already had 300 and some almost 400 connected to us. But, we already up here--are above 14,000, something like 14,500 on the initial connection. One thing to note about this is this every two days here, but you'll see this is diurnal in terms of people connecting, you know, eight o' clock in the morning. There's a lot of these that are commercially-owned machines. And, you know, around 8:00 o' clock you'll see a big influx of people logging on, it drops down at night and it goes up, and so, this is work that David Deegan in Georgia Tech had noticed before too is this diurnal effect and that definitely was here. Over on, on--yes? >> [INDISTINCT] >> KEMMERER: What's that? >> [INDISTINCT] >> KEMMERER: Oh, you could--well, you could see things going across, but if you take Western Europe and the US and sort of collapse that in, you know, the effect of there is close enough that you'll still see it. Yeah. This is on--these were based on where we were collecting the data. Okay, but you're right, I mean, you know, it was fun, you know, because we didn't know how well this is going to work. At midnight, we're sitting there watching soccer. 
And you can just see watch it coming across. And the other thing was--that was--remember, that was Saturday at midnight, and then we had, you know, another spike which is probably maybe this one here, it was then on Monday morning. Okay, because there were people who weren't connecting on Sunday, you know, and they started then connecting Monday morning. But, yes, so to answer your question, that's kind of mixed in there, but you still got it, even though you had the combination all, okay? Because it was basically a--you know, a nine-hour worst case difference between those. And it's moving itself out. And you know, not everybody get starts at 8, some start at 10, you know, the whole thing, okay? So, here's the--using our host nids for it. And again, you'll see there was a--you know, big spike in the beginning. But what you notice here is that it--you know that it drops off. And we don't get that many--as many unique ones. If you go to [INDISTINCT] so also, the average number of new IPs per--you know I'm trying to remember whether that was per hour. Well, this one's per hour. So, the average number--yeah, that would be from per hour. The average number of new IPs per hour was 4,690 whereas for the new unique hosts was 7,500. Okay, let's go to the next one. Other interesting thing here if you look at the cumulative number of infections, it's linear for unique IP addresses. So, it can, you know, it just keeps, keeps going up linear like this. In the case of when we're using a host, it turns out that we had 75% of the new IDs in the first 48 hours. And then it--you know, and then it kind of flattens out, it doesn't completely flatten out because there are new infections going on, all the time, but, you know, not at this linear rate that you see here. Okay, next slide please. So, let me talk a little bit about the different threats, the obvious threat is that, the theft of financial data. 
But I also want to say a little bit about denial of service, proxy servers, and privacy threats, which, I have to say, are all conjecture on our part, but we backed it up with, you know, with real numbers, whereas for the theft of financial data, we have hard data on that, on when that actually happened. Next one, please. Okay, so there were 8,310 unique accounts from 410 different financial institutions. The top five were PayPal, Poste Italiane, Capital One, E-Trade and Chase--I think in the paper I might list some more--number six was somebody who was doing me a favor and I said I wouldn't put their name in the paper or in the talks, so I cut it down from ten to five, but think of whatever bank you're at, you know, Wachovia, Chase, Bank of America, all of those you're going to see in here. And the other thing, which I don't think I've mentioned to you before, one thing that Torpig does, it goes to the password manager and steals passwords from there, okay? And so, 38% of the credentials that were stolen were stolen from the password manager, all right? And we know that because, again, these guys--well, the men and women that wrote Torpig--did a good job. They labeled it a specific way to say this came from the password manager. So, it makes it really nice to go and analyze these data. You know, we're kind of going, like, "Yeah," you know, and there are things we didn't even think that they should put in there that they did, like the build number that I was telling you about. There are different build numbers. I don't remember exactly how many of them there are. The number is in our latest paper, but let's say about 12 different ones.
But it turns out that the software with the different build numbers is exactly the same, and so the thing that we and other folks we've been working with are conjecturing about that is that they're actually selling their services, and they're selling it to, you know, different customers who are distinguished by the different build numbers, okay? And there's some, you know, other work that's been done that substantiates that, so that's a reasonable thing to guess. Okay, we also got 1,660 credit cards. The top five were Visa, Mastercard, American Express, Maestro and Discover. 49% of them were from the U.S., 12% from Italy and 8% from Spain, and then, you know, it dribbles off after that. Typically, we had one credit card per person, but there were exceptions--oh, I thought I had the exception in there. Somebody had 30 credit cards that were compromised, okay? And so we looked a little more to see if we could get a little more information about this particular bot, and it turns out that it was somebody who was providing a service for other folks and holding their credit cards for them. So, you know, if you sleep well at night--you know, if you have your wife, or whoever it may be, you know, people doing your flight reservations or whatever--beware, okay? So, one question is, well, what's the value of the financial information? Well, Symantec in 2008 estimated that credit card value ranges from 10 cents to $25, okay? Oh, and I should mention--how did we get in this business, you may be asking, okay? Well, we have an NSF grant to study the underground economy, okay? And so, what we're talking about today is sort of the first step of that. Somebody is collecting this up, and it turns out that the people that collect it are never going to get caught using a credit card because, you know, they're selling it to someone else, and when you're selling them in bulk, that's when you get this 10 cents value.
If somebody is out there selling an individual credit card because they are a waiter in a restaurant and they don't have anything to do with cyber security--they, you know, copied it off or, you know, used their cell phone; you don't even have to write anymore--that's closer to 25. So, anyway, bank accounts range from $10 to a thousand dollars, and so what we did is we looked at the new accounts and the new credit cards that we got. So, we didn't count anything twice, and this graph, which I'm sure no one other than the people sitting right under it with the bad necks can see--this one here is the count of new credit cards and new accounts, and this one is the max and this one is the min, and the value, if you take both those minimum values, the value over the 10-day period is 83,000 at the minimum and up to 8.3 million at the maximum, okay? Assuming that you've turned all of those over, okay? Next. Okay, so, another threat I just want to say a little bit about is the denial of service threat. There were more than 60,000 active hosts at any given time, and so what we did is we used the ip2location database to determine network speeds, and we found out that cable and DSL modems make up about 65% of the infected hosts, okay? And we used the cable modem and DSL speeds in the U.S. because they're well known to be the slowest in the world, and we used the upstream bandwidth, which is 435 kilobits per second, and did the math on this, and this yields greater than 17 gigabits per second from DSL and cable, okay? And with that, you can do a pretty good denial of service attack on somebody if you want to aim it all at the same target. The other thing is that the corporate networks made up about 22% of the infected hosts, and they have even faster speeds, and so that number is very conservative there.
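Both estimates above are back-of-the-envelope arithmetic, and it may help to see them worked through. The sketch below uses only figures quoted in the talk (8,310 accounts, 1,660 cards, the Symantec price ranges, 60,000 hosts, 65% cable/DSL, 435 kbit/s upstream); the function and constant names are illustrative and not taken from the paper.

```python
# Figures quoted in the talk.
ACCOUNTS = 8_310                   # unique bank accounts
CARDS = 1_660                      # unique credit cards
CARD_PRICE = (0.10, 25.00)         # dollars per card, low and high
ACCOUNT_PRICE = (10.00, 1_000.00)  # dollars per bank account, low and high

ACTIVE_HOSTS = 60_000              # "more than 60,000" active at any given time
CABLE_DSL_SHARE = 0.65             # fraction of hosts on cable or DSL
UPSTREAM_KBPS = 435                # assumed upstream speed per host, kbit/s


def value_range(accounts, cards):
    """Return (low, high) dollar value of the stolen financial data."""
    low = accounts * ACCOUNT_PRICE[0] + cards * CARD_PRICE[0]
    high = accounts * ACCOUNT_PRICE[1] + cards * CARD_PRICE[1]
    return low, high


def aggregate_gbps(hosts, share, kbps):
    """Return combined upstream bandwidth in gigabits per second."""
    bits_per_second = hosts * share * kbps * 1_000
    return bits_per_second / 1e9


low, high = value_range(ACCOUNTS, CARDS)
gbps = aggregate_gbps(ACTIVE_HOSTS, CABLE_DSL_SHARE, UPSTREAM_KBPS)

print(f"value: ${low:,.0f} to ${high:,.0f}")
print(f"cable/DSL upstream: {gbps:.2f} Gbit/s")
```

With exactly 60,000 hosts the bandwidth product comes out just under 17 Gbit/s; the talk's "greater than 17" follows from the host count being somewhat above 60,000, and the figure ignores the faster corporate hosts entirely.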
So, the possibility of using the botnet for denial of service, which used to be the original use for botnets--and I remember the attack on CNN back in 2002 and so forth, that was just to bring it down--the fact that it was attacked recently on--you know, it could have been Twitter. It was Twitter? >> Yeah. >> KEMMERER: Okay. All right, so let's look at the next one. Oh, I told you that. Good. Just in case I forget, you know. Okay, proxy servers. When a machine first gets compromised, Torpig opens both a SOCKS and an HTTP proxy. 20% of the infected machines that are in the botnet are publicly reachable, and we went to the Spamhaus blacklist and it turns out that only 2.45% of those are on the blacklist, which means that the other 97% of, you know, the 20% that are publicly reachable are usable, and these could very easily be used for spamming, okay? And, you know, I would venture a guess that most of the spam that we all get, a good portion of it, is done by botnets, you know, illegal botnets. Okay, next. Okay, privacy. Remember, we collect everything; everything that's posted gets sent to the command and control. So, that means web mail or web forum messages and so forth, and so we decided we just want to look at this a little bit. I'll tell you later when I talk about ethics, you know, there is a real question of should you be looking at people's mail, and in fact when I contacted the DOD cyber guy--cybercrime guy--he said, "Yeah, I'll talk to you if you have DOD information," and I said, "We do," and then he wanted it. But he said, "I don't want any of the email," he said--he even said, "I don't want there to be any chance that I even have it on my stuff." And so, you know, there are a lot of problems about that.
So, we tried to do a hands-off--hands-on, I don't know--approach to getting some information about what kind of people these are whose machines are compromised, and so we focused on around six and a half thousand messages that were in English, because we didn't want to have to bother trying to translate them, and that were 250 characters or longer, and it turns out that about 14% of those were about jobs and resumes, looking for jobs and resumes. 7% were discussing money--how we know is that we took keywords that would be obvious keywords to search for in doing this sort of thing. 7% discussed money, 6% were sports fans, 5% were preparing for exams, and 4% were looking for partners, sex, blah-blah-blah--I should take that line out of there, I guess--online, okay. An interesting thing about it is that a lot of them are concerned about online security. You know, we saw these messages, like one of the interesting messages said, "Yeah, I'm back online now, I had some malware but I'm cleaned up and everything is fine now." Well, okay, yeah, sure you are, yeah, right? And 10% of them specifically mentioned security and malware, and as I said, some of these folks, right after filling out a phishing page, went to whatever the financial institution is, you know, and sent this irate message off to them, which we also got because, you know, they're using web mail for this, and it said, "Why am I doing this?" You know, but they already did. Okay, next. Okay, so, one of the things we did, since we had so many passwords that were collected--we had almost 300,000 unique credentials--what we did is ask, what can we say about these things? And so we analyzed them and we found out that 28% of the victims reused their passwords on multiple domains. Probably, you know, not such a big surprise. If you could go to the next one. We said, "Well, can we get more information about this?" We know they're reusing them, okay? And so, we said, "Well, how strong are these passwords?"
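The password reuse figure above is the kind of number that falls out of a simple grouping over the captured credentials. Here is a minimal sketch of one way to compute it, assuming each credential is available as a (victim, domain, password) triple; that record shape, the function name, and the sample data are all hypothetical, and the talk does not describe how the 28% was actually computed.

```python
from collections import defaultdict


def reuse_fraction(credentials):
    """Fraction of victims who used the same password on more than one domain.

    `credentials` is an iterable of (victim_id, domain, password) triples.
    """
    domains_by_victim_password = defaultdict(set)
    victims = set()
    for victim, domain, password in credentials:
        victims.add(victim)
        domains_by_victim_password[(victim, password)].add(domain)

    # A victim counts as a reuser if any one password appears on 2+ domains.
    reusers = {
        victim
        for (victim, _password), domains in domains_by_victim_password.items()
        if len(domains) > 1
    }
    return len(reusers) / len(victims) if victims else 0.0


# Made-up sample: four victims, one of whom reuses a password.
sample = [
    ("host-1", "mail.example", "hunter2"),
    ("host-1", "bank.example", "hunter2"),   # reuse across two domains
    ("host-2", "mail.example", "abc123"),
    ("host-2", "bank.example", "Xq9!rT"),    # different passwords, no reuse
    ("host-3", "shop.example", "letmein"),
    ("host-4", "mail.example", "hunter2"),   # same string, different victim
]

fraction = reuse_fraction(sample)
print(f"{fraction:.0%} of victims reuse a password")
```

Keying on the (victim, password) pair rather than the password alone keeps two different people who happen to pick the same password from being counted as reuse.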
You know, if we weren't capturing them this way, you know, if you used the usual password guessing, how strong would they be? Okay, so, we used John the Ripper, and we had to take, you know, some of these passwords and put them in a UNIX form in order to do that. Okay, but we used John the Ripper to assess the strength of the passwords, and it turned out, when we did that, we had 173,000 unique passwords after getting rid of all the duplicates. In running John the Ripper in the default mode, you know, no dictionary or anything. >> [INDISTINCT]. >> KEMMERER: Can you hear? Did I turn it off? >> [INDISTINCT]. >> KEMMERER: Can you hear me? Hello? Should I sign? Oh, it blinks because it's bad? You see, you can never tell, if it blinks, is this good or--no, no. Am I supposed to be done by one? I thought we started seven minutes late, we go off at seven. Okay. So, basically, we used John the Ripper on a unique password file, 173,000 of them, and we found out that--and this was--we ran it first just in the default mode. We used no special dictionary or anything like that, and we got 56,000 of them in just over an hour, in 65 minutes. We then used a large wordlist and we got 14,000 in the next 10 minutes. So, we're talking about, you know, an hour and a quarter once we got that 14,000. So, 40% were cracked in less than 75 minutes. We then ran it for 24 hours and got another 30,000, and if you go to the slide, which again, you can't see, this just gives you that information. We got 40% of them, and then the next bump there, and then the 24 hours there. Okay, let's go to the next slide. Okay. So, what about--let me say a little bit about criminal retribution, a little bit about law enforcement. These are kind of--maybe they should have been "lessons learned" instead of "what about"--and repatriating the data, and ethics. So, if we go to criminal retribution. So, the big concern on January 25th, the first day--it was midnight when we took over this thing.
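The John the Ripper numbers quoted a moment ago add up as follows. This is just a running tally of the three stages as stated in the talk; the stage labels and the function name are illustrative.

```python
TOTAL_UNIQUE = 173_000  # unique passwords after removing duplicates

# (stage description, passwords cracked in that stage), as quoted in the talk.
STAGES = [
    ("default mode, first 65 minutes", 56_000),
    ("large wordlist, next 10 minutes", 14_000),
    ("continued run, next 24 hours", 30_000),
]


def cumulative(stages, total):
    """Return rows of (label, cumulative cracked, cumulative fraction of total)."""
    rows = []
    running = 0
    for label, cracked in stages:
        running += cracked
        rows.append((label, running, running / total))
    return rows


rows = cumulative(STAGES, TOTAL_UNIQUE)
for label, running, fraction in rows:
    print(f"{label:35s} {running:>8,d}  {fraction:6.1%}")
```

The second row is the 40% figure: 70,000 of 173,000 within 75 minutes. After the further 24 hours the tally reaches 100,000, a little under 58% of the unique passwords.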
I wasn't at the lab, but I had a grad student reporting, regularly sending out emails. He's, "Now we have this many accounts, we have this much data. Wah," you know, it's like the sky is falling, okay. So, Sunday afternoon, we were in the lab talking about what to do, and my biggest concern at that time was the criminals, because these guys were known to be bad guys: were they going to come to get us, our kneecaps, as I said? Can you do the next one? More realistically, I mean, I did worry about that, and in fact I still worry about this a little bit. That's why when you say, "Can we publicly post this?"--you know, okay, I've given a talk on this a couple of times already and they probably know who I am. More realistically, we were concerned that they were going to DDoS us because, if you remember, we only owned the first two domains; if they DDoS us, the hosts roll over to dot-biz and then they can download their stuff, okay? Next. So, the biggest question I have, and I still don't have a definite answer to this, is, "Why did it take them 10 days to download a new domain generation algorithm?" Okay, because that's basically what they did. You know, some of our thoughts--one of my students said, "Well, maybe the guy in charge of it was on vacation," and I said, "Okay, that could be possible." I think maybe a more likely one was they wanted to find out who it was that took it from them. You know, was it another criminal group, you know, the competition? Do you remember the wars between the Chinese and the--I don't remember--the Russians, I think, a few years ago, and their hackers? And so--why did it take 10 days? We know why after 10 days it was taken down, and I will show that on the next slide. So, law enforcement. The other thing, you know, I don't know if I was more worried about criminals or law enforcement, but with law enforcement, I said, you know, we didn't get any permission to do this.
You know, we're cowboys from UCSB, where we're called Gauchos, right, which is an Argentinean cowboy or something. And we didn't know who to notify. More importantly, we didn't just want to notify somebody who would say, "Oh, shut it down," because if we shut it down then wd.com won't work, wd.net won't work, and then the Russ--the, I almost said their name. The criminals will, you know, own it again, okay? And so we wanted to get somebody in law enforcement who knew what they were doing. Well, I don't know, how many people in this room would know who to contact? One. Are you the security officer here? >> [INDISTINCT]. >> KEMMERER: Okay. >> [INDISTINCT]. >> KEMMERER: Okay. Once you've done that, like now, I could give you all kinds of names. But, you know, on Sunday afternoon, we're there trying to figure out who to contact and I, you know, I was drawing a blank. If you go here, so then I said, "Okay, US-CERT, okay, you know, whether you think US-CERT really does a good job or not, I figured that that might be a reasonable one to contact." So, I went to the US-CERT site and I found out, oh great, it had a pointer to, you know, if you've got a problem, you know, if you've discovered some security something or other, go here, and then they gave me a form to fill out, okay? "And we'll get back to you," and I go, "Yeah, that's exactly what I want." You know, very disconcerting. Finally, I thought about a guy named David Dagon. I don't know if anyone knows him. He's from Georgia Tech and they have been doing botnet research, and I knew that he had dealt with the FBI, the Treasury, whoever you want to get a hold of, before, and so I asked, "Does anybody have his home phone number?" Fortunately, someone had his cell number, and I called him at home Sunday afternoon. I said, "Hey, David, you won't guess what we've been doing, you know," and basically, he put us personally in touch with an FBI contact, and he not only gave us the information.
He sent an introductory letter--an introductory email--about us and so forth, and then I felt better because I thought, "Okay, now I've reached out to law enforcement. I've got a paper trail that says I'm trying to contact somebody." Well, that was late Sunday afternoon. Monday morning, we still hadn't heard from them, and so I sent a message directly to the FBI guy, an email, and said, "You know, we got the message from David and we'd really like to hear from you, you know, do you have any guidance about what we should do?" On Wednesday, we still hadn't heard back, and so, Wednesday, I knew I had a friend at Citrix Online in town who I knew had dealt with the Treasury for something like this before, and so I called him and said, "Do you have the name of somebody?" And he said, "Sorry, the guy I dealt with has his own consulting firm now," which tends to be the case with a lot of these guys, as you can well imagine, and so he checked with the privacy officer, who wound up putting me in contact with this DOD Defense Criminal Investigative Service guy, who I contacted and he immediately got back to me, which was good because now I felt even better. And then, Friday afternoon, we're talking six days later--Friday afternoon, I get a message from the FBI guy that said, "Oh, I got your email, I'll get back to you." I'm going, "Are you kidding me?" And I have to say, when I talked to David Dagon, he said, "Don't be surprised if they are not as excited as you are about this because, you know, there are a lot of botnets out there, you know." I expected a little more excitement than that. Well, 15 minutes later, we got another message from that guy and it says, "I just read the content of your email, this is great, we've been wanting to do the same thing but we can't get permission, okay?" You know, and, "When can we talk, you know, can we have a conference call?" And it turns out they were really good, and they knew about Torpig.
You know, they had been studying it and so forth. They put us in contact with a group, which is a group of different security officers around the world who have been very helpful for us to repatriate the data, you know, get it back to the appropriate banks and so forth. So, when we finally got a hold of them, it was good, and now, if I have to get a hold of law enforcement, it's really easy, but it was difficult because, you know, there's not an easy URL to go to, okay. Okay, so I said already about the FBI. They turned out to be very good there. The next one please. Okay, so, as I said before, we had over 8,000 separate accounts at 410 institutions, 1,600 credit cards, and to get this, I mean, we had to mine the data. We had to figure out what their formats were and so forth to figure out all of these. And then once you know it, like, you know, if I know I've got--how many PayPal accounts, I said less than 2,000 but close to 2,000--what will I do? Just pick up the phone and call PayPal? PayPal might be okay, but Bank of America, how would I do it? You have a Bank of America branch here. You have the phone number. I'll call them: "Hi, I'm *** Kemmerer, I've got a whole bunch of compromised accounts for your bank." You know, and, you know, they'd think I'm crazy or whatever. So, it turns out that, you know, that's really not that easy to do, and in particular when you then talk to a bank, if you say, "We may have some of your credit cards too, can you give us your bank identification numbers," which are unique for every bank, and that's how you can search on these things--you know, which is the first six numbers of the credit card--and if you say, "Can you send me your BINs?" Well, they don't want to do that. You know, they're very tight with all of that.
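Searching on bank identification numbers, as described above, amounts to grouping card numbers by their leading digits and then looking up the prefixes a bank supplies. The sketch below assumes a six-digit prefix, which was the usual issuer prefix length at the time; the function names are illustrative, and the sample values are standard payment-processor test numbers plus one made-up number, not real cards or real bank prefixes.

```python
from collections import defaultdict

BIN_LENGTH = 6  # assumption: six-digit issuer prefix


def group_by_bin(card_numbers, bin_length=BIN_LENGTH):
    """Group card numbers by their leading digits, ignoring spaces and dashes."""
    groups = defaultdict(list)
    for number in card_numbers:
        digits = "".join(ch for ch in number if ch.isdigit())
        if len(digits) < bin_length:
            continue  # too short to carry a prefix; skip rather than guess
        groups[digits[:bin_length]].append(digits)
    return dict(groups)


def cards_for_bank(groups, bank_bins):
    """Collect every card whose prefix is in the list a bank has supplied."""
    return [card for prefix in bank_bins for card in groups.get(prefix, [])]


# Well-known test numbers and one made-up value; none is a real account.
captured = [
    "4111 1111 1111 1111",
    "4111-1111-1111-1234",
    "5555 5555 5555 4444",
    "3782 822463 10005",
]

groups = group_by_bin(captured)
matches = cards_for_bank(groups, ["411111"])
print(sorted(groups), len(matches))
```

This also shows why the banks' reluctance matters: without the list of prefixes from the institution, there is nothing to look up in `groups`.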
So, one of the nice things that happened is these FBI guys are actually in Pittsburgh, and there's this--this is the place that I was telling you about, the National Cyber-Forensics and Training Alliance, which is a group of a whole bunch of security officers from financial institutions, and the FBI guy put me in contact with them, and basically I'd go to them and they'd say, "Okay, here's who you contact for this bank, here's who you contact for that bank." And it was really nice. In some cases, we got the names of individuals who could go and take care of everything for the whole country. You know, like, all of Italy or all of Switzerland and so forth. They had one guy that was repatriating. Okay, let's go to the next slide. Okay, so, if you recall, principle one said, "The sinkholed botnet should be operated so that any harm and/or damage to victims and targets of attacks would be minimized," and the collected sensitive data potentially could threaten the privacy of the victims and, you know, completely threaten them. One question, which I mentioned before: "Should emails be viewed at all?" I mean, there are people that feel very strongly you should not look at that at all. Even in the data mining way which we are using for those things. And then the next one. A lot of my academic colleagues, after they saw our paper, asked this question--did you have--I wasn't familiar with IRB--IRB, Institutional Review Board. They are the ones that--for us in academia, whenever we put in a grant proposal, there is this thing you check and it says, "Are you dealing with any human subjects or anything like that?" We go, "Nah, we're not doing that, no, right?" And then of course on this stuff, we put "No"--yeah, I would say, because if you say yes then they get out this whole stack of paper that you have to fill out. Well, it turns out, we're not working with human subjects and we didn't plan on getting these kinds of data, so we said "No."
But there's this other little catch in there: any data that can be used to identify an individual needs IRB approval. Well, I now have IRB approval. You know, unfortunately, I got it after the fact on this thing, but it's something, you know, in particular for people that are working on this kind of thing, that is a legitimate one. You have to do it. I won't ever put anything in again where there's a chance I'll be getting this kind of data without getting IRB approval first, okay? And I still don't know about the emails and stuff like that. I mean, we're kind of backing off from that as much as possible. So, in conclusion, you know, other people have gotten data on botnets. There's one on Conficker, where a former student of mine, who is now up at SRI International, was doing it; with Conficker, they collected a lot of data where they posed as a command and control. But they didn't know the protocol to respond to it. So, these hosts connected once, and so, if you do these passive sort of things, you get a partial view of it. We were fortunate. We got everybody on Torpig. You know, during that 10-day period, they connected to us. Oh, I thought of something I forgot, can you go back a slide? Another slide. Okay, it's going to take too long. I wanted to say something about the FBI. One more. Okay, the FBI was very good to us. Okay. I'm going to tell you why it was 10 days, okay? On Monday of the second week, we were having a conference call with the FBI guy and he said, "Is there anything we can do for you?"
You know, because, you know, we're doing the usual thing, anything you want, and we said--like, Giovanni Vigna, who is my colleague, said, "But we only own the domains for three weeks out, you know, and a week and a half is almost gone, and the criminals own them for the three weeks after that. If you could, you know, if we had a way of taking down their hosts, you know, or their domain names, then, you know, maybe we could, you know, get in there, because we were already doing this: after they signed up for dot-com and dot-net, we signed up for dot-biz for those three weeks, so it would have rolled over to us. So, any way you can knock them out." And so he said, "We can probably do that," and we said, "But they're registered, you know, with these--people that are well known for not responding," and he said, "Everything comes back to the U.S." You know, very confidently he said that. And so, Tuesday, at about 11:30, we got an email from him and he said their domains are gone. Okay, and sure enough, we checked. And their domains were gone, and 45 minutes after that, they downloaded a new Mebroot binary--a new Torpig binary--on all of the machines through Mebroot. So, the FBI also had kind of a bad side effect for us, you know, on that, and so, in trying to answer the question of, "Why did they wait 10 days?" I think that sort of answers it: you know, they thought, "Hey, this is our competition and we want to know who they are and see what they do and so, you know, we can decide how to do it." They knew, when theirs was taken down, that that was probably done by law enforcement, and decided, you know, "We're done with our learning curve. We're going to take it back," you know, and so that was 10 days. Okay. So, conclusions: we had, you know, we had lots of stuff to look at and we still have--I mean, we obviously haven't mined all of those data yet.
The thing about distinct IPs: you know, they probably overestimate by an order of magnitude. Botnet victims are users with poorly maintained machines, because they have the vulnerabilities available. They choose easily guessable passwords to protect sensitive data, and then this last one was just, you know, what I told you about, that interacting with registrars, hosting facilities, victim institutions isn't all that easy. And, you know, obviously, I didn't do this alone. Chris Kruegel and Giovanni Vigna are two faculty members who were the security lab advisers, and these were the students who worked on it. Brett was the guy that noticed that it was not being used, you know, three weeks out or whatever, and then the rest have participated in their ways. And questions? That's the place I have to live. It's really awful. I hate it. And, you know, I wish we had traffic like I had coming over [INDISTINCT] this morning. Okay. Yeah, questions? >> Yeah, [INDISTINCT]. >> KEMMERER: They went to a new encryption algorithm. Before, the Torpig command and control had been broken. So, one of the things that's happened--okay, I shouldn't say what I was going to say. Anyway, as you may believe, there's a group of researchers and some folks working particularly with financial institutions and so forth that are interested in what's going on here and in sharing data and so forth. It turns out that with Torpig, since we did this, they've changed their encryption algorithm several times, and it's been broken every time. Yeah, fairly quickly. Okay? Mebroot still hasn't been. Now, one might ask--well, why hasn't Mebroot been broken yet, you know, when it's the same people? Well, this again may be part of the underground cyber economy. It may be the Mebroot guys are selling their services to the Torpig folks, okay? And Mebroot is, you know, not ready to sell the, you know, encryption that they use. >> [INDISTINCT].
>> KEMMERER: You mean just to make them feel good? >> [INDISTINCT]. >> KEMMERER: It could be. It could be. Yeah. Yeah. Any other questions? >> [INDISTINCT]. >> KEMMERER: It's because those weren't new ones. They were new ones hitting us in the 10-day period. >> [INDISTINCT]. >> KEMMERER: I think that drop, if you look at the data, was February 4th, which was when we lost it. So, it just went down like that. It wasn't... >> [INDISTINCT]. >> KEMMERER: Huge, like in the beginning--it's because they weren't truly new. They were new to us, okay? You know, that was just like--so, if you've got people that were infected two weeks ago or a month ago or whatever who are still connecting to command and control, every 20--that's us. So, that was those. And then the new ones were going across, but when you saw the drop, like--like even the new financial institutions. I don't know if you guys got the pointer to the paper on this. Did you send that around or? Anyway, I knew (Oris) had it because he told me he was reading it, but there's a paper and I can give you the pointer to it too if you want. And if you go to our site, you know, UCSB, the sec lab, it will be there. But those slides are there, and even on the new accounts, which was fairly flat right from the beginning, you then see the drop, because that was February 4th, okay, when we lost the botnet, yeah, yeah. And so, it dropped off just like that. I was interested to do it because, as I said, we're actually in the first year of a four-year NSF grant to look at this stuff, and so, for me, it's an advantage to know the bank people and so forth, and so I used that as a quid pro quo to get to know those folks, but it gets old after a while. You know, and I think my grad students are tired of, like, oh, here's a new set of BIN numbers. Can you look them up?
Here's the banks of interest, you know, what domains you want, and you'd think, like, somebody like the Bank of Scotland--you know, they said, "Do you have anything?" And we said, "Oh, yeah, we have some, because for some it's obvious, it's UBS or something like that, United Bank in Scotland," and so we shipped that one, and then they said, "Oh, here are some more domain names that we have for Scotland. These are the things that you're looking for." It piled up, and, you know, it doesn't take a lot of time, but everything, I mean, it does take time. It hasn't all been repatriated, but I gave all of it to the FBI and said, "You know, you guys deal with it if you want more." Yes? >> [INDISTINCT]. >> KEMMERER: Yes. >> [INDISTINCT]. >> KEMMERER: Yes, we did some. But I don't have anything specific to tell you about it. What I remember is there were two days that had a big spike, and our conjecture was that there was some new, very popular--one or more very popular websites that had been recently compromised. Yeah, yeah. Well, it turns out that in some other research that we did--it's not in this paper--we also, you know, when you go to the legitimate site, it downloads an HTML tag that goes and runs some JavaScript that connects you to the real drive-by download site. So, it turns out that that site changes all the time, somewhat like Torpig, and we figured out that algorithm, and we actually owned that site for a while. So, we knew which sites were infected. You know, initially, which legitimate sites, which is nice information that we've tried to get back to these folks also. But that's a game that was very hard to play; it was, like, every day, daily. Anything else? Okay. Thank you.