Hunting Attackers with Network Audit Trails

My name is Tom Cross, and I'm the Director of Security Research here at Lancope. What I'm going to talk about today is the utility that audit trails of network activity can have in trying to track attack activity that's happened within a network and clean that activity up. I want to start by talking about the subject of forensics as it applies to computer security. I recently went down to the SANS Digital Forensics and Incident Response Summit in Austin, Texas, with my colleague, Charles Herring, and some of the material you're going to see in this presentation comes from a talk that we gave at that summit.
One of the things that I encounter often when I talk to computer security people is that some of them have a very narrow definition of what forensics means in the context of computer security, and I think that's a consequence of the nature of the kinds of attack activity that people have been experiencing over the past decade. I think that the kinds of attack activity that people are experiencing today have changed, and our understanding of forensics and incident response needs to change with them. Many people see forensics narrowly as the analysis of hard drive contents, particularly in a context where there's a desire to collect evidence for a criminal prosecution. The idea is to have a chain of evidence with respect to the data that's on the disk, to be able to search the disk for data that may have been deleted and reconstruct that data, and to search that data for evidence of a crime. That practice of forensics is obviously important, but it is usually applied in cases where the owner of the computer is the person you suspect of committing the crime. Usually when we talk about computer security, we think of attacks that are launched against computer networks by outside attackers or external adversaries. The problem is that often it's not possible to prosecute those people.
It's very difficult to get access to them or to identify who they are. When people are dealing with computer security as a problem in general, they tend not to focus on evidence collection. They tend not to focus on prosecution because it's not something they can easily avail themselves of under the circumstances. Instead they tend to focus on protecting their network from attacks, and they tend to focus on cleaning up attacks that have happened.
When a computer network is breached, people have a sort of binary view of that. In the past, people have been very focused on protecting the perimeter from breaches, and when breaches occur, it's game over. You failed to stop the attack from happening, computers have been compromised, and the only thing to do at this point is to clean them up. This point of view comes from dealing with attacks that are broadly targeted and financially motivated, where the attacker is not really interested in your organization specifically; they're interested in breaking into as many organizations as they can. Once these pieces of malware got on your machine, they were just there to collect credit card numbers or other financially liquidable data, so they weren't necessarily there to search around within your network. There was not a lot of analysis that needed to take place when an incident like this was discovered. You just needed to clean the malware off the computer and get back up and running again. I think that is where things are really starting to change. We're seeing more sophisticated, targeted attacks that are hitting a variety of different kinds of organizations, and those organizations are also more aware of that kind of attack activity than they probably were, say, five years ago.
When you have a sophisticated, targeted attacker who has compromised your network and taken over computers in your network, you may need to ask some questions about what was happening there beyond simply cleaning the malware up and getting the infected computers back online. You need to understand how that attacker was able to infect your environment. You need to understand which of your assets were compromised, and make sure that you have a comprehensive understanding of that attacker's behavior before you can feel confident that you've really removed them from the network. That process of analyzing the attacker's activity in your environment and trying to create a complete picture of everything that they're attempting to do is also something that I would call forensics, or incident response. As we have dealt with more and more sophisticated attackers, that process has become more and more important. It's important, first of all, because these attackers often have multiple infection points in our network. They're using multiple different kinds of malware with different command-and-control protocols to control the environment. If you find one and you clean it up, you can rest assured that there are others that you haven't found. Without doing that in-depth analysis, it's really difficult to piece together a comprehensive picture of the compromise and to feel confident that you've rooted it out of your environment. The second thing is that what we've learned through handling these attacks over time is that, when you analyze them and you understand how they occurred, there are pieces of information that fall out of that analysis that might help you detect future attacks by the same adversary. We talk about advanced persistent threats. Persistent attackers are not going to be deterred because you found some of their malware and cleaned it up. They're going to continue to target you.
The ability to learn from their techniques and to apply what you have learned to look for continued attacks by them is a critical part of how you actually protect your network against future attacks.
As a part of that analysis, I think it's important to consider the timeline of an intrusion. You may have a user in your environment who goes out to the internet and accesses something malicious, and their computer gets infected. That's hopefully something that you would discover. You might have multiple means of doing that. You might have an IDS that fired. You could have a gateway advanced malware analysis system that's taking documents that come in, analyzing them, and telling you if they appear to be malicious. You may discover that this has happened, but likely you'll discover it after it's over and that computer has become infected. Once you discover that the infection exists, the question is: how much time does it take you to actually reach that computer, deactivate it, and remove it from your network? Often people chuckle when I put this timeline chart up in front of them, because I'm showing the incident response team disabling the infected machine in seven minutes. Most people don't have a responsiveness that is anywhere near that. It can take days to get access to an infected machine and disconnect it, and to work with the business owners that are associated with whatever that computer is doing in the environment and get them to understand that you've got to go in and shut it down, and that the business process is going to be halted while that analysis is taking place. A seven-minute response time is a really efficient, well-oiled machine with respect to responding to breaches. The question that matters is what happened in your environment
between the time that that malware infected that host and when you were able to deactivate it. The fact is that, if the attacker had minutes or hours or days to operate on that machine during that window of time, then they may have pivoted from that initial infection point to other points in your environment. Really coming up with a full understanding of the incident involves analyzing those factors and figuring out exactly what happened during that window of time. In order to do that, you need access to data, and there's a wide variety of data sources that could potentially be valuable to you. This is really where I'm talking about the concept of incident response and forensics expanding. It's not just about analyzing a hard drive anymore and figuring out what's on the hard drive, because the reality is that your attacker is not leaving a record on that hard drive for you of everything that they did in your environment. You've got to look at other data sources in order to get that complete picture.
I think there are three data sources that are of critical importance. One of them is logs. Endpoints have log information, network security devices have log information, servers have log information; and of course, a well-run environment has all of that log information going to a central database where it is stored and where it is searchable. That's an incredibly valuable resource. However, once a computer is compromised you can't trust the logs coming off it anymore. The first thing an attacker is going to do when they control a computer is to get control of the logging process so that their activities are no longer logged. Furthermore, logs have a tendency to focus on events that devices consider to be interesting. For example, an IPS is only going to log attacks that it detected.
If your attacker is hitting you with zero-day vulnerabilities that the IPS doesn't know how to detect, obviously those things are not going to be logged, so a log can miss some critical pieces of information that are part of the complete picture of what happened in your network.
Obviously the ideal way to get a complete picture of what happened on your network would be to have packet capture happening everywhere within your environment, and then to be able to store those packet captures forever; but the fact is that that's not realistic. Packet captures are very powerful if you have access to them, but they're also very expensive to store. The fact is that you're likely to only store a few days or maybe a couple of weeks of packet capture if you're really heavily invested in it. The other thing is that you're not likely to be doing it pervasively throughout your network. You're likely to be doing it at an access point that exists between your network and the outside world, and maybe you've got a little bit of packet capture scattered around within your internal environment, but it's very difficult to capture every single packet that happens everywhere.
This is where NetFlow comes in. I think NetFlow is a very powerful tool for collecting an audit trail of what's happened in your environment, and it's a good complement to packet capture in certain ways because it can see things that packet captures aren't going to see. The first thing is that NetFlow is compressed. It's just header information regarding the transactions that happened, so it's easy to store much more NetFlow, for a much longer period of time, than packet capture for the same investment in disk space. It really boils down to how much history you want to store. With NetFlow you can potentially store months, whereas, with the same amount of disk space, you might have only gotten days of packet capture.
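To make the storage argument concrete, here is a rough back-of-the-envelope sketch in Python. The disk size, link rate, and average flow size are illustrative assumptions, not figures from the talk; the 48-byte figure is the size of a single NetFlow v5 flow record, and a real collector will add its own indexing and metadata overhead on top of that, so treat the result as an upper bound on the advantage.

```python
# Rough retention estimate: full packet capture vs. flow records on the same disk.
# All traffic figures are illustrative assumptions, not measurements.

SECONDS_PER_DAY = 86_400

DISK_BYTES = 10 * 10**12      # assumed: 10 TB of storage
AVG_LINK_BPS = 1 * 10**9      # assumed: 1 Gbit/s of sustained traffic
AVG_FLOW_BYTES = 50_000       # assumed: average payload carried by one flow
FLOW_RECORD_BYTES = 48        # one NetFlow v5 flow record (collector overhead ignored)


def traffic_bytes_per_day(link_bps: float) -> float:
    """Bytes crossing the monitored link in one day."""
    return link_bps / 8 * SECONDS_PER_DAY


def pcap_retention_days(disk_bytes: float, link_bps: float) -> float:
    """Days of full packet capture that fit on the disk."""
    return disk_bytes / traffic_bytes_per_day(link_bps)


def flow_retention_days(disk_bytes: float, link_bps: float,
                        avg_flow_bytes: float, record_bytes: float) -> float:
    """Days of flow records that fit on the same disk."""
    flows_per_day = traffic_bytes_per_day(link_bps) / avg_flow_bytes
    return disk_bytes / (flows_per_day * record_bytes)


pcap_days = pcap_retention_days(DISK_BYTES, AVG_LINK_BPS)
flow_days = flow_retention_days(DISK_BYTES, AVG_LINK_BPS,
                                AVG_FLOW_BYTES, FLOW_RECORD_BYTES)

print(f"packet capture: {pcap_days:.1f} days")
print(f"flow records:   {flow_days:.1f} days")
print(f"advantage:      {flow_days / pcap_days:.0f}x")
```

Under these assumptions the advantage reduces to the ratio of average flow size to record size, independent of link speed, which is why the gap grows as flows get larger.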
The other thing is that it's really easy to get NetFlow pervasively from your environment. You can get NetFlow from down in the access and distribution layers of your switching fabric. If you look at this network map, if the computer at the very bottom of this network map, in the bottom right corner of the chart, gets infected, a system up at the firewall level might be able to identify transactions that came from that computer and went out to the internet; but it isn't going to record transactions that happened between those endpoint nodes down there at the access level, and those transactions may be critically important when you're putting together what happened during a security incident to understand how the attacker pivoted from his initial point of infection to other machines within your environment. For those two reasons, I think NetFlow is a critical ingredient in the recipe of how you defend your networks against attacks. It has a lot of unique value alongside syslog and packet capture.
Let me talk about certain things you can do with NetFlow. Obviously, once you're collecting NetFlow, you can do real-time detection. That's not really what this talk is about; this talk is about forensic audit trails. But understanding what you can do in real time helps you appreciate what you can do with the history. If you're getting all the network transactions and you're doing real-time monitoring, you can get a picture of what's happening, and you can attempt to detect activity that is suspicious. For example, you can detect data exfiltration. If large amounts of bytes are moving out of your network onto the internet, or moving around within your network, that's something that's going to be visible via NetFlow because you get an idea of what transactions are taking place and how many bytes are moving.
NetFlow can detect specific activities that are suspicious, such as reconnaissance. It's really important to be able to detect reconnaissance within your network because, once an attacker has compromised one machine, he's going to scan around inside that network to find the data he's looking for and other points that he can compromise, and that reconnaissance activity is valuable to detect in real time. Another thing you can do is look for botnets. If you have threat intelligence, if you know the IP address of a command-and-control server or of a drive-by download site, you can monitor for that in your environment in real time by looking at the network transactions that are happening and seeing if they involve the same IP address or URL. You can do real-time threat intelligence monitoring with real-time NetFlow monitoring.
This, I think, is a good gateway into the question of what you can do with the months of history that you've stored, and I think that the first key is to understand that you usually don't get threat intelligence when it's fresh. When someone tells you about the IP address of a malicious attacker, by the time that information gets to you that attacker has already been operating for some period of time. If you take that IP address and you put it in your system, and you start monitoring for attacks from it, the fact is that you may have already been targeted before that happened; so it's really valuable to be able to take threat intelligence data and do a historical analysis in order to see if you have been targeted by that adversary in the past. You may remember that about six months ago there were a number of organizations, primarily Mandiant, who released information about a threat actor called APT1, which was an actor that engaged in sophisticated, targeted attacks across a large number of organizations for many years.
Mandiant released a number of domain names, MD5 hashes of malware, SSL certificate IDs, and other things related to this attacker. Other organizations released some IP addresses, and in fact, Lancope released some unique IP addresses and other indicators associated with this adversary based on some of our analysis of their activity. What do you do with all this threat intelligence? The fact is that the minute all this stuff came out on the internet, this actor stopped using all of the systems associated with these indicators. All of this threat intelligence was burned. The IP addresses that were being used for command and control were deactivated. The domain names were abandoned. The malware was abandoned. What value is all this abandoned threat intelligence? If you've been storing history of your network transactions for several years, it's possible for you to go back and check that data to see if you ever, in the past, interacted with those hosts. Even though the attacker may have abandoned that particular host at this time, there are probably other ways that they are engaged in command and control in your environment; so if you find that you were communicating with this host in the past, that can be a starting point for an investigation that can lead you to discover what's happening in your network today. In fact, we had several customers who, based on the information that was disclosed about APT1, were able to discover activity that had happened in their network in the past, which they were previously unaware of, by looking at their NetFlow collections. This turned out to be a valuable tool in some cases. One of the things that you can do with this data in StealthWatch is build charts like the one you see here, which shows a long period of time, in this case a month's worth of data, and graphs when your network saw activity to this suspicious IP address.
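The retrospective check described above can be sketched in a few lines of Python. This is a minimal illustration, not how StealthWatch does it: the `Flow` record layout, the `sweep` function, and every address (taken from the reserved documentation ranges) are hypothetical. It walks stored flow records, matches either endpoint against a set of published indicator IPs, and groups the hits by day, which is the same shape of data the month-long chart plots.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical stored flow record; real collectors keep many more fields."""
    start: datetime
    src: str
    dst: str
    dst_port: int
    total_bytes: int


def sweep(flows, indicators):
    """Return {indicator_ip: {date: set(peer hosts)}} for every historical match."""
    hits = defaultdict(lambda: defaultdict(set))
    for f in flows:
        # Check both directions: we may have contacted the indicator or vice versa.
        for local, remote in ((f.src, f.dst), (f.dst, f.src)):
            if remote in indicators:
                hits[remote][f.start.date()].add(local)
    return hits


# Illustrative history and indicator list (all values are made up).
history = [
    Flow(datetime(2013, 1, 4, 9, 15), "10.1.1.20", "203.0.113.50", 443, 18_000),
    Flow(datetime(2013, 1, 4, 9, 40), "10.1.1.21", "198.51.100.7", 80, 2_500),
    Flow(datetime(2013, 1, 9, 2, 5), "10.1.1.33", "203.0.113.50", 443, 950_000),
    Flow(datetime(2013, 1, 9, 2, 30), "203.0.113.50", "10.1.1.20", 22, 4_000),
]
burned_indicators = {"203.0.113.50"}

for ip, by_day in sweep(history, burned_indicators).items():
    for day in sorted(by_day):
        print(ip, day, sorted(by_day[day]))
```

Each internal host that shows up in the result is a starting point for deeper investigation, even if the indicator itself is no longer in use.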
You can quickly see patterns of interaction between your network and this suspicious IP over a long period of time, and then it's possible to drill into each of those interactions to see the specific activity that's happening. Once you have discovered that you did interact with one of these systems in the past, the next stage is to engage in a deeper investigation of what kind of activity occurred as a consequence of that infection. Of course, if you've got all the network history there, particularly from your access layer, you've got a great resource to do that kind of analysis.
This is a scenario that we built here in our lab related to following an indicator of compromise. In this scenario, we've received a couple of IP addresses for a website that was engaged in a watering-hole campaign, which means that the website was taken over by the attackers, and the attackers had placed an exploit there. The website was selected because it's a site that people in your organization visit. We take a look at the IP addresses for that website in our StealthWatch NetFlow records, and we see that we did have a computer access that site, which is not necessarily surprising because, in a watering-hole attack scenario, people in our environment are likely to be accessing that site. We dig in and look at the details of the different network transactions that that host engaged in around the time that it accessed that website, and when we look at these details we discover that there are a few suspicious HTTP connections that occurred right after contact with the infected site. This tells us that this might be a drive-by download attack, because typically in a drive-by download scenario, the site that is the initial point of infection redirects the user's browser to other systems where the actual exploit payload is delivered. You can often pick that right out of your NetFlow logs.
Then, of course, later we see a reverse SSH shell: an SSH connection has come out of our network to someplace on the internet, but most of the bytes were sent by the client. If you think about SSH, you type ls or dir and press enter, and you're going to get a bunch of data. The user in SSH sends less data than the server that they're accessing, so if you see an SSH connection coming out of your environment but you are sending more data than you're receiving, then that looks a lot like a command-and-control channel where somebody on the internet is remotely controlling a computer in your network, and it jumps right out at you when you look at NetFlow records. Clearly we did have a host that was infected by this watering hole.
The next thing we need to do is look at other behavior that that host has engaged in, and in this case the host appears to be scanning the internal network for computers that are running SMB or MSRPC, so Microsoft systems. It may be the case that the attacker knows a vulnerability in MSRPC and is trying to pivot from this initial infection point to control other computers in our network. By analyzing these records, we see these little transactions here with one or two packets being sent, and that means that's just scanning activity; but if we see a connection get established between this host and one of the victims that it's scanning, we now know that the host may have successfully pivoted to that second location. So now we have another computer in our environment that we need to investigate. In addition, we can take the command-and-control IP that was running this reverse SSH shell and search for it to see if we have other computers in our environment that are reaching out to the same command-and-control system; and in this case we are seeing that activity happen. This is where threat intelligence is really valuable.
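Both heuristics from this scenario, the byte-asymmetric outbound SSH session and the internal scan followed by a fuller connection, can be expressed directly over flow records. The following Python sketch is illustrative only: the `Flow` layout, the thresholds (`ratio`, `MAX_PROBE_PACKETS`, `MIN_SCAN_TARGETS`), and all addresses are assumptions chosen for the example, and real traffic would need tuning to avoid false positives such as legitimate uploads over SSH.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical bidirectional flow summary."""
    client: str
    server: str
    server_port: int
    client_bytes: int   # bytes sent by the side that opened the connection
    server_bytes: int   # bytes sent back by the server
    packets: int


SSH_PORT = 22
SCAN_PORTS = {135, 445}    # MSRPC endpoint mapper and SMB
MAX_PROBE_PACKETS = 2      # assumed: one or two packets looks like a probe
MIN_SCAN_TARGETS = 3       # assumed: this many probed hosts counts as a scan


def reverse_shell_candidates(flows, ratio=2.0):
    """SSH flows where the client sent far more than it received."""
    return [f for f in flows
            if f.server_port == SSH_PORT
            and f.client_bytes > ratio * max(f.server_bytes, 1)]


def scan_and_pivot(flows):
    """Map each scanning host to targets where a fuller connection followed."""
    probed = defaultdict(set)
    connected = defaultdict(set)
    for f in flows:
        if f.server_port not in SCAN_PORTS:
            continue
        if f.packets <= MAX_PROBE_PACKETS:
            probed[f.client].add(f.server)
        else:
            connected[f.client].add(f.server)
    return {host: sorted(connected[host])
            for host, targets in probed.items()
            if len(targets) >= MIN_SCAN_TARGETS}


# Illustrative records (all values are made up).
flows = [
    Flow("10.1.1.30", "203.0.113.50", 22, 5_000_000, 40_000, 4_000),
    Flow("10.1.1.31", "198.51.100.7", 22, 20_000, 900_000, 900),
    Flow("10.1.1.30", "10.1.1.40", 445, 60, 0, 1),
    Flow("10.1.1.30", "10.1.1.41", 445, 120, 60, 2),
    Flow("10.1.1.30", "10.1.1.42", 445, 60, 0, 1),
    Flow("10.1.1.30", "10.1.1.43", 135, 120, 60, 2),
    Flow("10.1.1.30", "10.1.1.41", 445, 80_000, 240_000, 350),
]

print([f.client for f in reverse_shell_candidates(flows)])
print(scan_and_pivot(flows))
```

Any host returned as a pivot target becomes the next machine to investigate, and the server address of a flagged SSH session is the value to search for across the rest of the flow history.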
You can see that we started with one piece of information about a malicious site, and based on that piece of information we were able to analyze the kill chain a little bit and see the whole process of the attacker's activity, and we discovered new IP addresses that the attacker was using; and then based on these IP addresses, we were able to discover additional infections in our environment.
Another scenario where this can come into play is where you see zero-day attack activity and then you see IPS signatures, for example, come out that will detect attacks targeting a vulnerability that was being exploited in the wild before it was publicly disclosed. When you see that happen, your IPS may detect attacks that involve that vulnerability; but the question is: were you targeted by those attackers in the past, before that IPS signature became available? When you get those IP addresses off of those IPS signature fires, you can go back and check your NetFlow and see if you previously communicated with those IPs before that IPS signature was there, and that's a great way to identify successful attacks in your environment that happened in the past, before you had access to the threat intelligence in question.
Bear with me for one second. Here's another scenario that is interesting. This is a SQL injection scenario. What you see here is a console that's been set up in StealthWatch to monitor a number of web servers and the activity happening with those web servers. In this case, you see a great deal of data leaving this web server and going to the internet, and that spike on these charts significantly exceeds the day-to-day traffic that this web server sends out to the internet. That's strange, so we'll investigate it a little bit more deeply.
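The idea of a spike that "significantly exceeds the day-to-day traffic" can be approximated with a simple rolling baseline. The Python sketch below is one possible approach under stated assumptions, not the method StealthWatch uses: it compares each day's outbound byte count for a server against the mean plus `k` standard deviations of the preceding `window` days. The function name, parameters, and the daily totals are all invented for illustration.

```python
from statistics import mean, pstdev


def outbound_spikes(daily_bytes, window=7, k=3.0):
    """Return indexes of days whose outbound volume exceeds the rolling baseline.

    daily_bytes: outbound byte totals for one server, one value per day.
    window:      number of preceding days used as the baseline (assumed value).
    k:           how many standard deviations above the mean counts as a spike.
    """
    spikes = []
    for i in range(window, len(daily_bytes)):
        history = daily_bytes[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if daily_bytes[i] > threshold:
            spikes.append(i)
    return spikes


# Illustrative daily outbound totals in megabytes for one web server (made up).
web_server_mb = [100, 110, 95, 105, 98, 102, 107, 104, 5000, 101]

print(outbound_spikes(web_server_mb))
```

A flagged day is only a prompt to look at the individual transactions, as the walkthrough does next; a backup job or a large legitimate download would trip the same threshold.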
We take a look at the actual transaction in question, and we can see a very large amount of data being downloaded from the server, so we dig in again and take a look at the site, and we see that, from a different source address, there has been a bunch of reconnaissance activity taking place against that host. Again, we're beginning to work back up our kill chain. In the beginning we saw this event that looked like a lot of data being downloaded off the website, and we're able to see now that there was also a bunch of reconnaissance activity, and this is indicative of a SQL injection attack where somebody has exploited a vulnerability in our website to access raw records in our database and dumped the entire database out through the output. We've got a few IP addresses now associated with this actor that we could investigate further to see if they engaged in other activity on our network.
These discussions have centered around external threat actors. I think it's also important to consider insider threat as a subject. Insider threat is something that a lot of organizations don't have a very good practice around, because people have not necessarily understood how to build an effective insider threat practice in the past. I'm constantly talking up this book. Carnegie Mellon runs a group called CERT, which I'm sure most of you are familiar with, and they've been doing research on insider threat for many, many years. There are incredible resources up on their website. If you Google for CERT insider threat you'll find their resources up there. They published a book last year called The CERT Guide to Insider Threats, which is the best guide on insider threats that I've come across. They have some very good recommendations in there about how to build an effective program, based on evidence that they've collected over ten years of studying real insider threat cases.
One of the key pieces of information that CERT tells you is that the insider threat is not strictly an IT problem. Whereas many computer security problems are considered an IT issue exclusively, insider threat is a management issue, it's an HR issue, and it has to do with the relationship that the company has with its employees, including employees that have become disgruntled. Often, when an incident is detected in the context of insider threat, it's not because some computer system identified that something was going wrong. It's because the people who work with the person who had become disgruntled noticed that that person was making threats against the organization, and it seemed like that person was likely to do something wrong; that information was then taken to IT, and IT was able to analyze the monitoring systems and logs that were available to find evidence of whether or not the suspicions were true. Typically, in insider threat cases, if they are successfully identified and prosecuted, it's a consequence of good log collection.
In this scenario, consider that HR may have come to you and said, "We're concerned about this particular individual. He threatened to do something destructive to the organization. We'd like IT to take a look at that," or "This person just quit on bad terms, and we're concerned they may have actually taken some data." Within StealthWatch, if you've tied in user identity information, it's very easy to put someone's username in and get information about what network transactions they've engaged in. Even if they moved around to different IP addresses, such as when they were in the office or logged in over the VPN, or moved between different wireless access points, you can get a complete picture of their behavior. In addition to simply getting the raw network transactions they engaged in, you can also see a picture of the different security events that fired against that host.
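The username lookup works because identity data says which user held which IP address during which window of time. A minimal Python sketch of that join follows. It assumes session records (for example from authentication, DHCP, or VPN logs) are already available; the `Session` and `Flow` layouts, the `flows_for_user` function, the usernames, and the addresses are all hypothetical and are not StealthWatch's data model.

```python
from datetime import datetime
from typing import NamedTuple


class Session(NamedTuple):
    """Hypothetical identity record: who held an IP address, and when."""
    user: str
    ip: str
    start: datetime
    end: datetime


class Flow(NamedTuple):
    """Hypothetical flow record."""
    time: datetime
    src: str
    dst: str
    bytes_sent: int


def flows_for_user(user, sessions, flows):
    """Return the flows sent from any IP the user held at the time, oldest first."""
    windows = [s for s in sessions if s.user == user]
    matched = (f for f in flows
               if any(s.ip == f.src and s.start <= f.time < s.end
                      for s in windows))
    return sorted(matched, key=lambda f: f.time)


def day(hour):
    """Helper: a time on an arbitrary placeholder date."""
    return datetime(2013, 1, 7, hour)


# Illustrative data: the same office IP is reused by another user in the afternoon.
sessions = [
    Session("lucy", "10.1.1.20", day(9), day(12)),    # office
    Session("mallory", "10.1.1.20", day(13), day(17)),
    Session("lucy", "10.9.0.5", day(20), day(22)),    # VPN
]
flows = [
    Flow(day(21), "10.9.0.5", "203.0.113.50", 700_000_000),
    Flow(day(10), "10.1.1.20", "198.51.100.7", 12_000),
    Flow(day(14), "10.1.1.20", "198.51.100.9", 8_000),
]

for f in flows_for_user("lucy", sessions, flows):
    print(f.time, f.src, f.dst, f.bytes_sent)
```

Matching on time as well as address matters: the afternoon flow from the office IP belongs to a different user even though the source address is the same.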
In this case, we have the suspect data loss security event, which detects data being exfiltrated out of the network, and we notice that that event fired while Lucy was using the IP address in question; so maybe Lucy was exfiltrating data. We can see that this transaction has occurred, so we've got some information that Lucy might have moved a lot of data out of the network, and if that's what we suspect Lucy of having done, then this becomes some evidence that we can use to establish that it actually took place.
Let's look at a slightly more complicated example. In the course of looking at Lucy, we also noted that Baron had a suspect data loss event that fired on his IP address, so we're concerned that he might have exfiltrated data as well. We take a look at Baron's exfiltration, and in this case we've got some application details, so it looks like a MySQL database dump, which is interesting; it's significantly large, and it's been sent out to a host in Ukraine, which is troubling. We note that this exfiltration occurred at 9:07 a.m. We continue to investigate this, and first of all we're wondering: where did Baron get this MySQL dump? Baron is not a person who usually works with database systems. We look at the different network transactions associated with this IP address and we notice that there was a significant download to this host from a particular database server on our network. Clearly Baron dumped some data out of one of our critical databases; then he exfiltrated it to a host in Ukraine. Now we're very upset with Baron, and we continue to dig in. We look at the timeframe associated with the database transaction and we can see that it occurred between 8:55 a.m. and 9:07 a.m.; 9:07 a.m.
is when the exfiltration to Ukraine began, so again, this is just more context that this database dump occurred right before the exfiltration of it happened. We dig into Baron a little deeper, and we find out that, in fact, there has been another transaction between Baron's client and this host in Ukraine that's been going on since 4:00 that morning, and this looks an awful lot like a command-and-control channel. So it may be the case that Baron is not actually responsible for stealing this data, that Baron's computer has been infected by an external actor, and that the external actor used Baron's system to steal some data from our database and send it to the internet. It's important that we know that, and that we begin to investigate deeper and try to understand more about this attacker and why they're compromising our network.
I think that it's really important to keep these 5 W's in mind as you perform these investigations of your computer network. You're trying to establish who did this activity. You want to know what they did, and when I say you want to know what they did, you want to have a complete picture of everything they did. You want to know what systems they targeted, how they targeted those systems, and what data they accessed. Every bit of information that you put together about their overall behavior can help you, first of all, ensure that you've completely rooted them out of your network, and secondly, understand how you might be able to detect attack activity from them in the future. You want to know where they went. You want to know what in your network they accessed. You want to know when. You want to build a timeline for the incident so that you can, again, be sure that you have a complete understanding of what happened; and ultimately, you want to understand why, meaning what their objective is. That's very important in terms of being able to improve your posture going forward.
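Building the timeline is mostly a matter of putting every relevant flow on one clock. As a small illustration, the Python sketch below replays the Baron example: the three start times (4:00, 8:55, 9:07) and the 9:07 end of the database transfer come from the walkthrough, while the date, the other end times, the byte counts, the addresses, and the `Flow` layout are placeholders invented for the example. Sorting by start time and then asking whether the upload's destination was already in contact with the host is what turns "Baron stole data" into "Baron's machine was under remote control".

```python
from datetime import datetime
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record with a free-text note for the analyst."""
    start: datetime
    end: datetime
    src: str
    dst: str
    total_bytes: int
    note: str


def at(hour, minute=0):
    """A time of day on a placeholder date; the talk gives only times."""
    return datetime(2013, 1, 7, hour, minute)


HOST = "10.1.1.30"         # hypothetical address of the suspect workstation
DATABASE = "10.2.0.10"     # hypothetical internal database server
EXTERNAL = "203.0.113.50"  # hypothetical external host

flows = [
    Flow(at(9, 7), at(9, 20), HOST, EXTERNAL, 800_000_000, "large upload"),
    Flow(at(8, 55), at(9, 7), DATABASE, HOST, 800_000_000, "database download"),
    Flow(at(4, 0), at(9, 20), HOST, EXTERNAL, 2_000_000, "long-lived connection"),
]


def build_timeline(flows):
    """Order all flows by start time so cause and effect line up."""
    return sorted(flows, key=lambda f: f.start)


def prior_contacts(flows, host, peer, before):
    """Flows from host to peer that began before a given moment."""
    return [f for f in flows
            if f.src == host and f.dst == peer and f.start < before]


timeline = build_timeline(flows)
for f in timeline:
    print(f.start.strftime("%H:%M"), f.end.strftime("%H:%M"),
          f.src, "->", f.dst, f.note)

upload = next(f for f in timeline if f.note == "large upload")
earlier = prior_contacts(flows, HOST, upload.dst, upload.start)
print("contact with upload destination before the upload:", len(earlier))
```

The same ordered list answers the "when" of the 5 W's directly and gives the starting points for the "who" and "why".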
If you know what your adversary is after, you can take steps to protect that asset more effectively and to monitor that asset more closely. Hopefully, I've established that forensics in the context of computer security is not merely about preparing evidence for trial. I mean, we may never be able to prosecute these APT actors that are hitting our networks from remote sites. It's become more about understanding the attacks that we're subject to and using that understanding to better protect our networks. It's not just about analyzing hard drives. It's about getting a complete picture of an incident that has affected us, from a NetFlow standpoint as well as from a packet capture and syslog standpoint. I hope that it's opened your eyes a little bit about what's possible in terms of network audit trails.