DEF CON 21 - Daniel Chechik and Anat Fox Davidi - Utilizing Popular Websites for Malicious Purposes

>> DANIEL CHECHIK: Hello. Hi. Hi, everyone. I'm Daniel Chechik and this is my colleague Anat Davidi; we are with Trustwave SpiderLabs. We are going to show you a short video on RDI. Pay attention.
>> ANAT DAVIDI: You better actually pay attention. I'm going to kind of help you out in case you can't see very well, but I won't tell you the cool parts. So this is a browser. (Applause).
>> ANAT DAVIDI: Yeah, there you go. And this is our website, with a cool cat picture. Fiddler, which is an HTTP proxy, will record all the traffic. And this is Google Translate. And that's Windows Calculator! (Applause). And math. And that's it, really. No, nah, I'm kidding. (Laughter).
>> DANIEL CHECHIK: Okay, that's cool, because Google just exploited our machine. But before we go and talk about what RDI is, we have one quick slide about why RDI. Okay, web security scanners. That definitely looks like a really boring slide, but we are at DEF CON, so let's cut the crap and face the truth. There are 38 security URL scan engines on this list, and they are all based on the same technology. Most of them, actually. It doesn't matter what you call it, URL reputation, it's just a blacklist with a fancy name. Of course, some of them are mixed with some other stuff, like static analysis or even dynamic analysis, but the core of the majority is a blacklist. It may sound surprising that in 2013 ancient technology is still heavily used, but the fact is that it works pretty well. The way that URL reputation is done is pretty simple. It's kind of a scoring system. So new domains or IPs are more suspicious than popular websites. I mean, let's take Google.com, for instance. It's the website that everyone trusts, right? Right?
>> ANAT DAVIDI: Okay. Sure.
>> DANIEL CHECHIK: No one is ever going to blacklist Google. Okay, so we used VirusTotal to check another popular website, yahoo.com. And you see it pretty much agrees, it's a clean website. One engine finds it suspicious, but I don't know why.
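The scoring idea described above can be sketched as follows. This is a minimal illustration, not how any real engine works: the domain lists, weights, threshold, and function names are all assumptions made up for the example.

```javascript
// Hypothetical URL reputation scorer. Every list, weight, and threshold here
// is an illustrative assumption, not taken from any real scanning engine.
const BLACKLIST = new Set(["known-bad.example"]);
const POPULAR = new Set(["google.com", "yahoo.com"]);

// Crude registrable-domain guess: keep the last two labels of the hostname.
function baseDomain(hostname) {
  return hostname.split(".").slice(-2).join(".");
}

// Higher score means more suspicious. Only the outer hostname is considered.
function reputationScore(url, domainAgeDays) {
  const domain = baseDomain(new URL(url).hostname);
  if (BLACKLIST.has(domain)) return 100; // blacklisted: maximum score
  let score = 50;                        // neutral starting point
  if (POPULAR.has(domain)) score -= 40;  // popular sites are trusted
  if (domainAgeDays < 30) score += 30;   // new domains are penalized
  return score;
}

function verdict(url, domainAgeDays) {
  return reputationScore(url, domainAgeDays) >= 60 ? "suspicious" : "clean";
}
```

Under these assumptions, a brand-new domain visited directly is scored as suspicious, while the same address wrapped inside a link on a popular host inherits the popular host's score, because only the outer hostname is ever looked at.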
What is RDI? You are probably wondering what the hell we are talking about, and why we are talking about URL reputation. Well, don't worry, and don't even bother looking it up either, because we came up with this term, and even Wikipedia doesn't know what it means, but in a few minutes, you will know. Okay. So let's assume we have a user, we have some kind of website that provides a service, and we have our website. If the user accesses our website, they will see a legitimate website. If the service accesses our website, it will receive the same legitimate website. Basically, so far, whoever goes to our website is safe and shouldn't suspect anything. But if the user accesses our website using the service, which downloads the content from our website and does whatever it does with it, the code is executed and turns the page into a malicious page.
>> ANAT DAVIDI: So that's sort of the bird's-eye view of RDI. And now we are going to kind of take a look at what we need in order to execute such an attack. So the first thing we are going to need is one simple web page, and really any web page will do. We chose to use a WordPress blog. The next thing we need is a trustworthy web utility. What I mean by trustworthy web utility is that we need some sort of website that people are pretty familiar with, that's sort of trusted, and it needs to have some sort of service that will take content that's provided by the user, manipulate it somehow, and finally, in some way, also return it to users. For example, we're going to use the Yahoo! cache service, and it doesn't really matter that it's Yahoo!. It could be Google, Bing. Anything will do, really. And the last thing we need is a script, JavaScript code that will behave differently within certain contexts, and really this will be the core of the concept of RDI. This will be a script that sort of represents the behavior that Daniel described earlier. It will behave differently when it's under the Yahoo!
cache service, and we will dig into the details of that. And of course, you need funny cat pictures, and this is, by the way, what Daniel looked like when I said, let's put lots of funny cat pictures in there.
>> DANIEL CHECHIK: No, it isn't.
>> ANAT DAVIDI: Yes, it is. All right, so we are going to do a demo of the Yahoo! cache attack, but this time we will go into a bit more detail, not just show you a video. We will do it, hopefully, live. This is Vegas. We are gambling and hoping. This is my first time gambling in Vegas. I'm doing it live and let's hope I win. I will go to a website that we prepared and cached on Yahoo!. This is a regular WordPress blog. We can take a quick peek at the source. You will be able to see that there's nothing particularly suspicious about it. I know there's a lot of stuff in there, so you will have to trust us. We will not go over everything, but it's a completely clean WordPress website. The next thing we will do is go to the cached version of this website, which we have prepared in advance, and see what happens. Now, we'll get to the ins and outs of RDI and the advantages of different services, but the fun part about using a caching service is ‑‑ (Applause)
>> ANAT DAVIDI: Hi.
>> What's this called?
>> AUDIENCE MEMBER: Shot the n00b.
>> Shot the n00b. Why do we do it? First-time speakers. We got one back there. There you go, first‑time attendee. You guys are like ‑‑ I don't even have to say it anymore.
>> You guys know each other?
>> ANAT DAVIDI: Yeah. Yeah. No, we don't know each other.
>> Okay, cheers!
>> Cheers! (Applause).
>> ANAT DAVIDI: So as I was saying ‑‑ (Chuckles). The cool thing about using Yahoo! cache is that we can remove our page completely once it is cached, and we will have an attack that's completely hosted by Yahoo!, and only by Yahoo!, which is what we did here. And, you know, it's pretty nice. There we go. So we will take a look at the script and we will try to understand what happened and what we did.
The first thing that the script really does is access a span tag that exists within the page only because we are running under Yahoo!'s cache, the span tag that says "Yahoo! is not responsible for the content of this page." We thought that would be a fun string to use, and we decided to use it as a sort of key. We will take this string and generate some sort of pseudo-unique key, just something unique for us. We encrypted most of our malicious code with this key. And ‑‑ yeah. The next thing we will do is a bit of a side track, a trick that's used constantly in the wild. What we did here is, we put in the page a huge div tag. It includes waka number, waka number, waka. It's basically our malicious code, slightly obfuscated. It tries to get past the silly defenses that go: if you have a really long string that has %u in it many times, just block it. So we replaced the %u with something else, and we will fix it back here. And then we will evaluate it. It means that right now we have some sort of string that has been de‑obfuscated and decrypted, and we will try to turn it into JavaScript. So only if we are really in the context of Yahoo!'s caching service, with the string exactly where we knew it would be, only then will these two methods that we try to execute actually exist, because as you can see, they are not defined anywhere else within our code. We just pretty much showed you everything. And that's ‑‑ that's sort of how the script pulls off this cool trick. Do you want to take over?
>> DANIEL CHECHIK: Well, that's cool, because we just managed to host our attack on Yahoo!. But we want to take things one step further. We want to make it more evasive. So this time, we'll use Google as an example, because we don't really want to pick on anyone in particular here. So here's the Google Translate service, because I use it quite a lot. I don't know English quite a lot.
>> ANAT DAVIDI: That's if you haven't noticed. Never mind.
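Going back to the Yahoo! cache script walked through above, the context-dependent decoding can be sketched roughly like this. It is a toy model under stated assumptions: the hash, the XOR cipher, and every function name are invented for illustration and are not the speakers' code, the page text is passed in as a plain string instead of being read from the DOM, and the "payload" is a harmless marker that is never evaluated. Only the banner text and the "waka" placeholder come from the talk.

```javascript
// Banner text quoted in the talk; it exists only on the cached copy of the page.
const CACHE_BANNER = "Yahoo! is not responsible for the content of this page.";

// Toy string hash (an assumption) that turns context text into a 32-bit key.
function deriveKey(text) {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h;
}

// XOR each character with one byte of the key; applying it twice restores the input.
function xorWithKey(data, key) {
  let out = "";
  for (let i = 0; i < data.length; i++) {
    const keyByte = (key >>> (8 * (i % 4))) & 0xff;
    out += String.fromCharCode(data.charCodeAt(i) ^ keyByte);
  }
  return out;
}

// Encode text as "wakaXXXX" tokens so the blob contains no "%u" sequences.
function obfuscate(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "waka" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// Swap the placeholder back to "%u" and decode with the legacy unescape().
function deobfuscate(blob) {
  return unescape(blob.replace(/waka/g, "%u"));
}

// Prepared ahead of time: a harmless marker, encrypted and then obfuscated.
const blob = obfuscate(xorWithKey("benign-marker", deriveKey(CACHE_BANNER)));

// Recovery yields the marker only when the page text supplies the right key.
function recover(pageText) {
  return xorWithKey(deobfuscate(blob), deriveKey(pageText));
}
```

In this model, `recover(CACHE_BANNER)` returns the marker, while `recover("")` (the direct visit, where the banner span does not exist) returns unrelated characters, which is why nothing meaningful is defined outside the cache context.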
>> DANIEL CHECHIK: So this time we will execute our attack not only in the context we are in, but we will actually construct our attack using the Google Translate service. So I hope you still remember the demo we presented in the beginning. We are going to get back to it now and take it step by step. So this is the Google Translate service, and we just typed in the link to our website and asked Google to translate it from Hebrew to English. Of course, we can't really expect the user to browse to Google Translate, type in the link to our website, and get exploited, because they probably won't do it.
>> ANAT DAVIDI: Hopefully.
>> DANIEL CHECHIK: So as you probably noticed, Google gave us the ability to generate a direct link to a translated page, including the languages and everything. So it shouldn't be a problem. And, of course, with this URL, we can spread it via email, social networks, or any other media. And, again, it's Google. Who will blacklist Google? So let's take a look at the flow of the attack when using the Google Translate service. Okay. We have the user trying to access our website using the Google Translate service. So Google accesses the content from our website, and our website simply sends back the content of the page. Google Translate takes the content, adds some translation script, and finally sends it back to the user. The user's browser translates the content of the page, and by doing so ‑‑
>> ANAT DAVIDI: Oh, we have this one.
>> DANIEL CHECHIK: And by doing so, it creates the encryption key for us. Afterwards the JavaScript is executed and uses the key to decrypt the JavaScript, which turns the page into a malicious page.
>> ANAT DAVIDI: Okay. So once more, we will come and take a look at exactly how it happens. As you could see right now, the concept is very similar.
What we wanted to do by giving two different services and two different examples is just show you where it differs between services and where it kind of looks the same. I'm not going to go over the things that look the same. I will just show you what we did with this one. So this time, we added a couple of div tags into the page with text in Hebrew. The first one contains the following text. As you can obviously see, it says "script", or rather is going to be translated to "script" by Google's translation. And we needed a key as well. So we used the obvious choice, which is Bob Marley, and it translates to Bob Marley, by the way, and we will use this to generate a key. You can see the waka div, and it will look similar but yet kind of different. I hope you can read this. The first thing that is going to happen here is that we will try to create an actual element within the DOM, and this element is going to take content from within the translated page. We are going to try to dynamically fetch the word "script" that was just translated by Google and generate the actual script element within the DOM of the page. Of course, this means that if the Google translation scripts didn't work, or anything went wrong, this won't create a script element and absolutely nothing malicious will happen. Now, the next thing we will do is take the key, Bob Marley, and we will do ‑‑ actually, we will use the exact same code to generate the key and do the de-obfuscation. We're lazy and it works. We are decrypting our code using Bob Marley, the English string, Bob Marley, which, again, only exists if Google's translation worked on our page. With the caching service, we just sort of made sure that we were under the Yahoo! cache context. With the translation here, we actually waited for Google to do the translation work. They actually constructed the attack for us when they did the translation of the page on the client side.
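The dependence on the translation step can be modeled with a small sketch. Everything here is an assumption for illustration: the translation is a mock lookup table, not Google's service; the Hebrew strings are plausible stand-ins, not the actual text from the speakers' page; and the page is modeled as a plain object with two text fields instead of real DOM divs.

```javascript
// Assumed Hebrew source strings (illustrative stand-ins, not from the talk's page).
const HEBREW_TAG_WORD = "סקריפט";
const HEBREW_KEY_WORD = "בוב מארלי";

// Mock of the client-side translation step: a fixed lookup table.
const MOCK_TRANSLATIONS = new Map([
  [HEBREW_TAG_WORD, "script"],
  [HEBREW_KEY_WORD, "Bob Marley"],
]);

function mockTranslate(text) {
  return MOCK_TRANSLATIONS.has(text) ? MOCK_TRANSLATIONS.get(text) : text;
}

// The page is modeled as two text divs: one supplies a tag name, one supplies key material.
function translatePage(page) {
  return {
    tagDiv: mockTranslate(page.tagDiv),
    keyDiv: mockTranslate(page.keyDiv),
  };
}

// Returns what the page script would have to work with, or null when the
// tag word is not a usable element name (that is, translation never ran).
function resolveContext(page) {
  if (page.tagDiv !== "script") return null;
  return { elementName: page.tagDiv, keyMaterial: page.keyDiv };
}

const originalPage = { tagDiv: HEBREW_TAG_WORD, keyDiv: HEBREW_KEY_WORD };
const directVisit = resolveContext(originalPage);                    // no translation
const translatedVisit = resolveContext(translatePage(originalPage)); // via the service
```

In this model the direct visit resolves to `null`, so there is neither an element name nor key material to work with, while the translated visit yields both. That mirrors the point made above: the service's own transformation supplies the missing pieces.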
And, of course, finally, we're going to try to execute a method, hello world, which does not exist anywhere except in our obfuscated, encrypted code. I guess what we are trying to say here is that RDI is really a technique. It's sort of a method. It's taking all of these services that take content from the user and somehow reflect it back, and using them in order to construct our attack, or rather in order to obfuscate or evade with our attack. As you probably noticed throughout here, the point of RDI is context. That's the most important thing here, because obviously, when you think about it, you might be asking yourselves, why don't I just take my malicious link and put it in Google Translate? And basically, the problem is that if you go to Google Translate and you try to put in a malicious link, it's obviously going to be flagged as a malicious page. It's not like you will be able to go to it. It will give you the big red message: this website is suspicious, don't come near it. However, they don't scan the website they are about to translate within their own context. They don't scan it as if they are a user using this service, and so they will not see what we are trying to do here at all. Of course, everything I just said leads us to the fact that it's very hard to detect. Think about all the elements here, everything we did. We had a big div tag; we could have split it up and done lots of things with it. Consider what we did with the key: for translation, we could use different languages and put it in different places and use any string. For the caching service, we could have picked any element that's added by Yahoo! cache. All of which means it's extremely difficult to detect. You can't sign it statically. You have to be inline, looking at the content as it's going to the user, in order to understand what's happening and see what's going to happen.
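The difficulty of signing this statically can be illustrated with a short sketch. The signature below is a hypothetical example of the kind of "long run of %u" rule mentioned earlier in the talk; the regular expression, the run length of eight, the function names, and the marker text are all assumptions for the example.

```javascript
// Hypothetical static signature: flag any page with eight or more
// consecutive %uXXXX escapes. The threshold is an arbitrary assumption.
function staticSignatureHit(html) {
  return /(?:%u[0-9a-f]{4}){8,}/i.test(html);
}

// Encode text as a run of escapes, optionally using a token other than "%u".
function escapeBlob(text, token = "%u") {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += token + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// A harmless 13-character marker stands in for the real content.
const plainBlob = escapeBlob("harmless-text");

// Variation 1: swap the "%u" token for a placeholder such as "waka".
const swappedBlob = escapeBlob("harmless-text", "waka");

// Variation 2: split the blob across two elements (6 escapes, then 7),
// so no single run reaches the signature's threshold.
const splitBlob = plainBlob.slice(0, 36) + "</div><div>" + plainBlob.slice(36);
```

With these definitions the signature matches `plainBlob` but neither `swappedBlob` nor `splitBlob`, even though all three carry the same content. Each fixed rule can be sidestepped by a trivial variation, which is why the speakers argue that the content has to be examined as it actually reaches the user.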
And our conclusion from this is that RDI is pretty awesome for evasion, but beyond just our word on that, we wanted to test it against some of the tools on the Internet and see how awesome it really is. We tested it against VirusTotal, which has lots of security engines. And we wanted to test it against Wepawet, because we wanted to see, if the code is actually executed, what they are going to see. We first took our translation attack page as accessed directly, without the translation link. And, of course, as you can see, VirusTotal said zero out of ‑‑ how many is it? 39 detections. Nobody said it was malicious. And, well, they would be correct. Our code is not at all malicious when accessed directly. And when we put it on Wepawet, it said this page is benign and no evals were executed, and, again, when accessed this way, they would be correct, because, again, nothing happened. But then we said, okay, now let's take the actual attack URL, the one with all the translation features added, and see what happens then. And, well, we put it on Wepawet, the dynamic analysis, and it broke, apparently. We were going to let them know, but they are actually pretty cool people. They fixed it before we could even tell them. I think it flags it as suspicious now, I believe. And more interestingly, when we put it back on VirusTotal, we got one detection, one single engine that dynamically scanned our content, and that would be Sucuri SiteCheck. If there are any Sucuri people in the crowd, good job. They dynamically analyzed the content, exactly as we intended for the attack to be executed, and figured out we were attacking with this page. In the context of RDI, this doesn't really concern us. As Daniel said on the boring slide at the beginning, RDI is not meant to deal with these dynamic sorts of engines. It's meant to weed out the 38 other engines that use the blacklisting techniques.
It's meant to push us forward from really using the old technologies and force us to understand what's happening when we are analyzing a website and trying to understand if it's malicious or not. So, this is RDI. Thank you for listening. I don't think we'll have questions and answers right here, but we'll be in the Q&A room later if you want to chat, ask questions, and all that. Thank you. (Applause)