>> MARION MARSCHALEK: Hello. Welcome to my talk about a thorny piece of malware. My
name is Marion. I'm a malware analyst. I'm sparing you my second name because people seem
to have problems pronouncing it anyway. And I work for the Austrian software company Ikarus
Security Software. My talk is going to be about one specifically
thorny piece of malware I analyzed, and I'm going to start out with some fancy fun facts
about that sample. And the rest of the talk is all going to be about analysis issues I
had when looking into it. I'm going to bring two anti-analysis techniques I encountered in the
sample and two more analysis headaches that still pose problems for reverse engineers.
The first one is exception handling that obfuscates the execution path. And the second
one is junk code that I encountered in there that was pretty nasty at first glance, but then
turned out to be easy to pass by after all. I'm going to talk about binary analysis of C++ executables
and about multi‑threaded applications for reverse engineering.
All right. Let's start. Now, all together, this is my favorite piece of malware. Now
why would it be my favorite piece of malware? Well, I reversed it inside and out, from top to
bottom, and I really had a lot of fun. It is a challenging piece of malware but not
impossible to get through, even for beginners. It's not packed or encrypted but still provides
enough interesting topics to research. But what does it do after all? Well, all together,
I summarized it there. It's an Asian multi‑threaded, non‑polymorphic, file‑infecting spy bot.
What does it do? Like what spy bots do, it can produce screen shots. It can produce screen
captures and send them to the C&C server. It can delete files. It can copy files. It can
execute files. Most of all, it can update, so it can download a new version of itself
and execute this one. So basically it can do anything the one in remote control wants it to.
Anyway. What are interesting facts about it? The sample uses structured exception handling
to obfuscate its execution path. That means by throwing deliberate exceptions, the malware
author can pass execution control from one place in the executable to another one, namely
the exception handler. And the interesting thing about exception handlers is that an exception
handler can define a new entry point that's going to be executed after the exception.
How does this work? Well, the best documentation I could find was written back in 1997 by
a guy called Matt Pietrek. He's one of my big heroes now because he already documented
this issue, namely in A Crash Course on the Depths of Win32 Structured Exception Handling. I recommend
this article.
Actually, exception handling is implemented
as a chain of exception handlers which is located on the stack and intertwined with the
function stack frames that are around there. And it all starts with the thread information
block because every thread has its own chain of exception handlers. A reverse engineer
can find this through the FS register at offset 0, which points to an exception registration structure
which looks more or less like this. And in essence, the structure contains a pointer to
the handler which could eventually handle the thrown exception and a pointer which points
to the previous registration block, which looks like this, and eventually, at the end of the chain, there
comes a default handler and, well, minus one. All right. Now this is placed on the stack,
in one of the function stack frames. And there's a whole science about building the stack
and unwinding the stack, but what's really interesting for a malware author is, of course, that
he can register his own exception handlers and deliberately throw exceptions and
point the execution flow to some other piece in the code.
Now, this looks more or less like this. If you're inside of a binary and can spot something
like FS:0 and see the structure where a new exception handler is linked into the list,
that most likely has to do with exception handling.
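The registration chain described above can be sketched as a singly linked list. This is a minimal, portable C++ model and not real SEH code: `Registration`, `ThreadModel`, `endOfChain` and `demoChain` are hypothetical names made up for illustration, the `head` member stands in for the pointer a 32-bit Windows thread keeps at FS:[0], and a string stands in for the handler's code address.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified model of one registration record: two pointer-sized fields.
struct Registration {
    Registration* prev;      // previous record in the chain
    const char*   handler;   // stand-in for the address of the handler code
};

// The chain is terminated by the value -1 (all bits set), not by a null pointer.
inline Registration* endOfChain() {
    return reinterpret_cast<Registration*>(~static_cast<std::uintptr_t>(0));
}

// Hypothetical per-thread state; 'head' plays the role of FS:[0].
struct ThreadModel {
    Registration* head = endOfChain();

    // On 32-bit x86 this step typically shows up in disassembly as:
    //   push offset handler ; push dword ptr fs:[0] ; mov fs:[0], esp
    void push(Registration& record, const char* handler) {
        record.prev = head;
        record.handler = handler;
        head = &record;
    }

    // Unlink the most recently registered record again.
    void pop() { head = head->prev; }

    // Walk from the newest record to the end marker, as a dispatcher would.
    std::vector<std::string> walk() const {
        std::vector<std::string> names;
        for (const Registration* r = head; r != endOfChain(); r = r->prev)
            names.push_back(r->handler);
        return names;
    }
};

// Registers two handlers and returns the order in which they would be asked.
std::vector<std::string> demoChain() {
    ThreadModel thread;
    Registration outer, inner;   // these live in the "stack frames"
    thread.push(outer, "outer_handler");
    thread.push(inner, "inner_handler");
    std::vector<std::string> order = thread.walk();
    thread.pop();
    thread.pop();
    assert(thread.head == endOfChain());
    return order;
}
```

Calling `demoChain()` yields the innermost handler first, which is why a handler registered late in a function gets the first chance to take over execution.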
Now I told you there's a pointer in there pointing to the handler code, which would be
the first switch to some other point in the executable for execution. And inside of this
handler now, someone can change the execution flow to a completely different point inside
of the executable. The magic thing about this is that an exception is treated like a software interrupt,
which means every time an exception occurs, the whole context structure of the thread that's
running is saved away and loaded back into the CPU when the exception handler is finished.
And the interesting thing there is that someone can change this context structure and point
the instruction pointer somewhere completely different.
So yeah, I know there's a lot of people in here getting excited when they hear they
can point instruction pointers somewhere. Right.
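The save, modify and restore cycle described above can be imitated with a toy dispatcher. This is a sketch under loud assumptions: `Context`, `Step`, `run` and `demoRedirect` are hypothetical names, the "program" is a list of labeled steps rather than machine code, and `ip` is an index into that list rather than a real instruction pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in for the saved CPU context; only the instruction pointer is modeled.
struct Context { std::size_t ip; };

// One step of the toy program.
struct Step {
    std::string name;
    bool faults;   // true if executing this step raises an exception
};

// Executes the steps in order. On a fault the context is saved, handed to the
// handler, and execution resumes wherever the (possibly modified) context points.
std::vector<std::string> run(const std::vector<Step>& program,
                             const std::function<void(Context&)>& handler) {
    std::vector<std::string> trace;
    std::size_t ip = 0;
    while (ip < program.size()) {
        const Step& step = program[ip];
        if (step.faults) {
            Context saved{ip};
            handler(saved);
            assert(saved.ip != ip);   // otherwise the same fault would repeat forever
            ip = saved.ip;            // "load the context back into the CPU"
            continue;
        }
        trace.push_back(step.name);
        ++ip;
    }
    return trace;
}

// The handler does nothing except point the saved instruction pointer elsewhere.
std::vector<std::string> demoRedirect() {
    const std::vector<Step> program = {
        {"setup", false},
        {"bad_call", true},        // deliberate fault
        {"decoy", false},          // looks like the next code, never runs
        {"hidden_entry", false},   // where the handler sends execution
    };
    return run(program, [](Context& ctx) { ctx.ip = 3; });
}
```

The trace returned by `demoRedirect()` skips the decoy step entirely, which is the effect a linear read of the disassembly would miss.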
And I know today a lot of things have changed, especially concerning C++. And in Visual C++,
it is still based on the structured exception handling I showed you before. But the thing that has
changed mainly is that now every function has its own exception handler and uses a FuncInfo
structure which contains information about try blocks and catch blocks and I think I
need to take a break.
>> SPEAKER: You would be correct.
(Applause)
>> SPEAKER: We have a little tradition here
at DEF CON. Let me tell you all about it. It involves ‑‑
>> AUDIENCE: Drinking.
>> SPEAKER: Louder.
>> AUDIENCE: Drinking.
>> SPEAKER: Why? Why are we making her drink?
>> AUDIENCE: First timer.
>> SPEAKER: We need someone from the audience.
Do we have any first timers here in the audience? No? So there ‑‑ really? Nobody is a
first timer? None? Wait. Okay. Who's everybody pointing at? All right. Get up here. I can't
believe this is the only guy. That's amazing. Cheers! Welcome to DEF CON!
(Applause)
>> SPEAKER: Have a good time.
>> MARION MARSCHALEK: Thank you. Sheesh.
>> SPEAKER: Where were you?
>> MARION MARSCHALEK: Right there. Okay. Now where was I?
All right. Visual C++ structured exception handling. It's still based ‑‑ sorry.
(Coughing) (Laughing) It's still based on the principle of structured
exception handling I showed you before, but now every function has its own dedicated exception
handler which is compiler generated and uses a structure called the FuncInfo structure that
contains a lot of information, namely information for unwinding functions, about try blocks and
catch blocks and, well, the pointers to the exception handlers that eventually handle
exceptions. Right. There's a built‑in function called the SEH
frame handler, which this FuncInfo structure is handed over to and which then performs,
well, the matching of exceptions to the exception handlers to be executed, of course. And still,
as I mentioned, the important thing there is that the exception handler can define a new entry
point.
Now, I'm pointing to a really nice diagram that
we see here. Interesting. Right? It's from OpenRCE, painted by Igor Skochinsky, who did a lot of
research on structured exception handling. And I've provided a sketchbook painting
here to show you. It's really not pretty, but let's look through that.
There is a pointer to the exception handler on the stack, right, that's the compiler‑generated
exception handler. There's the SEH frame handler, where the FuncInfo structure
ends up. This pointer points to a try block map. The try block map points to a handler
array. The handler array points to a handler offset which points down to a handler.
You get the idea, right? Well, I provided some screen shots here, from IDA Pro.
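The pointer chain just listed (FuncInfo, try block map, handler array, handler) can be followed in a few lines. This is a simplified sketch: the field names follow publicly available reverse-engineering notes on the Visual C++ exception data and should be treated as approximate, real binaries store these as 32-bit addresses rather than typed pointers, and `findHandler`, `catchBlock` and `demoFuncInfo` are hypothetical helpers. The real frame handler also matches the thrown type against each catch entry, which is skipped here.

```cpp
#include <cassert>
#include <cstdint>

using Handler = void (*)();

// One catch clause (approximate layout).
struct HandlerType {
    std::uint32_t adjectives;        // flags such as const or reference
    const void*   typeDescriptor;    // nullptr stands for catch(...)
    std::int32_t  dispCatchObj;      // where the caught object is placed
    Handler       addressOfHandler;  // the catch block's code
};

// One try block with its catch clauses (approximate layout).
struct TryBlockMapEntry {
    std::int32_t tryLow;
    std::int32_t tryHigh;
    std::int32_t catchHigh;
    std::int32_t nCatches;
    const HandlerType* handlerArray;
};

// Per-function exception data (approximate layout, trailing fields omitted).
struct FuncInfo {
    std::uint32_t magicNumber;
    std::int32_t  maxState;
    const void*   unwindMap;
    std::uint32_t nTryBlocks;
    const TryBlockMapEntry* tryBlockMap;
};

// Follows FuncInfo -> try block map -> handler array -> handler address.
// Simplification: returns the first catch clause of the first matching try block.
Handler findHandler(const FuncInfo& info, std::int32_t state) {
    for (std::uint32_t i = 0; i < info.nTryBlocks; ++i) {
        const TryBlockMapEntry& entry = info.tryBlockMap[i];
        if (state >= entry.tryLow && state <= entry.tryHigh && entry.nCatches > 0)
            return entry.handlerArray[0].addressOfHandler;
    }
    return nullptr;
}

static int g_reached = 0;
static void catchBlock() { g_reached = 1; }   // stands in for the user's handler

// Builds one try block with one catch-all handler and resolves it.
bool demoFuncInfo() {
    static const HandlerType handlers[] = {{0, nullptr, 0, &catchBlock}};
    static const TryBlockMapEntry tryBlocks[] = {{0, 0, 1, 1, handlers}};
    const FuncInfo info{0x19930520, 1, nullptr, 1, tryBlocks};
    Handler handler = findHandler(info, 0);
    assert(handler != nullptr);
    handler();
    return g_reached == 1;
}
```

When reading a disassembly, the same walk is done by hand: start at the FuncInfo address loaded by the compiler-generated stub and follow each pointer until the handler address turns up.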
Let's get back to the bot. In practice, this would look like this. For example, there's
a registration sequence. I hope people can read this. Maybe. I don't know. But there
is the FS:0 flying by, and a new exception handler is registered at the beginning of
the function. And sometime later, there's an exception happening. If you can read that
call, this will almost never work, right? Because the memory address put into ECX there
is very likely not valid.
So there's the exception and there is the
registered exception handler, which causes the system to execute the compiler‑generated handler.
There the FuncInfo structure is handed over to the SEH frame handler, which then performs
the magic.
Let's look into this FuncInfo structure. In
this FuncInfo structure, there's the values that Igor thankfully pointed out in his diagram.
So there's the try block map and the handler array, and finally, the pointer to the handler
that the user registered. So there's the user‑generated handler. And in there, you can find
the offset to the entry point. If you have a look at the user‑generated handler,
it is really obvious that this handler is just registered for obfuscation because there's
nothing else happening there than setting this offset for the entry point. Right.
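At source level the same trick looks roughly like the sketch below. It is an assumption-laden stand-in: the sample faults on a call through an invalid address, whereas this portable version uses a plain C++ `throw`, and `obfuscatedFlow` and the trace labels are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Records which parts of the function actually run.
std::vector<std::string> obfuscatedFlow() {
    std::vector<std::string> trace;
    try {
        trace.push_back("visible prologue");
        // Stand-in for the call through an invalid pointer in the sample.
        throw std::runtime_error("deliberate");
        trace.push_back("decoy code");   // never reached
    } catch (const std::runtime_error&) {
        // The handler has no error handling at all; it only carries on
        // with the code that was meant to run in the first place.
        trace.push_back("real entry point");
    }
    return trace;
}
```

A catch block that does no recovery work and simply continues with the program's main logic is a strong hint that the exception exists only to hide the control flow.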
So much about exceptions. The second point, the junk code in the file.
There was really quite a lot of junk code in the sample, which is pretty scary
for young analysts if you see a lot of XORs and a lot of shifting operations and a lot
of loops that actually don't produce any useful information in there. So I was kind of overwhelmed
by this junk code until I found the principle of the junk code in my sample.
There's a whole lot of research about junk code and binary obfuscation. And the principle of this
junk code is pretty simple: opaque predicates. Now an opaque predicate is something that's
just ‑‑ well, a branch statement that always returns true ‑‑ or always returns false.
And so only one of the branches is ever going to be executed.
And the other branch gets the junk code. So well, in the sample analysis, it looks
somewhat like this. On the right side, you see the first screen shot. On the left side,
there's a simplified version. And if you think through that, the compare statement in there
is never going to set the zero flag, so the jump‑not‑zero is always going to take the
green branch. Right. You think that is simple? It's true. It was like this all throughout
the sample. It was just as simple. So what did the analyst do? I just set the
junk aside and followed the green branch as the real path. If you can see this graphic. I'm not sure.
All right. So the yellow boxes are the productive code and the white boxes are just junk code.
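An opaque predicate is easy to reproduce in a few lines. The sketch below uses a textbook predicate, the parity of a product of two consecutive integers, which is an assumption chosen for illustration and not the compare instruction found in the sample; `opaquelyTrue` and `protectedAdd` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// x * (x + 1) multiplies two consecutive integers, so one factor is even and
// the lowest bit of the product is always 0. Unsigned wrap-around keeps the
// low bit intact, so this holds for every 32-bit input.
inline bool opaquelyTrue(std::uint32_t x) {
    return ((x * (x + 1u)) & 1u) == 0u;
}

// Adds two numbers, wrapped in a branch that looks data dependent but is not.
int protectedAdd(int a, int b, std::uint32_t noise) {
    if (opaquelyTrue(noise)) {
        return a + b;   // productive branch, always taken
    }
    // Junk branch: XORs, shifts and a loop that never execute.
    std::uint32_t junk = static_cast<std::uint32_t>(a);
    for (int i = 0; i < 8; ++i)
        junk = (junk << 1) ^ static_cast<std::uint32_t>(b);
    return static_cast<int>(junk);
}
```

Once the predicate is recognized as constant, the whole junk branch can be ignored during analysis, whatever value `noise` appears to have.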
So this was really pretty simple to get by.
Analysis headaches. I spent a lot of time with the sample,
especially because of the threads in there. People have seen the movie
Drag Me to Hell, and that's what I've been through with that application. The author
of the sample actually has all my respect because he produced this in C++. This is a
simplified version of the threads that I found in there. There's actually a lot more, but
it boils down basically to one thread that manages all the instances, namely, the ‑‑
well, the bot instances that could start up in the system, because eventually there's more
than one infected file that could start up. Second thread, there was the file infector,
always infecting processes that would start up. Then the machinery that would handle the
sending side of the bot, which could send messages and data to the C&C, and on the other side, the receiving
side of the bot. And, of course, the C&C command switching.
Now how did I get to that information? That was pretty tricky and I spent a lot of trial
and error time in there. But actually what I did was, in the first step, I realized that
I have to spot the really interesting threads, because there's a lot of timing and synchronization
going on. After doing this, I had to spot the inter-thread communication and the synchronization
methods, which actually told me a lot about which threads were triggered by specific
events. I will talk a little bit more about this pretty soon.
In the next step, of course, I had to analyze somewhat the function bodies of the threads
to really find out what they do, what information they generate and where this information
would eventually flow to. Knowing all this, in the last step I could write down this
big picture of where information is generated and which thread
accepts the information, processes it and eventually takes any action.
All right. So if you go back to that diagram, I found four different methods of synchronization
in there, which were events for triggering the file infector and for managing the different
instances that were started. Thread messages, which were mainly used at the receiving side of
the bot. An I/O completion port, which was used to manage the ‑‑ so you had the receiving
side of the bot, the thread messages for the receiving side of the bot, and the critical
sections for data exchange between the threads. When I had that, I could paint the threads
around the synchronization methods.
around the synchronization meters. All right. Now here comes the last nastiness
for today. C++. All right. There's actually a lot about reversing C++. There's a whole
science for people who are interested in that I collected a lot of links on that research
on the last slide of this presentation. But I actually want to talk about our visual function
calls. Our visual function calls are very interesting to reverse because they're indirect
calls, and they're only fully determinable at one time. These simple multiple inheritance
features C++, so one of these special function calls can actually call into several different
meters at run time. They're translated using visual function tables
which has a lot in reversing these sorts of binaries. I provided the ‑‑ an example
here. In this example, there's a visual function table actually loaded into the register EX.
And at offset 4 of the virtual function table, there's ME thought that's going to be called
with this call station. That was really sort of the catch me if you can.
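What the compiler emits for such a call can be imitated by hand with a table of function pointers. This is a sketch with hypothetical names (`Object`, `Method`, `kTable`, `callSecondSlot`, `demoVirtualCall`); the assembly in the comment shows a typical 32-bit pattern and is not copied from the sample.

```cpp
#include <cassert>

struct Object;                     // forward declaration for the slot type
using Method = int (*)(Object*);   // every slot receives the this pointer

// Hand-written equivalent of an object with virtual functions:
// the first field is the pointer to the virtual function table.
struct Object {
    const Method* vftable;
    int value;
};

static int getValue(Object* self)    { return self->value; }
static int doubleValue(Object* self) { return self->value * 2; }

// The table itself: slot 0 and slot 1.
static const Method kTable[] = { &getValue, &doubleValue };

// Typical 32-bit pattern for the same thing:
//   mov eax, [ecx]          ; load the table pointer from the object
//   call dword ptr [eax+4]  ; call the entry at byte offset 4
int callSecondSlot(Object* object) {
    const Method* table = object->vftable;
    return table[1](object);   // slot 1 is byte offset 4 with 4-byte pointers
}

int demoVirtualCall() {
    Object object{kTable, 21};
    return callSecondSlot(&object);
}
```

The call target depends only on which table the object points to, so identifying the table in the binary is what resolves the indirect call.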
Actually, I collected another example from OpenRCE and Igor Skochinsky because he did a
lot of research on this as well.
Here's one Class A where there's two virtual
functions defined in there. Underneath this class definition, you can see the memory layout of
Class A, where there's a virtual function pointer actually pointing to the virtual function
table of Class A. Now a virtual function table is something that only classes that have
virtual functions defined in there have.
All right. Here's the second Class B, which
also has a really similar layout with two virtual functions defined in there. And another interesting
thing is the Class C, because Class C inherits Class A and Class B and implements one virtual
function of each. Now, the layout of Class C is already somewhat bigger because, as it inherits the other
classes, it also includes their class layouts and also the virtual function pointers
in there to the virtual function tables. These virtual function tables are now adapted to
fit the needs of Class C and point to the actual function offsets that Class C implemented.
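The three classes can be written down in a few lines of C++. The member and function names below are hypothetical and only mirror the shape described in the talk: two virtual functions each in A and B, and a class C that inherits both and overrides one function from each.

```cpp
#include <cassert>

class A {
public:
    virtual int a_virt1() { return 1; }
    virtual int a_virt2() { return 2; }
    int a1 = 0;
};

class B {
public:
    virtual int b_virt1() { return 10; }
    virtual int b_virt2() { return 20; }
    int b1 = 0;
};

// C contains an A part and a B part, each with its own virtual function
// pointer. The tables those pointers refer to are C's own copies, in which
// the overridden slots point at C's implementations.
class C : public A, public B {
public:
    int a_virt2() override { return 200; }
    int b_virt2() override { return 2000; }
    int c1 = 0;
};

// Calls all four virtual functions through base class pointers.
int demoDispatch() {
    C c;
    A* asA = &c;   // refers to the A part of c
    B* asB = &c;   // refers to the B part of c
    return asA->a_virt1() + asA->a_virt2() + asB->b_virt1() + asB->b_virt2();
}
```

Through either base pointer the overridden slots resolve to C's functions while the untouched slots still reach the base implementations, which is exactly what the adapted tables encode.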
All right. This is really dry to look at. Back to business. Here's the C&C
command switching function, which is a really good example for virtual function calls. There
you see little yellow boxes. This is all memory allocation for objects that are going to be
instantiated in the green boxes. And then you see one pink box, which is the virtual
function call which was actually used to call into the bot functions. The bot functions are
implemented as derived classes of one bot action superclass. And all had one function
overloaded, sorry, implemented, that was the bot action.
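The shape of such a command switch, many allocations and constructors feeding one shared virtual call, can be sketched as follows. Everything here is hypothetical (`BotAction`, `MoveFileAction`, `ScreenshotAction`, `Command`, `makeAction`, `dispatch`), and the actions only return a label so the structure stays visible without any behavior.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Superclass with the single function every action implements.
class BotAction {
public:
    virtual ~BotAction() = default;
    virtual std::string run() const = 0;
};

class MoveFileAction : public BotAction {
public:
    std::string run() const override { return "move file"; }
};

class ScreenshotAction : public BotAction {
public:
    std::string run() const override { return "screenshot"; }
};

enum class Command { MoveFile, Screenshot };

// One case per command: allocate memory and run the matching constructor
// (the yellow and green boxes in the graph).
std::unique_ptr<BotAction> makeAction(Command command) {
    switch (command) {
        case Command::MoveFile:
            return std::unique_ptr<BotAction>(new MoveFileAction());
        case Command::Screenshot:
            return std::unique_ptr<BotAction>(new ScreenshotAction());
    }
    return nullptr;
}

// All cases meet at one virtual call site (the pink box).
std::string dispatch(Command command) {
    std::unique_ptr<BotAction> action = makeAction(command);
    return action ? action->run() : "unknown";
}
```

In the disassembly the single call site looks identical for every command, so the constructor that ran just before it is what tells which function will actually be reached.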
Now here, another IDA Pro example with the move file object. Here in yellow you see the
object instantiation. I'm sorry. The memory allocation, where there's space reserved for
the object that is going to be instantiated in the green box. And what you see there is
a call to a constructor. Now this constructor actually has a call into the superclass constructor,
as it works with derived objects.
And there you see the first VFTable. I will
talk about this in a second. As I mentioned, this constructor has a call into the base class
constructor, and there's another virtual function table where there is space reserved for two
virtual functions. Now if you check in IDA Pro the cross‑references of this base class constructor,
there are 23 cross‑references. I guess, surprise, there's like 23 bot actions that
can be taken by the bot.
can be taken by the bot. All right. Along this ‑‑ the final step
is the call into the function ME bot of the new file object. What you see there is that
the function table of the new file object is loaded into the register and the function
offset is called ‑‑ if you have a look at the virtual function table of the new file
object, the ‑‑ for there is move file function. So theory approved.
Using these virtual function tables, you can not easily but pretty fast determine which
functions are going to be called at these virtual function calls. All right. This was
my presentation. Here are the promised links. The sample
can be found online under the first link. And well, if there's any questions, you can
contact me on Twitter or I'm going to be out in the hallway to answer your questions or
receive criticism or anything you want to tell me now. Thank you. (Applause)