12.3 - Code generation part 2 - [compilers theory] by alex aiken

This video is a continuation of the previous video where we'll be finishing up co-generation for the simple language dealing with function calls, function definitions and variable references. So just to remind you what we're working on, here is the simple language And again, we have a bunch of different kinds of expressions And we dealt with all of these last time except for variable references and function calls And of course, we also have function definitions. So, as I said in the introduction these are the three constructs we'll be looking at in this video. >> The main issue in designing the co-generation for function calls and function definitions is that both of these will depend intimately on the layout of the activation record. So really, co-generation for function calls, co-generation for function definitions and the layout of the activation record all need to be designed together. Now for this particular language, a very simple activation record will be sufficient. Because we are using a stack machine, we are modeling a stack machine in our code generation. The results of a function call will always be in the accumulator and that means there is no need to store the results of the function call in the activation record And furthermore, the activation record will hold the actual parameters. So when we go to computer function call with arguments X1 through XN, we will push those arguments onto the stack And as it happens, these are the only variables in this language that are no local or global variables other than the arguments to a function call And so those are the only variables that will need to be stored in the activation record. Now recall that the stack machine discipline guarantees that the stack pointer is preserved across function calls. So the stack pointer will be exactly the same when we exit from a function call, as it was when we entered the function call And this means we won't need a control link in our activation record. The point of a control link is to help us find the previous activat ion, and since, the stack pointer is preserved, it will have no trouble finding it when we return from our function call, and we'll never need to look at another activation during a function call since there are no non-local variables in the language. We will however need the return address and that will need to be stored somewhere in the activation record And, one more thing. It turns out that a pointer to the current activation will be useful. Now this is to the current activation, not to the previous activation And this pointer will live in the register, FP, which stands for Frame Pointer. This is a conventional, this is a, this is the register name on the [inaudible] And the name is chosen, to denote the frame pointer And by convention, the compilers put the frame pointer there. What the frame pointer is good for, well it points to the current frame, so that's what the name comes from. But, what it's good for, we'll see in a few minutes. Right so to summarize for this language an activation record that has the caller's frame pointer, The actual parameters and the return address will be sufficient. So let's consider a call to the function F and has two arguments X and Y. Then at the time the call is performed before we start executing the body of the function this is what the activation record will look like, So we'll have the old frame pointer. So this is the frame pointer that points to the caller's frame. Not to the frame of the function that we're executing And the reason that it does that is that we have to save it somewhere because the frame pointer register will be overwritten with the frame pointer for the current activation so we have to save the old one, so that we can restart the caller when we return to it, from the current function. And then there the arguments of the function and those that are pushed on the stack in reverse order. So the last argument is pushed on first and the first argument is at the top of the stack And the reason for doing it this way is it'll make the indexing to find the a rguments a little bit easier. A little bit simpler And then We have the stack pointer so there's a, there's nothing here. What will go here is the callee, the function that we're calling, will push on the return address. So this is where the return address will go And these elements, the callers frame pointer, the arguments to the function and the return address of the call function will make up the activation record of F. A bit of terminology, the calling sequence is the sequence of instructions that both the caller and callee to set up a function invocation, okay? So that's referred to in compiler lingo as the calling sequence And we're going to need a new instruction to show the calling sequence for this for, for function calls. And that will be the jump and link instruction. So jump and link what it does is it jumps to the label that it's given as [inaudible] And it saves the address of the next instruction after the jump in link, in the register R.A. Which stands for, return address. So, what would happen in the jump in link instructions, if I have jump in link to label L And then there's an add instruction that comes next. I don't know what it is. It's the address of this instruction, the one after the jump in the link that will be stored in the ret-, in the, in the register RA. So this instruction will jump to L. It will store the address of this add instruction in RAb And it will execute whatever code is at L. And then the code that's at L can execute a jump back to the address in here to execute the return, to the caller. So now we're ready to actually generate code for a function call expression. So let's say we have the call, F of E1 To EN Where of course E1 through EN are expressions. And let me change colors here. So these are expressions, here, not values. So how are we going to do that? >> Well, the first thing we're going to do is we're going to start building, the activation record And so we save the current frame pointer. This is the frame pointer for the collar. >> Okay. >> This is pointing to th e collars frame. >> Right >> And we store that at the stack pointer. We have to bump the stack pointer. And then we generate code for the last argument, for EN, right? And so that code gets inserted here And then we push it on the stack. So we store the results of EN which will be in the accumulator A0 on the stack and then we, bump the stack pointer. Alright, and we'll do that for all the arguments finishing up with E-1. So, we generate code for E-1 and we push it onto the stack. So, now all the arguments are on the stack and now we just do the jump in link. So, we've done as much of the work or much of the calling sequence as we can do on the caller's side. So, this code is executing in the function in the caller. Okay, so this is the caller side of the calling sequence, and it builds up as much of the activation record as it can. In particular it's evaluating the actual parameters and pushing them on to the stack to form part of the activation record, for the called function, and then we do the jump and link. And we jump to the entry point of the function that we're calling. So we're, this is a call to, to F, and so we jump to F's entry point. So a few more things to note, First of all, as we discussed on the previous slide When we execute the jump in link instruction that is going to save the return address in the register RA And that address will be this address here, the one that comes after the, the address of the next instruction, after the jump in link instruction And you'll notice also that the activation record we've built so far is four times N plus four bytes. So this is where N here is the number of arguments. Each argument takes up four bytes, and then four bytes for the old frame pointer. Now we're ready to talk about the callee side of the calling sequence And we're going to need one new instruction for that. The JR instruction stands for jump register. And it just jumps to the address in its register argument. So now, the callee side is the code for the function definition, okay? So this is the co de that actually executes the body of the function. And how do we generate code for that? Well let's take a look. Now actually the very first thing that should be here is that this first instruction of the [inaudible] side is the entry point. So, we're missing the label here So this would be labeled F entry. Okay So this is the target of the jump in link instruction. And then the very first thing we do is we set up the frame pointer. So we copy the current value of the stack pointer into the frame pointer. That sets, that points to the end of the frame for the [inaudible], for the new function that's being executed. We also save the return address at the current position on the stacks. Remember there was one more thing to do one thing one thing that was missing. On the caller side on the caller side of the sequences which is the return address. We don't know the return address until after the jumping link instructions executes And so the callee is the one that has to save that value. Okay so after the jumping link the RA register contains the return address and that we save it into the frame. All right, and then we push the stack pointer. 'Kay. And now we just generate code for the body of the function. So now the, at this point the activation record is completely set up, and now we can just generate code for the function body. And after the function body executes, of course, the stack pointer will be preserved, and, and that means that the return address will be at four offset from the stack pointer, so we can load the return address back into the return address register And then we can pop the stack So here we're going to pop off The current frame from the stack And that's going to be song size, z. Which we I haven't shown you what it is yet But, we'll calculate The size of z in just a minute? This is going to be an immediate value. So it's a constant that we plug in there And then we load the old frame pointer. Okay So once we've incremented the stacks we popped off the existing frame, and so now we're pointing at the frame pointer at the first we're, we're, we're pointing at the first thing beyond the previous stack frame, and, what was that, well, that was the first thing that we saved in the stack frame for F, and that's the old frame pointer. So now we restore the old frame pointer so that the call, the function that called us, we'll have its frame pointer back, and then now we're ready to return it resume execution of the calling function. We just do that by a jump register to the return address, All right? So note here that the frame pointer points to the top of the frame, not the bottom of the frame. Okay? So that will actually be important when we talk about how we use the frame pointer When we get to talking about the variable references next And the callee pops the return address, The actual arguments in the saved value of the frame pointer from the stacks. So the callee pops off the entire activation record, and also restores the caller's frame pointer And what's the value of Z? Well, there are N arguments. Each of which take up four bytes So there's at, so the size of the activation record is four times N. Plus, there are two other values. In the activation record One is the return address. And the other one is the old frame pointer. Okay and the space for two more words is eight bytes. So that's the size of the activation record. So that's how much we have to add to the stack pointer to pop the activation record for F off the stack. Just to give you a sketch of, what this looks like before the call. We have the frame pointer for the caller, and we have the, The current value of the stack pointer And on entry to the function. Okay, after the calling after the calling functions side of the calling sequence has completed what's on the stack, well, we have the old frame pointer, and the two arguments, and then the stack pointer points to the next unused, location. Which is where the return address will go Alright, then we do the jump and link. We jump over, and the return address gets pushed on to the stack, a nd the frame pointer gets moved to point two, the current value of the frame. Okay, you've got [inaudible] to the top of the frame. Okay? And then after the call, what has happened? Well, we've popped everything off the stack, we've popped the entire. Your activation record for the call function off of the stack And so now notice that we're back in the same state. So again, function calls have to preserve the invariant that. The stack is preserved across the call so the stack should be exactly the same after the call, as it was on entry to the call. So we are almost done with code generation for simple language. The last construct we need to talk about is how we generate code for variable references. Now the variables of a function again are just its arguments just the parameters to the function. There are no other kinds of variables in this simple language And these variables are all in the activation record. So we really all we have to do is generate code to look up a variable in its appropriate place in the activation record But there is one problem, and that's that the stack does grow and shrink with intermediate values. So when you call a function and you begin executing its body values will be popped and pushed onto the stack beside the activation record. So think back to the code generation for plus and minus and if then else intermediate values were being pushed and popped from the stack And so what this means is that these variables that are in the activation record are not at a fixed offset From the stack pointer. So we can't use the stack point very easily to the side or to find those variables. So the solution is to use the frame pointer. The frame pointer always points to the return address in the activation record and because it doesn't move during the execution of the function body, we can always find the variables at the same place relative to the frame pointer. So, how do we do that? Well, let's consider the [inaudible] argument, X of I and does the [inaudible] argument to the, to the function. So where is that going to be relative to the frame pointer? That will be at offset Z from the frame pointer And Z is just four times I. Right, and this is actually the reason here for generating for pushing the arguments on the stack in reverse order, starting with the last argument to the function, because it just makes this index calculation simple. It wouldn't be that much more complicated if we pushed the arguments in the other order. It just makes it a little easier to see how the indexing works And anyway this index, this offset, is being calculated at compile time. So, notice that this number, this four times I, is something that the compiler knows, and what we're putting in the code here is just a fixed offset. So, we are not actually doing that multiplication at run time. See here is just a number, as computed statically by the compiler. So anyway We just load and off send Z which is the four times I where I is the index, the position of the variable in the list of parameters. At that offset from the frame pointer, that's where XI is stored in the activation record And we just load it into the accumulator. So that is the entire code generation for a variable reference. Here's a little example. So for the function, the hypothetical function that we've been looking at, with two parameters x and y. X is going to be at the frame pointer +four and y will be at the frame pointer +eight. So to summarize the main point's one very important thing is that the activation record has to be designed together with the code generation. So you have to do these things at the same time. You can't just design the activation record without thinking about what code you're going to generate And you can't just think about generating code without making some decisions about where the data is going to be lived. So the code and the data it manipulates, have to be designed simultaneously. Code generation can be done by a recursive [inaudible] of the abstract syntax street, so just like type checking. Cogeneration can be expressed as a r ecursive tree-walk And that's a very handy way to think about cogeneration because it allows you to think about one case at a time without having to get mixed up thinking about all the different constructs at one time. >> And finally I recommend that you use a stack machine for your compiler. So if you're implementing a course project, the stack machine is the simplest discipline and it gives you a nice framework for think, for breaking up the project into manageable pieces. And because of that simplicity, I think it's a really good way to learn about writing compilers. Now, it is important to realize that production compilers do, do some different things. They're not quite as simple as, the stack machine cogeneration that we have outlined in the last few videos. So, the main differences, or, or, the main difference, is that the big emphasis in a production compiler is on keeping values and registers. It's much more efficient to do operations out of registers than to be saving and loading values from the stack And so, especially the values in the current activation record or current stack frame. It, in production compiler we try to keep those in registers instead of on the stack And also, typically a pressure compiler, to the extent that it has to use temporaries, in the activation record. These would be resolved, laid out directly in the activation record, not pushed and popped from the stack. That means they'd be assigned, pre-assigned locations in the activation record, just like, the function arguments in the simple language we looked at are assigned fixed positions in the activation record. So those temporary values would also be assigned fixed positions, so you could save the trouble of manipulating the stack pointer.