Tip:
Highlight text to annotate it
X
This video is a continuation of the previous video where we'll be finishing up
co-generation for the simple language dealing with function calls, function
definitions and variable references. So just to remind you what we're working on,
here is the simple language And again, we have a bunch of different kinds of
expressions And we dealt with all of these last time except for variable references
and function calls And of course, we also have function definitions. So, as I said
in the introduction these are the three constructs we'll be looking at in this
video. >> The main issue in designing the co-generation for function calls and
function definitions is that both of these will depend intimately on the layout of
the activation record. So really, co-generation for function calls,
co-generation for function definitions and the layout of the activation record all
need to be designed together. Now for this particular language, a very simple
activation record will be sufficient. Because we are using a stack machine, we
are modeling a stack machine in our code generation. The results of a function call
will always be in the accumulator and that means there is no need to store the
results of the function call in the activation record And furthermore, the
activation record will hold the actual parameters. So when we go to computer
function call with arguments X1 through XN, we will push those arguments onto the
stack And as it happens, these are the only variables in this language that are
no local or global variables other than the arguments to a function call And so
those are the only variables that will need to be stored in the activation
record. Now recall that the stack machine discipline guarantees that the stack
pointer is preserved across function calls. So the stack pointer will be
exactly the same when we exit from a function call, as it was when we entered
the function call And this means we won't need a control link in our activation
record. The point of a control link is to help us find the previous activat ion, and
since, the stack pointer is preserved, it will have no trouble finding it when we
return from our function call, and we'll never need to look at another activation
during a function call since there are no non-local variables in the language. We
will however need the return address and that will need to be stored somewhere in
the activation record And, one more thing. It turns out that a pointer to the current
activation will be useful. Now this is to the current activation, not to the
previous activation And this pointer will live in the register, FP, which stands for
Frame Pointer. This is a conventional, this is a, this is the register name on
the [inaudible] And the name is chosen, to denote the frame pointer And by
convention, the compilers put the frame pointer there. What the frame pointer is
good for, well it points to the current frame, so that's what the name comes from.
But, what it's good for, we'll see in a few minutes. Right so to summarize for
this language an activation record that has the caller's frame pointer, The actual
parameters and the return address will be sufficient. So let's consider a call to
the function F and has two arguments X and Y. Then at the time the call is performed
before we start executing the body of the function this is what the activation
record will look like, So we'll have the old frame pointer. So this is the frame
pointer that points to the caller's frame. Not to the frame of the function that
we're executing And the reason that it does that is that we have to save it
somewhere because the frame pointer register will be overwritten with the
frame pointer for the current activation so we have to save the old one, so that we
can restart the caller when we return to it, from the current function. And then
there the arguments of the function and those that are pushed on the stack in
reverse order. So the last argument is pushed on first and the first argument is
at the top of the stack And the reason for doing it this way is it'll make the
indexing to find the a rguments a little bit easier. A little bit simpler And then
We have the stack pointer so there's a, there's nothing here. What will go here is
the callee, the function that we're calling, will push on the return address.
So this is where the return address will go And these elements, the callers frame
pointer, the arguments to the function and the return address of the call function
will make up the activation record of F. A bit of terminology, the calling sequence
is the sequence of instructions that both the caller and callee to set up a function
invocation, okay? So that's referred to in compiler lingo as the calling sequence And
we're going to need a new instruction to show the calling sequence for this for,
for function calls. And that will be the jump and link instruction. So jump and
link what it does is it jumps to the label that it's given as [inaudible] And it
saves the address of the next instruction after the jump in link, in the register
R.A. Which stands for, return address. So, what would happen in the jump in link
instructions, if I have jump in link to label L And then there's an add
instruction that comes next. I don't know what it is. It's the address of this
instruction, the one after the jump in the link that will be stored in the ret-, in
the, in the register RA. So this instruction will jump to L. It will store
the address of this add instruction in RAb And it will execute whatever code is at L.
And then the code that's at L can execute a jump back to the address in here to
execute the return, to the caller. So now we're ready to actually generate code for
a function call expression. So let's say we have the call, F of E1 To EN Where of
course E1 through EN are expressions. And let me change colors here. So these are
expressions, here, not values. So how are we going to do that? >> Well, the first
thing we're going to do is we're going to start building, the activation record And
so we save the current frame pointer. This is the frame pointer for the collar. >>
Okay. >> This is pointing to th e collars frame. >> Right >> And we store that at
the stack pointer. We have to bump the stack pointer. And then we generate code
for the last argument, for EN, right? And so that code gets inserted here And then
we push it on the stack. So we store the results of EN which will be in the
accumulator A0 on the stack and then we, bump the stack pointer. Alright, and we'll
do that for all the arguments finishing up with E-1. So, we generate code for E-1 and
we push it onto the stack. So, now all the arguments are on the stack and now we just
do the jump in link. So, we've done as much of the work or much of the calling
sequence as we can do on the caller's side. So, this code is executing in the
function in the caller. Okay, so this is the caller side of the calling sequence,
and it builds up as much of the activation record as it can. In particular it's
evaluating the actual parameters and pushing them on to the stack to form part
of the activation record, for the called function, and then we do the jump and
link. And we jump to the entry point of the function that we're calling. So we're,
this is a call to, to F, and so we jump to F's entry point. So a few more things to
note, First of all, as we discussed on the previous slide When we execute the jump in
link instruction that is going to save the return address in the register RA And that
address will be this address here, the one that comes after the, the address of the
next instruction, after the jump in link instruction And you'll notice also that
the activation record we've built so far is four times N plus four bytes. So this
is where N here is the number of arguments. Each argument takes up four
bytes, and then four bytes for the old frame pointer. Now we're ready to talk
about the callee side of the calling sequence And we're going to need one new
instruction for that. The JR instruction stands for jump register. And it just
jumps to the address in its register argument. So now, the callee side is the
code for the function definition, okay? So this is the co de that actually executes
the body of the function. And how do we generate code for that? Well let's take a
look. Now actually the very first thing that should be here is that this first
instruction of the [inaudible] side is the entry point. So, we're missing the label
here So this would be labeled F entry. Okay So this is the target of the jump in
link instruction. And then the very first thing we do is we set up the frame
pointer. So we copy the current value of the stack pointer into the frame pointer.
That sets, that points to the end of the frame for the [inaudible], for the new
function that's being executed. We also save the return address at the current
position on the stacks. Remember there was one more thing to do one thing one thing
that was missing. On the caller side on the caller side of the sequences which is
the return address. We don't know the return address until after the jumping
link instructions executes And so the callee is the one that has to save that
value. Okay so after the jumping link the RA register contains the return address
and that we save it into the frame. All right, and then we push the stack pointer.
'Kay. And now we just generate code for the body of the function. So now the, at
this point the activation record is completely set up, and now we can just
generate code for the function body. And after the function body executes, of
course, the stack pointer will be preserved, and, and that means that the
return address will be at four offset from the stack pointer, so we can load the
return address back into the return address register And then we can pop the
stack So here we're going to pop off The current frame from the stack And that's
going to be song size, z. Which we I haven't shown you what it is yet But,
we'll calculate The size of z in just a minute? This is going to be an immediate
value. So it's a constant that we plug in there And then we load the old frame
pointer. Okay So once we've incremented the stacks we popped off the existing
frame, and so now we're pointing at the frame pointer at the first we're, we're,
we're pointing at the first thing beyond the previous stack frame, and, what was
that, well, that was the first thing that we saved in the stack frame for F, and
that's the old frame pointer. So now we restore the old frame pointer so that the
call, the function that called us, we'll have its frame pointer back, and then now
we're ready to return it resume execution of the calling function. We just do that
by a jump register to the return address, All right? So note here that the frame
pointer points to the top of the frame, not the bottom of the frame. Okay? So that
will actually be important when we talk about how we use the frame pointer When we
get to talking about the variable references next And the callee pops the
return address, The actual arguments in the saved value of the frame pointer from
the stacks. So the callee pops off the entire activation record, and also
restores the caller's frame pointer And what's the value of Z? Well, there are N
arguments. Each of which take up four bytes So there's at, so the size of the
activation record is four times N. Plus, there are two other values. In the
activation record One is the return address. And the other one is the old
frame pointer. Okay and the space for two more words is eight bytes. So that's the
size of the activation record. So that's how much we have to add to the stack
pointer to pop the activation record for F off the stack. Just to give you a sketch
of, what this looks like before the call. We have the frame pointer for the caller,
and we have the, The current value of the stack pointer And on entry to the
function. Okay, after the calling after the calling functions side of the calling
sequence has completed what's on the stack, well, we have the old frame
pointer, and the two arguments, and then the stack pointer points to the next
unused, location. Which is where the return address will go Alright, then we do
the jump and link. We jump over, and the return address gets pushed on to the
stack, a nd the frame pointer gets moved to point two, the current value of the
frame. Okay, you've got [inaudible] to the top of the frame. Okay? And then after the
call, what has happened? Well, we've popped everything off the stack, we've
popped the entire. Your activation record for the call function off of the stack And
so now notice that we're back in the same state. So again, function calls have to
preserve the invariant that. The stack is preserved across the call so the stack
should be exactly the same after the call, as it was on entry to the call. So we are
almost done with code generation for simple language. The last construct we
need to talk about is how we generate code for variable references. Now the variables
of a function again are just its arguments just the parameters to the function. There
are no other kinds of variables in this simple language And these variables are
all in the activation record. So we really all we have to do is generate code to look
up a variable in its appropriate place in the activation record But there is one
problem, and that's that the stack does grow and shrink with intermediate values.
So when you call a function and you begin executing its body values will be popped
and pushed onto the stack beside the activation record. So think back to the
code generation for plus and minus and if then else intermediate values were being
pushed and popped from the stack And so what this means is that these variables
that are in the activation record are not at a fixed offset From the stack pointer.
So we can't use the stack point very easily to the side or to find those
variables. So the solution is to use the frame pointer. The frame pointer always
points to the return address in the activation record and because it doesn't
move during the execution of the function body, we can always find the variables at
the same place relative to the frame pointer. So, how do we do that? Well,
let's consider the [inaudible] argument, X of I and does the [inaudible] argument to
the, to the function. So where is that going to be relative to the frame pointer?
That will be at offset Z from the frame pointer And Z is just four times I. Right,
and this is actually the reason here for generating for pushing the arguments on
the stack in reverse order, starting with the last argument to the function, because
it just makes this index calculation simple. It wouldn't be that much more
complicated if we pushed the arguments in the other order. It just makes it a little
easier to see how the indexing works And anyway this index, this offset, is being
calculated at compile time. So, notice that this number, this four times I, is
something that the compiler knows, and what we're putting in the code here is
just a fixed offset. So, we are not actually doing that multiplication at run
time. See here is just a number, as computed statically by the compiler. So
anyway We just load and off send Z which is the four times I where I is the index,
the position of the variable in the list of parameters. At that offset from the
frame pointer, that's where XI is stored in the activation record And we just load
it into the accumulator. So that is the entire code generation for a variable
reference. Here's a little example. So for the function, the hypothetical function
that we've been looking at, with two parameters x and y. X is going to be at
the frame pointer +four and y will be at the frame pointer +eight. So to summarize
the main point's one very important thing is that the activation record has to be
designed together with the code generation. So you have to do these things
at the same time. You can't just design the activation record without thinking
about what code you're going to generate And you can't just think about generating
code without making some decisions about where the data is going to be lived. So
the code and the data it manipulates, have to be designed simultaneously. Code
generation can be done by a recursive [inaudible] of the abstract syntax street,
so just like type checking. Cogeneration can be expressed as a r ecursive tree-walk
And that's a very handy way to think about cogeneration because it allows you to
think about one case at a time without having to get mixed up thinking about all
the different constructs at one time. >> And finally I recommend that you use a
stack machine for your compiler. So if you're implementing a course project, the
stack machine is the simplest discipline and it gives you a nice framework for
think, for breaking up the project into manageable pieces. And because of that
simplicity, I think it's a really good way to learn about writing compilers. Now, it
is important to realize that production compilers do, do some different things.
They're not quite as simple as, the stack machine cogeneration that we have outlined
in the last few videos. So, the main differences, or, or, the main difference,
is that the big emphasis in a production compiler is on keeping values and
registers. It's much more efficient to do operations out of registers than to be
saving and loading values from the stack And so, especially the values in the
current activation record or current stack frame. It, in production compiler we try
to keep those in registers instead of on the stack And also, typically a pressure
compiler, to the extent that it has to use temporaries, in the activation record.
These would be resolved, laid out directly in the activation record, not pushed and
popped from the stack. That means they'd be assigned, pre-assigned locations in the
activation record, just like, the function arguments in the simple language we looked
at are assigned fixed positions in the activation record. So those temporary
values would also be assigned fixed positions, so you could save the trouble
of manipulating the stack pointer.