9.6 - Type environments - Semantic analysis and type checking - [compilers theory] by alex aiken

In this video, we're gonna continue our development of Cool type checking with a discussion of type environments. Let's begin by doing some more type rules. So here's one for the constant false. So it's provable that the constant false has the type [inaudible] and that's not surprising. If I have a string literal S, then it's provable that, that has type string. And that's also not very surprising. The expression new T produces an object of type T. And the type rule for that is very straightforward. New T has type T. And we're just going to ignore self type for now. As I mentioned in an earlier video, we'll deal with self type later in a video all on its own. Here are a couple of more rules. If it's provable that an expression e has type bool then an e Boolean complement of e not e also has type bool. And finally perhaps our most complex rule so far, the rule for a while loop, and we call that the e-1 here is the predicate of the loop, this is what determines if we keep executing the loop or not, and e2 is the body of the loop. And so type one is required to have type bool. It needs to be provable that e one had type bool, and we allow e two, the body of the loop, to have an arbitrary type. It can have any type t. It has to have some type, so it has to be type able under some. Rules, but we don't care what the type is because the type of the entire expression is just object. We don't actually return the, this expression doesn't return an interesting value, doesn't produce an interesting value, and to discourage people from trying to rely on it, we just type the whole thing as object. And this is a little bit of a design decision. Now we could have designed a language, for example, where the type of a while loop is, was type t. And that you would get the last value of the loop that was that was executed but the problems is that if E one the protocol loop is false and reaches the loop the first time Then you never evaluate e two, and no value is produced, and in that case you would get a, a void value. Which if so mebody tried to dereference, it would result in a run time error. That's so to discourage programmers from lying On the loop, producing a meaningful value. We could just type it, as object. So far, it's been pretty straight forward to define reasonable type rules for every construct that we've looked at. But now we actually come to a problem. Let's say we have an expression which consists just of a single variable name, and that's a perfectly valid, cool expression, and the question is, what is the type of that variable, call it X And as you can see. When we're just looking at X by itself we don't have enough information to give X a type. This local structural rule does not carry any information about the type of X And stepping back one level, inference rules have the property that all the information needs to be local. Everything we need to know. To carry out the function of the rule has to be present in the rule itself. There are no external data structures. There's nothing we're passing around here that's on the side. Everything has to be encoded in this rule and, so far at least, we just don't know Enough to say what the type of a variable should be. So the solution to this problem is just to put more information in the rules, and that's what we're going to do, so a type environment gives types for free variables. So what is a free variable, a variable is free in an expression if it is not defined within that expression. So for example, in the expression X, X is free. In the expression x plus y (x+y), well, here this expression uses both x and y, and there's no definition of either x or y in that expression, so x and y are free And that expression. If I have, let Y... So I'm declaring a variable Y in X + Y. Well what's free in this expression, well this expression uses X and Y but the use of Y is governed by a definition of Y that occurs within the expression itself. So we say here that Y is bound, Y is a bound variable in this expression but X is still free, so only X is free in that expression. And the ide a here is, that if I have an expression with three variables, and you want me to type check it, you have to tell me what the types of those variables are. So I can type check X if you tell me what the type of X is. I can type check X plus Y if you tell me the types of X and Y. And I can type check this expression, this line expression, if you tell me the type of its one free variable X, the type of Y. We will be given a declaration by the let, but we still have to tell me what the type X is. So the free variables are just those variables where you have to give me the information and then I can carry out the type checking. The type environment encodes this information so a type environment is a function from object identifiers from variable names to types So let O be a type environment. One of these functions from object identifier names types. And now we're going to extend the kinds of logical statements that we prove to look like this. And the way that this going to be read is that under the assumptions that variables have the types given by O. So the assumptions go over here on the left side of the turnstile. These are the assumptions that we're making about the free variables in E. So the assumption that, excuse me, three variables. Have the types given by o is provable, that's this turn style here, that the expression e has type t. And, so this notation very nicely, separates what we're assuming. This is input to our, process of figuring out what the type is from what we're proving. So if you tell me the types of the free variables as given by o, then I can tell you, the type e. The type environment has to be added to all the rules that we've gone through so far. So for example, for intergal literals if I have some set of assumptions of all the types of variables, that doesn't really change, it doesn't, in fact it doesn't change, what the type is an intergal literal. Any intergal literal will still have type int. And so in this case, for this particular kind of expression, I we don't use any of our assumptio ns about the types of variables. Now, it's a little bit different with the case of sum expressions. So if I have the expression E one plus E two, and I have some assumptions, zero, about the types of variables, well, then I want to prove that E one has type int, and I'm gonna do that using the types of the variables given by zero, so E one might contain free variables and I'll need to look in zero to figure out what the types of those variables are. And similarly for E two, I will type E two under the same set of assumptions. And if E1 has type int under the assumptions O and E2 has type int under the assumptions O. Well, then I can conclude that E1 plus E2 has type int under the same set of assumptions O. And we can also write new rules, so now our big problem with free variables becomes a very easy problem. If I want to know what the type of X is, and there's a missing O here, if I want to know what the type of X is, I simply look it up in my object environment. So under the assumption that the variables have the types given by O, what is the type of x? Well, I look up in O what the type of X is assumed to be, and I then can prove that X has that type T. So now let's take a look at a rule that actually does something interesting with the variables from the point of view of the environments. So here is a [inaudible] expression. And let's remind ourselves what this does. This is a [inaudible] expression that has no initialization. So it says that X is going to be a new variable. It's going to have type T0, and that variable is going to be visible in the sub expression E1. And so, now, how am I going to type check that? Well, I'm going to type check E1 in some kind of environment. And this is a new notation here, so let me define what it means. So remember O is a function, it maps a, variable names to types and OT/X this notation here is also a function. And what this is, is the function O modified at the single point X to return T. So in particular a, this function, this whole thing here is one function, this wh ole thing I'm underlining here is a function, that applied to X is Returns t So that says that this sort of assumptions says that x has type t and for any other variable. So I apply it to some other variable y, where x is different from y. Well then, I just get whatever type y has in [inaudible]. Okay? So what this rule then says is that I'm going to type check E1 in the same environment O, except that at point X, it's going to have the type T0. So we're gonna change just the type of X that have type T0, because that's the type of the new identifier that's bound at E1. And all the other types will be the same. And using those assumptions, I'll try to prove that E1 has some type. I will get a type for E1. And then that will be the type of the entire let expression. Now notice something about the type environment. What this says is that before we type check E1, we need to modify Our set of assumption. Modify our type environment to include a new assumption about x, then we type check e one, and then of course when we leave, type checking e one, we're going to remove that assumption about x, that new assumption, because outside of the let, we just have the original set of assumptions, though. And so, I hope that, that, terminology and that description reminds you of something that we talked about earlier, because this type environment is really implemented by the simple table. So, in our rules, The type environment that carries around the information that will be stored or is typically stored in the symbol table of a compiler. To summarize this video, The type environment gives types to the free identifiers in the current scope. And this is very important, because it doesn't even really makes sense to talk about type checking an expression, unless we have some information for the types of the free identifiers. And the type environment is just a way of formalizing that, of giving a name to some set of assumptions about what the types of those free identifiers are. Notice that the type environment is passed down the abstract syntax tree from the root towards the leaves. That is, as we pass through definitions, the type environment is extended with new definitions For example, [inaudible] expressions. And so the type environment will grow as you pass from the root of the abstract syntax tree down towards the leaves of the abstract syntax tree. And then the types are computed up the abstract syntax tree from the leaves towards the roots. So we begin at the leaves, get all the types of the leaf expressions, most of which are very easy. Things like integers and string constants have the obvious types. And we just look up the types of variables, in the type environment. And then we compute the types for the more complicated expressions in a bottom up fashion.