Professor Ben Polak: Okay, let's make a start.
So I hope everyone had a good break.
We're going to spend this week looking at repeated interaction.
We already saw last time, before the break,
that once we repeat games--once games go on for a while--we can
sustain behavior that's quite interesting.
So, for example, before the break,
we saw that we could sustain fighting by players that were
rational in a war of attrition. Another thing we learned before
the break is when we're analyzing these potentially very
long games, it helps sometimes to break the
analysis up into what we might call "stage games,"
each period of the game, and break the payoffs up into:
the payoffs that are associated with that stage;
payoffs that are associated with the past (they're sunk,
so they don't really matter); and payoffs that are going to
come in the future from future equilibrium play.
So those are some ideas we're going to pick up today,
but for the most part what we do today will be new.
Now whereas last time we focused on fighting,
for the whole of today I want to focus on the issue of
cooperation. In fact, for the whole of this
week, I want to focus on the issue of cooperation.
The question behind everything this week is going to be:
can repeated interaction among players,
both induce and sustain cooperative behavior,
or if you like "good behavior." Our canonical example is going
to be Prisoners' Dilemma. Way back in the very first
class, we talked about Prisoners' Dilemma,
and we mentioned that playing the game repeatedly might be
able to get us out of the dilemma.
It might be able to enable us to sustain cooperation.
And what's going to be good about that is not just sustained
cooperation, but sustained cooperation without the use of
outside payments such as contracts or the mafia or
whatever. So why does this matter?
Well one reason it matters is that most interactions in
society, either don't or perhaps even can't rely on contracts.
Most relationships are not contractual.
However, many relationships are repeated.
So this is going to be of more importance perhaps in general
life--though perhaps less so in business--more important in
general life than thinking about contracts.
So let's think about some obvious examples,
think about your own friendships.
I don't know if you have any friendships--I assume you
do--but for those of you who do, your friendships are typically
not contractual. You don't have a contract that
says if you're nice to me, I'll be nice to you.
Similarly, think about interactions among nations.
Interactions among nations typically cannot be contractual
because there's no court to enforce those would-be
contracts, although you can have treaties
I suppose. But most interactions among
nations--cooperation among nations is sustained by the fact
that those relationships are going to go forever.
Even in business, even where we have contracts,
and even in a very litigious society like the U.S.
which is probably the most litigious society in the world,
we can't really rely on contracts for everyday business
relationships. So, in some sense,
we need a way to model a way to sustain cooperation and good
behavior that forms, if you like,
the social fabric of our society and prevents always
going to court about everything. Now, why might repeated
interaction work? Why do we think way back in day
one of the class that repeated interaction might be able to
enable us to behave well, even in situations like
Prisoner's Dilemmas or situations involving moral
hazard where bad behavior is going to occur in one shot
games? So the lesson we're going to be
sort of underlying things today and all week is this one.
In ongoing relationships the promise of future rewards and
the threat of future punishments may--let's be careful--may
sometimes provide incentives for good behavior today.
Just leave a gap here in your notes because we're going to
come back to this. So this is a very general idea.
And the idea is that future behavior in the relationship can
generate the possibility of future rewards and/or future
punishments, and those promises or threats
may sometimes provide incentives for people to behave well today.
The reason I want to leave a gap here is I want--part of the
purpose of this week's lectures is to try and get beyond this.
This is kind of, almost a platitude,
right? Most of you knew this already.
So I want to get beyond this. I want to see when is this
going to work? When is it not going to work?
How is it going to work? So I don't want people to leave
this week of classes or leave the course thinking:
oh well, we're going to interact more
than once so everything's fine. That's not true.
I want to make sure we understand when things work,
how they work, and more importantly,
when they don't work and how they don't work.
So we're going to try and fill in the gap that we just left on
that board as we go on today. Nevertheless,
we do have this very strong intuition that repeated
interaction will get us, as it were, out of the
Prisoner's Dilemma. So why don't we start with the
Prisoner's Dilemma. I'll put this up out of the way
and we'll come back to it. Let's just remind ourselves
what the Prisoner's Dilemma is because you guys are all full of
turkey and cranberry sauce and you've probably forgotten what
Game Theory is entirely. Let's name these strategies,
rather than alpha and beta, let's call them cooperation and
defect. And that will be our convention
this week. We'll call them cooperation and
defect. This is Player A and this is
Player B, and the payoffs are something like this (2,2),
(-1,3), (3, -1) and (0,0). It doesn't have to be exactly
this but this will do. This is the game we're going to
play. And, to try and see if we get
cooperation out of it by having repeated interaction,
we're going to play it more than once.
So let me go and find some players to play here.
This should be a familiar game to everybody here.
So why don't I pick some people who are kind of close to the
front row. So what's your name again?
Student: Brooke. Professor Ben Polak:
Brooke. Okay so Brooke is going to be
Player B. And I've forgotten your name,
by this stage I should know it. Patrick you're going to be
Player A. And the difference between
playing this game now and playing this game earlier on in
the class is we're going to play not once, but twice.
We're going to play it twice. So write down what you're going
to do the first time and show it to your neighbor.
Don't show it to each other. And let's find out what they
did the first time. So is it written down?
Something's written down? So Brooke.
Student: I cooperated. Professor Ben Polak:
You cooperated. Patrick?
Student: I defected. Professor Ben Polak:
Patrick defected, okay.
Okay let's play it a second time.
So write down what you're going to do the second time.
Brooke? Student: This time I'm
going to defect. Student: Me too.
Professor Ben Polak: All right,
so we had the play this time--let's just put it up
here--so when we played it this time, we had A and B.
And the first time we had (defect, cooperate) and the
second time we had (defect, defect).
Let's try another pair. We'll just play this a couple
times and we'll talk about it. So that's fair enough.
Why don't we go to your neighbors.
That's fair enough. It's easy. So you are?
Student: Ben. Professor Ben Polak:
That's a good name, very good okay.
You are? Student: Edwina. Professor Ben Polak:
Edwina, Edwina and Ben okay.
So we're going to make Ben Player B and Edwina Player A.
And why don't you write down what you're going to do for the
first time. Again, we're going to play it
twice. Why don't we mix it up,
we can play it three times. We'll play it three times this
time okay. We'll play it three times.
Both people are happy with their decisions.
Okay so the first time Edwina what did you choose?
Student: Defect. Student: Cooperate.
Professor Ben Polak: All right so we had--let's
put down this time--so we've got Edwina playing A and Ben playing B.
And we had (cooperate, defect). Second time please, Edwina?
Student: Cooperate. Student: Defect.
Professor Ben Polak: Okay, so we're going to and
fro now. So this was cooperate and
defect and one more time: write it down.
Both players written down? Edwina?
Student: Cooperate. Student: Defect.
Professor Ben Polak: Okay, so we flipped
round again okay. Okay, so we're seeing some
pretty odd behavior here. Who did what that time?
Edwina what did you do? Student: I cooperated.
Professor Ben Polak: So we had this.
Is that right? So keep the microphones for a
minute, and we'll just talk about it a second.
All right, so first of all let's start with Ben here.
Ben you were cooperating on the first go.
So why did you choose to cooperate the first turn?
Student: I felt that if I established a reputation for
cooperating we could end up in the cooperate,
cooperate. Professor Ben Polak:
All right, so you thought that by playing
cooperate early you could establish some kind of
reputation. And what about later on when
you played defect thereafter, what were you thinking there?
Student: I realized that she established a reputation for
defecting a second time. Professor Ben Polak:
All right, so you switched strategies
mid-course. Edwina you started off by
defecting. Why did you start off by
defecting? Shout it out so people can hear
you. Student: Because his
friend defected so I thought he might defect.
Professor Ben Polak: Okay, his friend defected.
Okay, so he's been tainted by his friend there.
There's a shortage of space in the class.
They could have just been sitting next to each other.
Thereafter you cooperated. Why was that?
Student: Because I thought he cooperated.
Maybe he was going to keep cooperating.
Professor Ben Polak: All right,
so in fact your reputation works in some sense.
By cooperating early you convinced Edwina you would
cooperate. And then you went on
cooperating even after he defected, so what were you doing
in the third round? Shout out.
Student: I thought he might cooperate because I
cooperated. Professor Ben Polak:
All right, he might come back.
Let's talk about it to your neighbors.
So Brooke, shout out why you cooperated in the first round.
Student: Because I was hopeful that he would cooperate.
Professor Ben Polak: You were hoping he would
cooperate all right, and why did you defect
thereafter? Student: Because I
thought he would continue to defect after he defected.
Professor Ben Polak: Because he defected,
he would continue to defect. Patrick, you're the person who
just defected throughout here. Grab the mic that's next to you.
Why did you just defect? Student: It's such a
short game that it makes sense to defect in the last period so
the second last period and the first period.
Professor Ben Polak: All right,
that's an interesting idea. So Patrick's saying,
actually if we look at the last period of this game,
if we look at this last period of the game,
what does the game look like in the last period?
Student: It's a single period game.
Professor Ben Polak: In the last period,
this actually is the game. If I drew out the game with two
periods, it would be kind of a hard thing to draw,
it would be an annoying diagram to draw.
But in the last period of the game, whatever happened in the
first period is what? It's sunk, is that right?
Everything that happened in the first period is sunk.
So in the last period of the game these are the only relevant
payoffs, is that right? Since these are the only
relevant payoffs looking forward, in the last period of
the game, we know that there's actually a dominant strategy.
And what is that dominant strategy in the last period of
the game? To do what?
In Prisoner's Dilemma, what's the dominant strategy?
Shout it out. Defect, okay.
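(A minimal sketch, not part of the lecture, of that one-shot check using the payoffs on the board; C and D here just stand for cooperate and defect:)

```python
# One-shot Prisoners' Dilemma from the board.
# payoffs[(my_move, their_move)] gives (my payoff, their payoff).
payoffs = {
    ("C", "C"): (2, 2), ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1), ("D", "D"): (0, 0),
}

# Whatever the other player does, defecting gives me strictly more.
for theirs in ("C", "D"):
    assert payoffs[("D", theirs)][0] > payoffs[("C", theirs)][0]
print("D strictly dominates C in the one-shot game.")
```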
So what we should see in this game--we didn't actually because
we had some kindness over here from Edwina but what we should
see in general--is we know that in the last period of the game,
in period two, we're going to get both people
defecting. The reason we're going to get
both people defecting is because the last period of the game is
just a one shot game. There's nothing particularly
exciting about it. There is no tomorrow,
and so people are going to defect.
But now let's go back and revisit some of the arguments
that Edwina, Brooke, and--I've forgotten what your
neighbor is called again, Ben, (I should remember
that)--and Ben said earlier. They gave quite elaborate
reasons for cooperating: cooperating to establish
reputation; cooperating because the other
person might cooperate; whatever.
But most of these behaviors were designed to either induce
or promise cooperation in period two, is that right?
What we've just argued is that in period two everyone's going
to defect. Period two is just a trivial
one stage Prisoner's Dilemma. We actually analyzed it in the
very first week of the class. And provided we believe these
payoffs, we're done. Period two people are going to
defect. Since they're going to defect
in period two, nothing I can do in period one
is going to affect that behavior and therefore I should defect
also in period one. In order to belabor this point,
we can actually draw up what the matrix looks like in period
one--so let's do that--using the style we did last week--two
weeks ago--before we went away. So here once again is the
matrix we had before, and I want to analyze the first
stage game. In the first stage game,
what I'm going to do is I'm going to add in the payoffs I'm
going to get from tomorrow. The payoffs I'm going to get
from tomorrow are from tomorrow's equilibrium.
Well this isn't going to do very much for me as we'll see
because I'll get 2 + 0 tomorrow because we know I'm playing
defect tomorrow, 2 + 0 tomorrow,
- 1 + 0 tomorrow, 3 + 0 tomorrow,
3 + 0 tomorrow, - 1 + 0 tomorrow,
and 0 + 0 tomorrow, 0 + 0 tomorrow.
So just as we did with the war of attrition game two weeks ago,
we can put in the payoffs from tomorrow,
we can roll back those equilibrium payoffs to today,
it's just in this particular exercise it's rather a boring
thing, because we're just adding 0 to
everything. When I add 0 to everything,
I then just cancel out the zeros and I'm back where I
started, and of course I should defect.
So what I'm going to see is because I'm going to defect
anyway tomorrow, today is just like a one shot
game as well. And I'm going to get defect
again. Now here we played the game
twice and got defect, defect, what about if we played
the game three times? It's the same thing.
We didn't play it three times the first time around, but we did play the game three times between Edwina and Ben. There we know we're going to
defect in the third round. Therefore we may as well defect
in the second to last round. Therefore we may as well defect
in the first round. If we played it five times,
we know we're going to all defect in the fifth round.
Therefore we may as well defect in the fourth round.
Therefore we may as well defect in the third,
and so on. If we played it 500 times,
we wouldn't have time in the class, but if we played it 500
times, we know in that 500th period
it's a one shot game and people are going to defect.
And therefore, in the 499th period people are
going to defect. And therefore in the 498th
period people are going to defect, and so on.
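(Here is a minimal sketch, not from the lecture, of that rolling-back exercise: at each stage we add the next period's equilibrium value to every cell, which never changes the fact that defecting dominates, so the whole thing unravels.)

```python
# Row player's stage payoffs in the Prisoners' Dilemma (the game is symmetric).
stage = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}

continuation = 0  # equilibrium value of the future, starting from the last period
for period in range(500, 0, -1):  # work backwards from period 500 to period 1
    # Add tomorrow's equilibrium value to every cell of today's stage game.
    total = {profile: u + continuation for profile, u in stage.items()}
    # Defect still strictly dominates, so today's equilibrium is (D, D)...
    assert all(total[("D", x)] > total[("C", x)] for x in ("C", "D"))
    # ...and the value we roll back to yesterday is the (D, D) value.
    continuation = total[("D", "D")]

print("Equilibrium value rolled back to period 1:", continuation)  # 0
```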
So the problem here is that we get unraveling,
something we've seen before in this class, we get unraveling
from the back. I have a worry that there might
only be one L in unraveling in America, is that right?
How many L's do we put in unraveling in America?
One, I've just come back from England and my spelling is
somewhere in the mid-Atlantic right now.
I'll leave it as one. All right: unraveling from the
back. Essentially this is a backward
induction argument, only instead of using backward
induction we're really using sub-game perfection.
We're looking at the equilibria in the last games and as we roll
back up the game, we get unraveling.
So here's bad news. The bad news is,
we'd hoped that by having repeated interaction in the
Prisoners' Dilemma, we would be able to sustain
cooperation. That's been our hope since day
one of the class. In fact, we stated it kind of
confidently in the first day of the class, and we kind of
intuitively believe it. But what we're discovering is,
even if you played this game for 500 times and then stopped,
you wouldn't be able to sustain cooperation in equilibrium
because we're going to get unraveling in the last stage and
so on and so forth. So it seems like our big hope
that repeated interaction would induce cooperation in society is
going down the plug hole. That's bad.
So let's come back and modify our lesson a little bit.
So what went wrong here was, in the last period of the game,
there were no incentives generated by the future,
so there was no promise of future rewards or future
punishments, and therefore cooperation broke down and then
we had unraveling. So the lesson here is what?
The lesson is: but for this to work it helps
to have a future.
This whole idea of repeated interaction was the future was
going to create incentives for the present.
But if the games come to an end, there's going to be some
point when there isn't a future anymore and then we get
unraveling. Now this is not just a formal
technical point to be made in the ivory tower of Yale.
This is a true idea. So for example,
if we think about CEO's or presidents, or managers of
sports teams, there's a term we use,
there's a word we use--at least in the states--to describe such
leaders when they're getting towards the end of their term
and everyone knows it. What's the expression we use?
"Lame duck." So we have this lame duck
effect. The lame duck effect at the end
of somebody's term undermines their ability to cooperate,
their ability to provide incentives for people to
cooperate with them, and causes a problem.
So this lame duck effect affects presidents but it also
affects CEO's of companies. But it's not just leaders who
run into this problem. So if you have an employee,
if you're employing somebody, you may have a contract with
the person you're employing, but basically you're sustaining
cooperation with this person because you interact with them
often. You know you're always going to
interact with them. But then this employee
approaches retirement. Everyone knows that in April or
something they're going to retire, then the future can't
provide incentives anymore. And you have to switch over
from the implicit incentive of knowing you're going to be
interacting in the future, to an explicit incentive of
putting incentive clauses in the contract.
So retirement can cause, if you like,
a lame duck effect.
This is even true in personal relationships.
In your personal relationships with your friends,
if you think that those friendships are going to go on
for a long time, be they with your significant
other or just with the people you hang out with,
you're likely to get a lot of cooperation.
But if, as with perhaps most economics majors,
most of your significant others are only going to last for a day
at most, you're not going to get great
cooperation. You're going to get cheating.
No one's rising to that one but I guess it's true.
So what do we call these: "economics majors'
relationships."
These are kind of "end effects." All of these things are caused
by the fact that the relationship is coming to an
end. And once the relationship is
coming to an end, all those threats and promises
of future behavior, implicit or otherwise,
are going to basically disappear.
So at this point we might think the following.
You might conclude the following.
You might conclude that if a relationship has a known end,
if everyone knows the relationship's going to end at a
certain time, then we're done and we
basically can't sustain cooperation through repeated
interaction. That's kind of what the example
we looked at seems to suggest. However, that's not quite true.
So let's look at another example where a relationship is
going to have a known end but nevertheless we are able to
sustain some cooperation. And we'll see how.
So again, I'm being careful here, I've said it helps
to have a future. I haven't said it's
necessary to have a future.
So that's good news for the Economics majors again.
So let's do this example to illustrate that,
even a finite interaction, even an interaction that's
going to end and everyone knows it's going to end,
might still have some hope for cooperation.
So look at this slightly more complicated game here.
And this game has three strategies and we'll call them
A, B, and C for each player. The payoffs are as follows
(4,4), (0,5), (0,0), down here we'll do
(0,0), (0,0) and (3,3), and in the middle row (5,0),
(1,1), (0,0).
We're going to assume that this game, just like we did with the
first time we did Prisoner's Dilemma, this game is going to
be played twice. It's going to be repeated,
it's going to be played twice, repeated once.
So let's just make sure we understand what the point of
this game is. In this game,
in the one shot game I hope it's clear that (A,
A) is kind of the cooperative thing to do.
We'd like to sustain play of (A, A), because then both
players get 4 and that looks pretty good for everybody.
However, in the one shot game (A, A) is not a Nash
equilibrium. Why is (A, A) not a Nash
equilibrium? Let me grab those mikes again.
Why is (A, A) not a Nash equilibrium?
Anybody? I'm even getting to know the
names at this stage of the term. This is Katie, right?
So shout out. Student: The best
response to the other guy playing A is playing B.
Professor Ben Polak: Good, so if I think the
other person's going to play A, I'm going to want to defect and
play B and obtain a gain, a gain of 1.
So basically I'll get 5 rather than 4.
I'll defect to playing B and get 5 rather than 4 for a gain
of 1, is that right? So (A, A) is not a Nash
equilibrium in the one shot game--we're sometimes going to
call it--it's fine--in the one shot game.
So now imagine we play this game twice.
Instead of just playing once, we're going to play this game
two times. I'll come back to that.
Before I do that what are the pure strategy Nash equilibria in
this game? Anybody?
So the Nash equilibria in this one shot game are (B,
B) and (C, C). There's some mixed ones as well
but this will do. So (B, B) and (C,
C) are the pure strategy Nash equilibria.
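(A minimal sketch, not part of the lecture, that just checks each cell of that stage game for a profitable unilateral deviation; the payoffs are the ones on the board:)

```python
moves = ["A", "B", "C"]
# payoffs[(row, column)] = (row player's payoff, column player's payoff)
payoffs = {
    ("A", "A"): (4, 4), ("A", "B"): (0, 5), ("A", "C"): (0, 0),
    ("B", "A"): (5, 0), ("B", "B"): (1, 1), ("B", "C"): (0, 0),
    ("C", "A"): (0, 0), ("C", "B"): (0, 0), ("C", "C"): (3, 3),
}

def is_pure_nash(r, c):
    # Nash: neither player gains by deviating on their own.
    row_ok = all(payoffs[(r, c)][0] >= payoffs[(d, c)][0] for d in moves)
    col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, d)][1] for d in moves)
    return row_ok and col_ok

print([cell for cell in payoffs if is_pure_nash(*cell)])  # [('B', 'B'), ('C', 'C')]
```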
Now consider playing this game twice.
And last time we looked at a game played twice,
it was Prisoner's Dilemma, and we noticed that we couldn't
sustain cooperation because in the last stage people weren't
going to cooperate and hence in the first stage people weren't
going to cooperate. But let's look what happens
here. If this game is played twice is
there any hope of sustaining cooperation, i.e.
A, in both stages? Could we have people play A in
the first stage and then play A again in the second stage?
So Patrick's shaking his head. So that's right to shake his
head. So let me grab the other mike.
So why is that not going to work?
Why can't we get people to cooperate and play A in both
periods? Shout out.
Student: In the second period you're still going to
defect and play B. Professor Ben Polak:
Good, so in the second period, exactly the argument
that Katie produced just now in the one shot game applies
because the second period game is a one shot game.
So we've got no hope of sustaining cooperation in both
periods. Let's call this cooperation.
We can't sustain (A, A) in period two,
in the second period. However, I claim that we may be
able to get people to cooperate in the first period of the game.
Now how are we going to do that? So to see that let's consider
the following strategy. But consider the strategy--the
strategy's going to be--play A and then play C if (A,
A) was played; and play B otherwise.
So this strategy is an instruction telling the player
how to play. Now, before we consider whether
this is an equilibrium or not, let's just check that this
actually is a strategy. So what does a strategy have to
do? It has to tell me what I should
do--it should give me an instruction--at each of my
information sets. In this two period game,
each of us, each of the players in the game has two information
sets. They have an information set at
the beginning of the game and they have another information
set at the beginning of period two.
Is that right? So it has to tell you what to
do at the first information set and at the second information
set and it does. It says play A at the first
one, and then at the beginning of period two--I said there's
only one information set there, but actually there's nine
possible information sets depending on what happened in
the first period. So each thing that happened in
the first period is associated with a different information
set. I always know what happened in
the first period. And at each of those nine
information sets it tells me what to do at the beginning of
period two. In particular,
it says if it turns out that (A, A) was played then play C
now. And otherwise,
all the other eight possible information sets I could find
myself in, play B. So this is a strategy.
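(Just to make the instruction concrete, here is a minimal sketch, not from the lecture, of that strategy written out as a rule; the function name and arguments are only for illustration:)

```python
def candidate_strategy(period, first_period_profile=None):
    """Play A; then play C if (A, A) was played, and play B otherwise."""
    if period == 1:
        return "A"
    # In period two the profile is one of the nine possible (own move, other's move) pairs.
    return "C" if first_period_profile == ("A", "A") else "B"

print(candidate_strategy(1))               # A
print(candidate_strategy(2, ("A", "A")))   # C -- the "reward" equilibrium
print(candidate_strategy(2, ("B", "A")))   # B -- the "punishment" equilibrium
```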
Now of course the big question is, is this strategy an
equilibrium, and in particular is it a sub-game perfect
equilibrium?
Let me be a bit more precise: if both players were playing
this strategy would that be a sub-game perfect equilibrium?
Well let's have a look.
(Of course I don't see it now so let's pull both these boards
down.) So to check whether this is a sub-game perfect
equilibrium, we're going to have to check
what? We're going to have to check
that it induces Nash behavior in each sub-game.
(I think the battery is going on that.
Shall I get rid of that. Okay, I'm going to shout.
Can people still hear me? People in the balcony can they
hear me? Yup, okay.) So we're going to
have to see if we can sustain Nash behavior in each sub-game.
So let's start with the sub-games associated with the
second period. Technically,
there are nine such sub-games depending on what happened in
the past, depending on what happened in the first period.
There's a sub-game following (A,A).
There's a sub-game following (A,B).
There's a sub-game following (A,C), and so on.
So for each activity in the first period,
for each profile in the first period there's a sub-game.
However, it doesn't really matter to distinguish all of
these sub-games particularly carefully here,
since the costs from the past, what happened in the past is
sunk, so we'll just look at them as a whole.
So in period two after (A,A)--so you have one in
particular of those nine sub-games--after (A,A),
this strategy induces (C, C).
If people play A, if both people play A in the
first period, then in the sub-game following
people are supposed to play (C, C).
Is that a Nash equilibrium of the sub-game?
Well was (C,C) a Nash equilibrium?
Yeah, it's one of our Nash equilibria.
Let's just look up there, we've got it listed.
There it is. So we're playing this Nash
equilibrium. So that is a Nash equilibrium,
so we're okay. After the other choices in
period one, this strategy induces (B, B).
That's good news too because (B, B), we already agreed,
was a Nash equilibrium in the one shot game.
So in all of those nine sub-games, the one after (A,A),
and the eight after everything else, we're playing Nash
behavior, so that's good. What about in the whole game?
In the whole game, starting from period one,
we have to ask do you do better to play the strategy as
designated, in particular to choose A,
or would you do better to defect?
Well let's have a look. So if I choose A--remember the
other person is playing this strategy--so if I choose A then
my payoff in this period comes from (A,A) and is 4.
If I choose A then we're both playing A in this period and I
get 4. Tomorrow, according to this
strategy--tomorrow since we both played A--both of us will now
play C. Since we're both playing C,
I'll get an additional payoff of 3.
So tomorrow (C, C) will occur and I'll get 3
for a total of 7. What about if I defect?
We could consider lots of possible defections,
but let's just consider the obvious defection.
You can check the other ones at home.
So if I defect and choose B now, then, in this period,
I will be playing B and my opponent or my pair will be
playing A. So in this period I will get 5.
And tomorrow, since (A,A) did not occur,
both of us will play B and get a continuation payoff of 1.
So the continuation payoff this time will be following from (B,
B), and I'll get 1. Why don't I do what I've been
doing before in this class and put boxes around the
continuation payoffs, just to indicate that they are,
in fact, continuation payoffs. So if I play A,
I get 4 now and a continuation payoff of 3, for a total of 7.
If I play B now, I gain something now,
I get 5 now, but tomorrow I'll only get 1
for a total of 6.
So in fact, 7 is bigger than 6, so I'm okay and I won't want to
do this defection.
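(A minimal sketch, not part of the lecture, that checks not just the deviation to B but all three first-period choices against an opponent who follows the strategy above:)

```python
# My stage payoff in period one when I play `mine` and the opponent plays A.
first_period = {"A": 4, "B": 5, "C": 0}

def total_payoff(my_first_move):
    # Tomorrow we play the reward (C, C) worth 3 only if (A, A) happened,
    # and the punishment (B, B) worth 1 otherwise.
    tomorrow = 3 if my_first_move == "A" else 1
    return first_period[my_first_move] + tomorrow

for move in ("A", "B", "C"):
    print(move, total_payoff(move))  # A: 7, B: 6, C: 1 -- following the strategy wins
```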
I just want to write this one other way because it's going to
be useful for later. So one other way to write this,
I think we've convinced ourselves that this is an
equilibrium, but one other way to write this
is, and it's a more general way in repeated games,
is to write it explicitly comparing the temptations to
cheat today with the rewards and punishments from tomorrow.
So what we want to do is, in general, we can just rewrite
this as checking that the temptation to cheat or defect
today is smaller than the value of the reward minus the value of
the punishment. But the key words here are:
defecting occurs today; rewards and punishments occur
tomorrow. If we just rewrite it this way,
we'll see exactly the same thing, just rearranging
slightly. The temptation to defect today
is I get 5 rather than 4, or if you like a gain of 1.
And the value of the reward tomorrow--the reward was to play
(C, C) tomorrow and get 3. The value of the punishment
tomorrow was to play (B, B) tomorrow and get 1,
and that difference is 2. So here the fact that the
temptation is outweighed by the difference between the value of
the reward and the value of the punishment is what enabled us to
sustain cooperation. I'm just writing that in a more
general way because this is the way that we can apply in games
from here on. We're going to compare
temptations to cheat with tomorrow's promises.
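(The same numbers rearranged the way the board does it; a minimal sketch, not part of the lecture:)

```python
temptation = 5 - 4               # gain today from playing B against A instead of A
reward_minus_punishment = 3 - 1  # (C, C) tomorrow versus (B, B) tomorrow
print(temptation < reward_minus_punishment)  # True: 1 < 2, so cooperation survives
```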
Patrick, let me get you a mike. Student: I don't
understand why it's reasonable to think you would play (B,
B) in the second period though. In the second period you have a
temptation to play C, C even if the person defected
on you. Professor Ben Polak:
Good, that's a very good point.
So what Patrick's saying is, it's all very well to say we're
sustaining cooperation in the first period here,
but the way in which we sustained cooperation was by
going along with, as it were,
the punishment tomorrow. It required me,
tomorrow, to go along with the strategy of choosing B if I
cheated in the first period. I want to answer this twice,
once disagreeing with him and once agreeing with him.
So let me just disagree with him first.
So notice tomorrow, if the other person,
the other player is going to play B then I'm going to want to
play B. So the key idea here is,
as always in Nash equilibrium, if I take the other person's
play as given and just look at my own behavior--if I think the
other person is playing this strategy and hence he's going to
play B tomorrow after I've cheated--then I want to play
B myself. So that check is just our
standard check, and actually that's the check
that makes sure that it really is a sub-game perfect
equilibrium. We're not putting some
punishments down the tree that are arising out of equilibrium.
It has to be I want to do tomorrow what I'm told to do
tomorrow. So that idea seems right and
I'm glad Patrick raised it because that was the next thing
in my notes. I want to go along with this
punishment because if the other person's playing B I want to
play B myself. Nevertheless,
I think Patrick's onto something and let me come back
to it in a minute. What I want to do before I do
that is just draw out a general lesson from this game.
The general lesson is we can sustain cooperation even in a
finitely repeated game, but to do so we need there to
be more than one Nash equilibrium in the stage game.
What we need there to be is several Nash equilibria,
one at least of which we can use as a reward and another one
which we can use as a punishment.
So even if a game is only played a finite number of times,
if there are several equilibria in the stage game,
both (B,B) and (C,C), we can use one of them as a
reward and the other one as a punishment,
and use that difference to try and get people to resist
temptations today. So that's the general idea
here, and let's just write that down.
Patrick don't let me get away with not coming back to your
point, I want to come back to it in a second.
So the lesson here is, if a stage game--a stage game
is the game that's going to be repeated--if a stage game has
more than one Nash equilibrium in it,
then we may be able to use the prospect of playing different
equilibria tomorrow to provide incentives--and we could think
of these incentives as rewards and punishments--for cooperation
today. In the game we just saw,
there were exactly two pure strategy Nash equilibria in the
sub-game. We used one of them as a reward
and the other one as a punishment, and we were able to
sustain cooperation in a sub-game perfect equilibrium.
Now, a question arises here, and I think it's behind
Patrick's question, and that is how plausible is
this? How plausible is this?
Formally, if we write down the game and do the math,
this comes out. But how plausible is this as a
model of what's going on in society?
I think the worry--I'm guessing this is worry that was behind
Patrick's question--is this. Suppose I'm playing this game
with Patrick and suppose Patrick cheats on me the first period,
so Patrick chooses B while I wanted him to choose A in the
first period. Now in the second period,
according to the equilibrium instructions,
we're supposed to play (B, B) and get payoffs of 1 rather
than (C,C) and get payoffs of 3. So let's make that visible
again.
But suppose Patrick comes to me in the meantime.
So between period one and period two, Patrick shows up at
my office hours and he says: yeah,
I know I cheated on you yesterday, but why should we
punish ourselves today? Why should we,
both of us, lose today by playing the (B,B) equilibrium?
Why don't we both switch to the (C,C) equilibrium?
After all, that's better for both of us.
Patrick's saying to me, it's true that I cheated you
yesterday, but "let bygones be bygones,"
or "why cry over spilt milk," or he'll use some other saying
plucked out of the book of platitudes,
and say to me: well why go along with the
punishment. Let's just play the good
equilibrium now. And, if I look at things and I
say well, actually, it's true I got nothing in the
first period because Patrick kind of cheated me in the first
period--so it's true I got nothing yesterday--and it's true
it was Patrick who caused me to get nothing yesterday,
but nevertheless that's a sunk cost and I'm comparing getting 1
now with getting 3 now. Why don't we just go along and
get 3? Moreover, I'm not in danger of
being cheated again because if Patrick believes I'm going to
play C, he's going to play C too.
So that kind of argument involves what?
It involves some kind of communication between stages,
but it sounds like that's going to be a problem.
Why? Well, suppose it's the case
that we are going to get communication between periods
and suppose it's the case that someone with the gift of the
gab, someone on his way to law
school like Patrick, is going to be able to persuade
me to go back to the good equilibrium for everybody in
period two, then we know we're going to
play the good equilibrium in period two and now we've lost
any incentive to cooperate in period one.
The only reason I was willing to cooperate in period one was
because the temptation to defect was outweighed by the difference
between the value of the reward and the value of the punishment.
If we're going to get the reward anyway,
I'll go ahead and defect today. So the problem here is this
notion of "renegotiation," this notion of communicating between
periods can undermine this kind of equilibrium.
There's a problem that arises if we have renegotiation.
So there may be a problem of renegotiation.
Now, this may not be such a big problem.
For example, it may be, say,
I'll be so angry at Patrick because he screwed me over in
period one that I won't go along with the renegotiation.
It may also be the case, and we'll see some examples of
this on the homework assignment, that the many equilibria in the
second stage of the game are not such that a punishment for
Patrick is also a punishment for me.
What really caused the problem here was, in trying to punish
Patrick, I had to punish myself. But you could imagine games,
or see some concrete examples on the next homework assignment,
in which punishing Patrick is rather fun for me,
and punishing me is rather fun for Patrick,
and that's going to be much harder to renegotiate our way
out of. There was a question,
let me get a mike out to the question.
Yeah? Student: If we're ruling
out renegotiation, can't we devise a strategy for
Prisoner's Dilemma as well even though it doesn't have multiple
Nash equilibriums? Professor Ben Polak:
Yeah, okay good, so the issue there is,
in Prisoner's Dilemma, we established in the first
week that if we're not allowed to make side payments,
we're not allowed to bring in outside contracts,
then no amount of communication is going to help us.
So you're right if we can rely on the courts or the mafia to
enforce the contracts that would be fine and then communication
would have bite. But you remember way back in
the first week when we tried to talk our way out of bad behavior
in the Prisoner's Dilemma it didn't help precisely because
it's a dominant strategy. Whereas, here,
Patrick's conversation, Patrick's verbal agreement to
play the other equilibrium is an agreement to play a Nash
equilibrium. That's what is getting us into
trouble. So what may help us here,
what may avoid renegotiation is simply I'm not going to go along
with that renegotiation--I'm too angry about having been cheated
on--and it may be for other reasons it may actually be that
I enjoy the punishment. Nevertheless,
this is a real problem in society and I don't think we
should pretend that this problem isn't there.
So a good example is in bankruptcy, which is one of
those words I can never spell. It seems I have too many
consonants in it, is that right?
It's approximately right anyway.
So bankruptcy law in the U.S. for the last 200 odd years has
gone through cycles. One way to view these cycles
is, they're cycles of relaxing the law and making life "easier
for borrowers" and then tightening up again.
This is not a recent phenomenon, this is not only a
recent phenomenon, this occurred throughout the
nineteenth century. So what typically happened was
there was either explicit renegotiation between parties or
renegotiation through act of Congress or sometimes through
the acts of the states, in which bankrupt debtors were
basically let off or given easier terms.
The argument was always the same.
These people are not going to pay back now.
It's clear from the nineteenth century, often if you were
bankrupt you were in jail, actually worse than that.
Sometimes in the nineteenth century in England not only if
you were bankrupt were you in jail but your creditors were
having to pay the fees to feed you in jail.
So there you were sitting in jail, you weren't paying that
money back to your creditor, and you're actually costing
money to your creditor by being in jail.
This seems like a situation that you want to renegotiate
your way out of. You say, hey let's let these
guys out of jail. Let them be productive again,
and then they'll pay back part of the loans.
So you had these waves of bankruptcy reform in which the
debtors' prisons were closed down, people were let out,
people were relieved of debt. What's the problem with doing
that? That seems like a good idea
right. After all, you don't want all
these people bankrupt, in debt, not paying money back
to their creditors anyway. That doesn't seem like a good
situation in society. It seems like a renegotiation
that's a win-win situation: it's better for everybody.
What's the problem with it though?
Let's get a mike down here. What's the problem with this?
Student: It incentivizes bankruptcy.
Professor Ben Polak: Right, it creates an
incentive for people not to repay in the first place.
It creates an incentive for people to take big risks now,
and hence, it makes bankruptcy, if you like,
or makes non-repayment of debt more likely.
So this has been going on for a while, but you see it very much
today if you read the financial pages of the papers in the last
few weeks. There's a big worry in the U.S.
right now about people failing to repay what kind of debt?
What kind of debt is the big worry about?
Mortgage debt, right, so those people who are
house owners failing to pay back mortgage debt and,
equally worrying, financial institutions that
have lent a lot of, for example,
sub-prime debt now find themselves in financial trouble.
You're going to read a lot in the papers about not letting
people out lightly out of those situations of being in debt,
or not letting people out lightly out of bankruptcy.
The term you're going to hear is "bail out."
So bail out--the argument you're going to read is,
you don't want the government or the central bank bailing out
those financial institutions who have apparently taken too large
risks on sub-prime mortgage debts,
even though we all agree it's better right now for those
financial institutions not to go under.
Why are we not going to--Even though it's better for everybody
for them not to go under, why are we not going to bail
them out? Because it undermines the
incentives for them not to make bad loans to start with.
To a lesser extent you're going to hear that on the debtor side
as well. You're going to hear some
people say we shouldn't be bailing out people who took on
bad loans, took on bad mortgages to
finance their houses, again for bail out reasons.
So this is an important trade off.
If you go on to law school, you're going to see a lot about
this kind of discussion, and this is the discussion of
trading off ex-ante efficiency and ex-post efficiency.
Sometimes, as Patrick has pointed out in the game just
now, the ex-post efficient thing to do is to go back to the good
equilibrium, or if you like to bail out
these firms who've made bad loans.
However, from an ex-ante point of view, it creates bad
incentives for people to make those loans in the first place;
and, in the ex-ante point of view, it created the incentive
for people to defect in the first period of that game.
So this theme of ex-ante versus ex-post efficiency is not one
we're going to go into anymore in this class,
but it should be there in the back of your minds when you all
end up in law school in a few years time.
Okay, so, so far what have we done?
We've been looking at repeated interaction and seeing if it can
sustain cooperation. The first thing we learned was
that if the repeated interaction is a finite interaction,
if we know when it's going to end--we know when the
interaction's going to end--then sustaining cooperation is going
to be hard because in the last period there will be an
incentive to defect. We saw we could get around that
to some extent if games have multiple equilibria,
but in a game like Prisoner's Dilemma, we're really in
trouble. Things will unravel from the
back. So now let's mix things up a
little bit by looking at a more complicated variety of repeated
interactions. Rather than just play the game
once or twice, or three times,
let's play the game under the following rules.
We'll go back to our same players, how many mikes are
still out here? I took them both back,
is that right? I'm taking both the green and
the blue mike, and I'm giving them back to our
players. So this is to Brooke and this
is to Patrick. And we're going to have Brooke
and Patrick play Prisoner's Dilemma again.
I'm hoping I haven't deleted it. Maybe I did.
It doesn't matter we know the payoffs.
We're going to have them play Prisoner's Dilemma again,
but this time, in between every play of the
game, I'm going to toss a coin. Actually I'll toss the coin
twice and if that coin comes up heads both times then the game
will end, but otherwise they'll play again.
So everyone understand what we're going to do?
We're going to play Prisoner's Dilemma.
At the end of every period I'll toss a coin twice.
I might get Jake to toss it. Jake will toss a coin twice.
If it comes up heads both times the game's over but otherwise
the game continues. So both Brooke and Patrick
should get ready to play, and the payoffs of this game
are just what we had before. So let's just remind ourselves
what the payoffs of that game are.
So we've got cooperate, defect, cooperate,
defect, (2,2), (-1,3), (3, -1) and (0,0).
And we'll keep score here: so this Brooke and Patrick.
So, putting pressure on these guys, let's write down what
you're going to do the first time.
Brooke? Student: Defect. Professor Ben Polak:
Patrick? Student: Cooperate.
Professor Ben Polak: All right.
I think we're getting some payback from earlier,
right. Round two.
Student: Are you going to toss the coin?
Professor Ben Polak: Oh I have to toss the coin,
you're absolutely right, thank you.
Now I have to find a coin. Look at that, thank you Ale.
Twice: toss it twice.
Heads, heads again, so the game is over.
That didn't last long. Just for the sake of the class,
let's pretend that it came up tails.
Okay we'll cheat a little bit. Okay, so we're playing a second
time--just with a little bit of cheating.
I need someone else, someone less honest to toss the
coin. Brooke what do you choose?
Student: Oh I'm defecting.
Professor Ben Polak: Defecting again,
Patrick? Student: Cooperate.
Professor Ben Polak: Cooperate,
Patrick seems very trusting here, all right let's toss the
coin a third time. All right, Brooke?
Student: I'm going to defect again.
Student: Defect. Professor Ben Polak:
All right, heads, heads,
so this time we'll end it. So what happened this time,
let's just talk about it a bit. So Brooke and Patrick were
playing, Patrick cooperated a bit in the beginning,
Brooke's defected throughout. Brooke why did you defect?
Shout out so everyone can hear you.
Why did you defect right from the start of the game?
Student: Because last time it didn't work so well
cooperating. Professor Ben Polak:
Last time it didn't work so well, okay.
Fair enough but even after Patrick was sort of cooperating
you went on defecting. So why then?
Student: Because I wanted to get the higher payoff,
I thought either he would continue cooperating and I could
defect, Professor Ben Polak:
All right, if he had gone on cooperating,
which in fact he did. Patrick why were you
cooperating early on here? Shout out so people can hear
you. Student: So with a two
head rule, like you have a 75% chance at having another game.
So with those payoffs, even one period the payoff of
cooperating twice is the same as defecting once,
so it's better if you can continue cooperating,
and the percentage is high enough that it would make sense
to do so. Professor Ben Polak:
All right, if you figure there's a good
enough chance of getting--even after Brooke's defected the
first period you went on cooperating,
but then after the second period you gave up and started
defecting. If it had gone on to the fourth
period what would you have done? Student: Defected.
Professor Ben Polak: You would have defected
again, all right. Fifth period?
Student: Well if she kept defecting,
I would keep defecting. Professor Ben Polak:
All right, so what Patrick's saying is he
started off cooperating but once he saw that Brooke was
defecting, he was going to switch to
defect. And basically as long as she
went on defecting, he was going to stick with
defecting. Let's try a different pair.
So why don't we switch it over to your partners there.
So Ben here and Edwina.
So why don't you stand up. I want everybody to see these
people. So stand up a second.
So these are our players, I want people at the back to
essentially know who are playing, this is Edwina and this
is Ben. So Edwina--sit down so you can
actually write things down. So Edwina and Ben,
Edwina, have you both written down a strategy?
Ben, have you written down a strategy?
Edwina what did you choose? Student: Cooperate.
Professor Ben Polak: So Edwina's cooperating,
Ben? Student: Cooperate.
Professor Ben Polak: Okay, let's toss the coin.
So we're okay, so we're still playing.
Edwina? Student: Cooperate.
Professor Ben Polak: Ben?
Student: I chose cooperate.
Professor Ben Polak: All right,
so they're cooperating. Tails again,
so you're still playing. Student: Cooperate.
Student: Cooperate. Professor Ben Polak:
All right, so they're still cooperating.
Some pain in the voice this time.
Heads and then tails, write down what you're going to
do. Edwina? Student: Defect.
Professor Ben Polak: Ben.
Student: Cooperate. Professor Ben Polak:
Things were going so nicely there.
We had such a nice class going on there--.
All right, so we're still playing.
Edwina? Student: Defect. Professor Ben Polak:
Ben? Student: Defect.
Professor Ben Polak: All right,
Jake? Tails, tails, we're still going.
Student: Defect. Student: Defect.
Professor Ben Polak: All right,
let me stop it there, we'll pretend that we had two
heads. So let's talk about this.
We had some cooperation going on here, both people started
cooperating. So Ben, why did you cooperate
at the beginning? Student: Well,
going along with Patrick's reasoning I felt that if we
could have the cooperate, cooperate in the long term with
the 75% chance of continuing playing, that it would be a
worthwhile investment. Professor Ben Polak:
All right. Student: Until I
realized that Edwina had started defecting.
Professor Ben Polak: Let's come back a second.
Let's get you guys to stand up so people can hear you.
When you stand up you shout more.
So stand up again. Edwina, so you also started
cooperating, why did you start cooperating?
Student: For the same reason.
Professor Ben Polak: Same reason,
okay. So the key thing here is why
did you start defecting? You heard the big sigh in the
class. Why did you start defecting at
this stage? Student: Because we'd
had so many, I mean the coin toss had to come to heads,
heads sometime, so I started thinking that
maybe- Professor Ben Polak: The reversion to the mean
of the coin. Student: Yeah,
I just thought that it. I thought, I mean, I don't know.
Professor Ben Polak: So what did I say about the
relationships of Economics majors that are in the class?
Anyway, all right, so Edwina defected and then Ben
you switched after, why did you switch?
Student: Because once Edwina started defecting I felt
that we'd revert back to the defect, defect equilibrium.
Professor Ben Polak: All right,
so thank you guys. So there's another good
strategy here. People started off cooperating
and I claim that at least Ben--Ben can contradict me in a
second--but I think Ben's strategy here was something like
this. I'm going to cooperate and I'm
going to go on cooperating as long as we're cooperating.
But if at some point Edwina defects--or for that matter I
defect--then this relationship's over and we're going to play
defect forever. Is that right?
That's kind of a rough description of your strategy?
Edwina was more or less playing the same thing.
In fact it was her who defected, but once she defected
she realized that it was over and she went on defecting.
So this strategy has a name. Let's just be clear what the
strategy is. This strategy says play C which
is cooperate, and then play C if no one has
played D; and play D otherwise.
So start off by cooperating. Keep cooperating as long as
nobody's cheated. But if somebody cheats,
this relationship's over: we're just going to defect
forever. Now this strategy is a famous
strategy. It has a name.
Anyone know what the name is? This is called the "Grim
Trigger Strategy." So this strategy again,
it says we're going to cooperate, but if that
cooperation breaks down ever, even if it's me who breaks it
down, then I'm just going to defect forever.
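(A minimal sketch, not part of the lecture, of the grim trigger rule written as a function of the history of play so far; the names here are just for illustration:)

```python
def grim_trigger(history):
    """Play C as long as no one has ever played D; otherwise play D forever.

    `history` is a list of (my move, their move) pairs from all earlier periods.
    """
    no_defection_yet = all(mine == "C" and theirs == "C" for mine, theirs in history)
    return "C" if no_defection_yet else "D"

print(grim_trigger([]))                        # C: start out cooperating
print(grim_trigger([("C", "C"), ("C", "C")]))  # C: keep cooperating
print(grim_trigger([("C", "C"), ("D", "C")]))  # D: someone defected, so defect forever
```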
Now, we're going to come back next time to see if this is an
equilibrium, but there's a few things to do first.
First let's just check that it actually is a strategy.
What does it mean to be a strategy again?
It has to tell us what to do at every information set I could
find myself at. And this game is potentially
infinite, so potentially there's an infinite number of
information sets I could reach. So you might think that writing
down a strategy that gives me an instruction at every single
information set is going to be incredibly complicated once we
go to games that are potentially infinite,
because there needs to be an infinite number of instructions.
But it turns out, actually it's possible to write
down such strategies rather simply, at least if they're
simple strategies. This example is one.
This tells me what to do at the first information set,
it says play C. It then tells me for every
information set I find myself at, in which only cooperation
has ever occurred in the history of the game,
I'm going to go on cooperating: play C.
And it says for all other histories, for all other
information sets I might find myself at, play D.
So it really is a strategy. Now this is very different
behavior--we played with the same players--this kind of
behavior is very different, in both games actually,
is very different than the behavior we saw in the game that
ended, the game with two periods or
three periods. What is it essentially that
made this different? What's different about this way
of playing Prisoner's Dilemma, where we had Jake toss the coin
versus the way we played before and we just played for five
periods and then stopped? What's different about it?
Somebody? Let's talk to our players,
Patrick why is this different? Student: We don't know when
the game is going to end or if it's going to end,
so there's no last period. Professor Ben Polak:
Good, so our analysis of the game before,
the analysis of the Prisoner's Dilemma when we knew it was
going to end after two periods, after five periods,
whatever it was, was we all knew it was going to
end. There was a clearly defined
last period. When people are going to
retire, we know the month in which they're going to retire.
When presidents are going to step down, we know they're going to step down that period. When CEO's are going to go, we know they're going to go--or actually we don't always know
they're going to go but let's just pretend we do.
So what's different about this game is, every time we play the
game, there is a probability, in this case a .75 probability
that the game is going to continue to the next period.
Every time we play the game, with probability of .75 there's
going to be a future. There's no obvious last period
from which we can unravel the game in the way we did before.
Just to remind ourselves, the way in which
cooperation--our analysis of cooperation--broke down in the
finitely repeated Prisoner's Dilemma,
was when we looked at the last period, we know people are going
to defect. And once that thread is loose
we can unravel it all the way back to the beginning.
But here, since there is no last period that unraveling
argument never gets to hold. Now instead we're able to see
strategies emerge like the Grim Trigger Strategy,
and notice that the Grim Trigger Strategy has a pretty
good chance of actually sustaining cooperation.
So in particular, as long as people play this
strategy they are cooperating. It turns out that Edwina
eventually gave up that strategy, but had she gone on
playing it, they would have gone on cooperating forever.
But of course there's a question here,
and the question is: is this in fact an equilibrium?
We know that if people play this way, we get cooperation,
but the question--the thousand dollar question or whatever--is:
is this an equilibrium? So what do we have to do check
whether this is an equilibrium or not?
We have to mimic the argument we had before.
We have to compare the temptation to defect today and
compare that with the value of the reward (to cooperating) and
the value of the punishment (from defecting) tomorrow.
So this basic idea is going to re-emerge.
Having said that, let me now delete it so I have
some room.
To show this is an equilibrium, we need to show that the
temptation to defect--the temptation to cheat in the short
term--is outweighed by the difference between the value of
the reward and the value of the punishment.
All right, so let's set that up.
Let's put the temptation here first.
So the temptation in Prisoner's Dilemma, the temptation to cheat
today is what? I'll get 3 rather than 2,
is that right? So if I defect--when Edwina
defected: here's Edwina defecting in this period--she
got a payoff of 3 rather than the payoff of 2 she would have
got from cooperating. So the temptation here is just
3 - 2 and let's be clear, this is a temptation today and
we want to compare this with the value of the reward minus the
value of the punishment, but the key observation is that
these occur tomorrow. So since they occur tomorrow we
have to weight them a little bit lower.
So in general, the way in which we're going to
weight them tomorrow is we're going to discount them just like
we did in our bargaining game. We're going to weight
tomorrow's payments by δ, where δ
< 1. Now why is δ < 1?
Why are we weighting payments tomorrow less than payments today?
Why are payments tomorrow worth less than payments today?
Because tomorrow might not happen.
There are other reasons why, by the way.
It might be that we are impatient to get the money
today. Edwina just wanted the payoff
in a hurry, or it might be that she wanted to take the payment
today and put it in the bank and earn interest.
There are other reasons why money today might be more
valuable than money tomorrow, but, in games,
the most important reason is: tomorrow may not happen.
By tomorrow you might be dead, or, if not dead,
at least Jake's thrown two heads with the coin.
So δ is less than 1 because the game may end.
Now, what's the value of the reward?
The value of the reward is going to be the value of C "for
ever," but you want to be careful about "for ever."
It's C for ever, but of course it isn't really
for ever because the game may end.
So by "for ever" I mean until the game ends.
Let me be a bit more careful actually, it's (C,
C) isn't it? The value of (C,
C)--(cooperate, cooperate)--for ever.
Here we're going to have the value of (D, D) for ever.
And once again, the for ever here means until
the game ends. So this is the calculation
we're going to have to do. We're going to have to compare
the temptation, that was easy,
that was just 1 with the discounted difference between
the value of cooperation and the value of defecting.
Let's do the easy bits now, and then we'll leave you in
suspense until Wednesday. So let's do all the easy bits.
So what's this δ in this case?
In this particular game what was the probability that the
game was going to continue? What was the probability that
the game was going to end? The probability of it ending
was .25, so δ here was .75,
that's easy. The second bit that's
relatively easy is what's the value of playing (D,
D) until the game ends? Once people have cheated you're
going to play D for ever--here we are: Edwina's cheating here.
You're going to get (D, D) in this period,
(D, D) in this period, and so on and so forth until
the game ends. In each of those periods you're
going to earn 0, so this is just 0.
Which leaves us with a messy bit: what's the value of
cooperating forever? Let's try and do it.
We've got one minute. Let's do it.
So in every period in which we both cooperate what do we earn?
Throughout the beginning of the game: we cooperated in the first
period; now in the second period,
we cooperate again. What payoff do we get from
cooperating again? We get 2 and then Jake tosses
his coin and with probability δ we continue and we're
going to cooperate again. So with probability δ
we cooperate again and get what payoff the next period?
2 again, and then Jake tosses the coin again,
so now he's tossed the coin twice,
so with probability δ² we're still playing
and we get 2, and then Jake tosses the coin again and it comes up something other than heads, heads again, so with probability δ³ we get 2, and so on.
So your exercise between now and Wednesday is to figure out
what the value of cooperation forever is: figure out this
equation and find whether in fact it was an equilibrium for
people to cooperate. We'll pick it up on Wednesday.
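(If you want to check your answer numerically before Wednesday, here is a minimal sketch, not part of the lecture, of the series set up on the board: 2 now, plus 2 with probability δ, plus 2 with probability δ², and so on, with δ = .75 coming from the two-coin rule. Whether that value, weighed against the temptation of 1 and the punishment value of 0, makes this an equilibrium is the exercise left for Wednesday.)

```python
delta = 1 - 0.5 ** 2  # probability the game continues: 1 - P(heads, heads) = 0.75

# Partial sum of 2 + 2*delta + 2*delta**2 + ... :
# the value of playing (C, C) until the game ends.
value_of_cooperating = sum(2 * delta ** t for t in range(1000))
print(round(value_of_cooperating, 4))  # approaches 2 / (1 - delta)
```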