Professor Ben Polak: Okay, let's make a start.
So I hope everyone had a good break.
We're going to spend this week looking at repeated interaction.
We already saw last time, before the break,
that once we repeat games--once games go on for a while--we can
sustain behavior that's quite interesting.
So, for example, before the break,
we saw that we could sustain fighting by players that were
rational in a war of attrition. Another thing we learned before
the break is when we're analyzing these potentially very
long games, it helps sometimes to break the
analysis up into what we might call "stage games,"
each period of the game, and break the payoffs up into:
the payoffs that are associated with that stage;
payoffs that are associated with the past (they're sunk,
so they don't really matter); and payoffs that are going to
come in the future from future equilibrium play.
So those are some ideas we're going to pick up today,
but for the most part what we do today will be new.
Now whereas last time we focused on fighting,
for the whole of today I want to focus on the issue of
cooperation. In fact, for the whole of this
week, I want to focus on the issue of cooperation.
The question behind everything this week is going to be:
can repeated interaction among players,
both induce and sustain cooperative behavior,
or if you like "good behavior." Our canonical example is going
to be Prisoners' Dilemma. Way back in the very first
class, we talked about Prisoners' Dilemma,
and we mentioned that playing the game repeatedly might be
able to get us out of the dilemma.
It might be able to enable us to sustain cooperation.
And what's going to be good about that is not just sustained
cooperation, but sustained cooperation without the use of
outside payments such as contracts or the mafia or
whatever. So why does this matter?
Well one reason it matters is that most interactions in
society, either don't or perhaps even can't rely on contracts.
Most relationships are not contractual.
However, many relationships are repeated.
So this is going to be of more importance perhaps in general
life--though perhaps less so in business--more important in
general life than thinking about contracts.
So let's think about some obvious examples,
think about your own friendships.
I don't know if you have any friendships--I assume you
do--but for those of you who do, your friendships are typically
not contractual. You don't have a contract that
says if you're nice to me, I'll be nice to you.
Similarly, think about interactions among nations.
Interactions among nations typically cannot be contractual
because there's no court to enforce those would-be
contracts, although you can have treaties
I suppose. But most interactions among
nations--cooperation among nations is sustained by the fact
that those relationships are going to go forever.
Even in business, even where we have contracts,
and even in a very litigious society like the U.S.
which is probably the most litigious society in the world,
we can't really rely on contracts for everyday business
relationships. So, in some sense,
we need a way to model a way to sustain cooperation and good
behavior that forms, if you like,
the social fabric of our society and prevents always
going to court about everything. Now, why might repeated
interaction work? Why do we think way back in day
one of the class that repeated interaction might be able to
enable us to behave well, even in situations like
Prisoner's Dilemmas or situations involving moral
hazard where bad behavior is going to occur in one shot
games? So the lesson we're going to be
sort of underlying things today and all week is this one.
In ongoing relationships the promise of future rewards and
the threat of future punishments may--let's be careful--may
sometimes provide incentives for good behavior today.
Just leave a gap here in your notes because we're going to
come back to this. So this is a very general idea.
And the idea is that future behavior in the relationship can
generate the possibility of future rewards and/or future
punishments, and those promises or threats
may sometimes provide incentives for people to behave well today.
The reason I want to leave a gap here is I want--part of the
purpose of this week's lectures is to try and get beyond this.
This is kind of, almost a platitude,
right? Most of you knew this already.
So I want to get beyond this. I want to see when is this
going to work? When is it not going to work?
How is it going to work? So I don't want people to leave
this week of classes or leave the course thinking:
oh well, we're going to interact more
than once so everything's fine. That's not true.
I want to make sure we understand when things work,
how they work, and more importantly,
when they don't work and how they don't work.
So we're going to try and fill in the gap that we just left on
that board as we go on today. Nevertheless,
we do have this very strong intuition that repeated
interaction will get us, as it were, out of the
Prisoner's Dilemma. So why don't we start with the
Prisoner's Dilemma. I'll put this up out of the way
and we'll come back to it. Let's just remind ourselves
what the Prisoner's Dilemma is because you guys are all full of
turkey and cranberry sauce and you've probably forgotten what
Game Theory is entirely. Let's name these strategies,
rather than alpha and beta, let's call them cooperation and
defect. And that will be our convention
this week. We'll call them cooperation and
defect. This is Player A and this is
Player B, and the payoffs are something like this (2,2),
(-1,3), (3, -1) and (0,0). It doesn't have to be exactly
this but this will do. This is the game we're going to
play. And, to try and see if we get
cooperation out of it by having repeated interaction,
we're going to play it more than once.
So let me go and find some players to play here.
This should be a familiar game to everybody here.
So why don't I pick some people who are kind of close to the
front row. So what's your name again?
Student: Brooke. Professor Ben Polak:
Brooke. Okay so Brooke is going to be
Player B. And I've forgotten your name,
by this stage I should know it. Patrick you're going to be
Player A. And the difference between
playing this game now and playing this game earlier on in
the class is we're going to play not once, but twice.
We're going to play it twice. So write down what you're going
to do the first time and show it to your neighbor.
Don't show it to each other. And let's find out what they
did the first time. So is it written down?
Something's written down? So Brooke.
Student: I cooperated. Professor Ben Polak:
You cooperated. Patrick?
Student: I defected. Professor Ben Polak:
Patrick defected, okay.
Okay let's play it a second time.
So write down what you're going to do the second time.
Brooke? Student: This time I'm
going to defect. Student: Me too.
Professor Ben Polak: All right,
so we had the play this time--let's just put it up
here--so when we played it this time, we had A and B.
And the first time we had (defect, cooperate) and the
second time we had (defect, defect).
Let's try another pair. We'll just play this a couple
times and we'll talk about it. So that's fair enough.
Why don't we go to your neighbors.
That's fair enough. It's easy. So you are?
Student: Ben. Professor Ben Polak:
That's a good name, very good okay.
You are? Student: Edwina. Professor Ben Polak:
Edwina, Edwina and Ben okay.
So we're going to make Ben Player B and Edwina Player A.
And why don't you write down what you're going to do for the
first time. Again, we're going to play it
twice. Why don't we mix it up,
we can play it three times. We'll play it three times this
time okay. We'll play it three times.
Both people are happy with their decisions.
Okay so the first time Edwina what did you choose?
Student: Defect. Student: Cooperate.
Professor Ben Polak: All right so we had--let's
put down this time--so we've got Edwina playing A and Ben playing B.
And we had (cooperate, defect). Second time please, Edwina?
Student: Cooperate. Student: Defect.
Professor Ben Polak: Okay, so we're going to and
fro now. So this was cooperate and
defect and one more time: write it down.
Both players written down? Edwina?
Student: Cooperate. Student: Defect.
Professor Ben Polak: Okay, so we flipped
round again okay. Okay, so we're seeing some
pretty odd behavior here. Who did what that time?
Edwina what did you do? Student: I cooperated.
Professor Ben Polak: So we had this.
Is that right? So keep the microphones for a
minute, and we'll just talk about it a second.
All right, so first of all let's start with Ben here.
Ben you were cooperating on the first go.
So why did you choose to cooperate the first turn?
Student: I felt that if I established a reputation for
cooperating we could end up in the cooperate,
cooperate. Professor Ben Polak:
All right, so you thought that by playing
cooperate early you could establish some kind of
reputation. And what about later on when
you played defect thereafter, what were you thinking there?
Student: I realized that she established a reputation for
defecting a second time. Professor Ben Polak:
All right, so you switched strategies
mid-course. Edwina you started off by
defecting. Why did you start off by
defecting? Shout it out so people can hear
you. Student: Because his
friend defected so I thought he might defect.
Professor Ben Polak: Okay, his friend defected.
Okay, so he's been tainted by his friend there.
There's a shortage of space in the class.
They could have just been sitting next to each other.
Thereafter you cooperated. Why was that?
Student: Because I thought he cooperated.
Maybe he was going to keep cooperating.
Professor Ben Polak: All right,
so in fact your reputation works in some sense.
By cooperating early you convinced Edwina you would
cooperate. And then you went on
cooperating even after he defected, so what were you doing
in the third round? Shout out.
Student: I thought he might cooperate because I
cooperated. Professor Ben Polak:
All right, he might come back.
Let's talk about it to your neighbors.
So Brooke, shout out why you cooperated in the first round.
Student: Because I was hopeful that he would cooperate.
Professor Ben Polak: You were hoping he would
cooperate all right, and why did you defect
thereafter? Student: Because I
thought he would continue to defect after he defected.
Professor Ben Polak: Because he defected,
he would continue to defect. Patrick, you're the person who
just defected throughout here. Grab the mic that's next to you.
Why did you just defect? Student: It's such a
short game that it makes sense to defect in the last period so
the second last period and the first period.
Professor Ben Polak: All right,
that's an interesting idea. So Patrick's saying,
actually if we look at the last period of this game,
if we look at this last period of the game,
what does the game look like in the last period?
Student: It's a single period game.
Professor Ben Polak: In the last period,
this actually is the game. If I drew out the game with two
periods, it would be kind of a hard thing to draw,
it would be an annoying diagram to draw.
But in the last period of the game, whatever happened in the
first period is what? It's sunk, is that right?
Everything that happened in the first period is sunk.
So in the last period of the game these are the only relevant
payoffs, is that right? Since these are the only
relevant payoffs looking forward, in the last period of
the game, we know that there's actually a dominant strategy.
And what is that dominant strategy in the last period of
the game? To do what?
In Prisoner's Dilemma, what's the dominant strategy?
Shout it out. Defect, okay.
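(A minimal sketch, not part of the lecture, of that one-shot check using the payoffs on the board; C and D here just stand for cooperate and defect:)

```python
# One-shot Prisoners' Dilemma from the board.
# payoffs[(my_move, their_move)] gives (my payoff, their payoff).
payoffs = {
    ("C", "C"): (2, 2), ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1), ("D", "D"): (0, 0),
}

# Whatever the other player does, defecting gives me strictly more.
for theirs in ("C", "D"):
    assert payoffs[("D", theirs)][0] > payoffs[("C", theirs)][0]
print("D strictly dominates C in the one-shot game.")
```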
So what we should see in this game--we didn't actually because
we had some kindness over here from Edwina but what we should
see in general--is we know that in the last period of the game,
in period two, we're going to get both people
defecting. The reason we're going to get
both people defecting is because the last period of the game is
just a one shot game. There's nothing particularly
exciting about it. There is no tomorrow,
and so people are going to defect.
But now let's go back and revisit some of the arguments
that Edwina, Brooke, and--I've forgotten what your
neighbor is called again, Ben, (I should remember
that)--and Ben said earlier. They gave quite elaborate
reasons for cooperating: cooperating to establish
reputation; cooperating because the other
person might cooperate; whatever.
But most of these behaviors were designed to either induce
or promise cooperation in period two, is that right?
What we've just argued is that in period two everyone's going
to defect. Period two is just a trivial
one stage Prisoner's Dilemma. We actually analyzed it in the
very first week of the class. And provided we believe these
payoffs, we're done. Period two people are going to
defect. Since they're going to defect
in period two, nothing I can do in period one
is going to affect that behavior and therefore I should defect
also in period one. In order to belabor this point,
we can actually draw up what the matrix looks like in period
one--so let's do that--using the style we did last week--two
weeks ago--before we went away. So here once again is the
matrix we had before, and I want to analyze the first
stage game. In the first stage game,
what I'm going to do is I'm going to add in the payoffs I'm
going to get from tomorrow. The payoffs I'm going to get
from tomorrow are from tomorrow's equilibrium.
Well this isn't going to do very much for me as we'll see
because I'll get 2 + 0 tomorrow because we know I'm playing
defect tomorrow, 2 + 0 tomorrow,
- 1 + 0 tomorrow, 3 + 0 tomorrow,
3 + 0 tomorrow, - 1 + 0 tomorrow,
and 0 + 0 tomorrow, 0 + 0 tomorrow.
So just as we did with the war of attrition game two weeks ago,
we can put in the payoffs from tomorrow,
we can roll back those equilibrium payoffs to today,
it's just in this particular exercise it's rather a boring
thing, because we're just adding 0 to
everything. When I add 0 to everything,
I then just cancel out the zeros and I'm back where I
started, and of course I should defect.
So what I'm going to see is because I'm going to defect
anyway tomorrow, today is just like a one shot
game as well. And I'm going to get defect
again. Now here we played the game
twice and got defect, defect, what about if we played
the game three times? It's the same thing.
We didn't play it three times the first time around, but we did play the game three times between Edwina and Ben. There we know we're going to
defect in the third round. Therefore we may as well defect
in the second to last round. Therefore we may as well defect
in the first round. If we played it five times,
we know we're going to all defect in the fifth round.
Therefore we may as well defect in the fourth round.
Therefore we may as well defect in the third,
and so on. If we played it 500 times,
we wouldn't have time in the class, but if we played it 500
times, we know in that 500th period
it's a one shot game and people are going to defect.
And therefore, in the 499th period people are
going to defect. And therefore in the 498th
period people are going to defect, and so on.
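(Here is a minimal sketch, not from the lecture, of that rolling-back exercise: at each stage we add the next period's equilibrium value to every cell, which never changes the fact that defecting dominates, so the whole thing unravels.)

```python
# Row player's stage payoffs in the Prisoners' Dilemma (the game is symmetric).
stage = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}

continuation = 0  # equilibrium value of the future, starting from the last period
for period in range(500, 0, -1):  # work backwards from period 500 to period 1
    # Add tomorrow's equilibrium value to every cell of today's stage game.
    total = {profile: u + continuation for profile, u in stage.items()}
    # Defect still strictly dominates, so today's equilibrium is (D, D)...
    assert all(total[("D", x)] > total[("C", x)] for x in ("C", "D"))
    # ...and the value we roll back to yesterday is the (D, D) value.
    continuation = total[("D", "D")]

print("Equilibrium value rolled back to period 1:", continuation)  # 0
```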
So the problem here is that we get unraveling,
something we've seen before in this class, we get unraveling
from the back. I have a worry that there might
only be one L in unraveling in America, is that right?
How many L's do we put in unraveling in America?
One, I've just come back from England and my spelling is
somewhere in the mid-Atlantic right now.
I'll leave it as one. All right: unraveling from the
back. Essentially this is a backward
induction argument, only instead of using backward
induction we're really using sub-game perfection.
We're looking at the equilibria in the last games and as we roll
back up the game, we get unraveling.
So here's bad news. The bad news is,
we'd hoped that by having repeated interaction in the
Prisoners' Dilemma, we would be able to sustain
cooperation. That's been our hope since day
one of the class. In fact, we stated it kind of
confidently in the first day of the class, and we kind of
intuitively believe it. But what we're discovering is,
even if you played this game for 500 times and then stopped,
you wouldn't be able to sustain cooperation in equilibrium
because we're going to get unraveling in the last stage and
so on and so forth. So it seems like our big hope
that repeated interaction would induce cooperation in society is
going down the plug hole. That's bad.
So let's come back and modify our lesson a little bit.
So what went wrong here was, in the last period of the game,
there were no incentives generated by the future,
so there was no promise of future rewards or future
punishments, and therefore cooperation broke down and then
we had unraveling. So the lesson here is what?
The lesson is: but for this to work it helps
to have a future.
This whole idea of repeated interaction was the future was
going to create incentives for the present.
But if the games come to an end, there's going to be some
point when there isn't a future anymore and then we get
unraveling. Now this is not just a formal
technical point to be made in the ivory tower of Yale.
This is a true idea. So for example,
if we think about CEO's or presidents, or managers of
sports teams, there's a term we use,
there's a word we use--at least in the states--to describe such
leaders when they're getting towards the end of their term
and everyone knows it. What's the expression we use?
"Lame duck." So we have this lame duck
effect. The lame duck effect at the end
of somebody's term undermines their ability to cooperate,
their ability to provide incentives for people to
cooperate with them, and causes a problem.
So this lame duck effect affects presidents but it also
affects CEO's of companies. But it's not just leaders who
run into this problem. So if you have an employee,
if you're employing somebody, you may have a contract with
the person you're employing, but basically you're sustaining
cooperation with this person because you interact with them
often. You know you're always going to
interact with them. But then this employee
approaches retirement. Everyone knows that in April or
something they're going to retire, then the future can't
provide incentives anymore. And you have to switch over
from the implicit incentive of knowing you're going to be
interacting in the future, to an explicit incentive of
putting incentive clauses in the contract.
So retirement can cause, if you like,
a lame duck effect.
This is even true in personal relationships.
In your personal relationships with your friends,
if you think that those friendships are going to go on
for a long time, be they with your significant
other or just with the people you hang out with,
you're likely to get a lot of cooperation.
But if, as with perhaps most economics majors,
most of your significant others are only going to last for a day
at most, you're not going to get great
cooperation. You're going to get cheating.
No one's rising to that one but I guess it's true.
So what do we call these: "economics majors'
relationships."
These are kind of "end effects." All of these things are caused
by the fact that the relationship is coming to an
end. And once the relationship is
coming to an end, all those threats and promises
of future behavior, implicit or otherwise,
are going to basically disappear.
So at this point we might think the following.
You might conclude the following.
You might conclude that if a relationship has a known end,
if everyone knows the relationship's going to end at a
certain time, then we're done and we
basically can't sustain cooperation through repeated
interaction. That's kind of what the example
we looked at seems to suggest. However, that's not quite true.
So let's look at another example where a relationship is
going to have a known end but nevertheless we are able to
sustain some cooperation. And we'll see how.
So again, I'm being careful here, I've said it helps
to have a future. I haven't said it's
necessary to have a future.
So that's good news for the Economics majors again.
So let's do this example to illustrate that,
even a finite interaction, even an interaction that's
going to end and everyone knows it's going to end,
might still have some hope for cooperation.
So look at this slightly more complicated game here.
And this game has three strategies and we'll call them
A, B, and C for each player. The payoffs are as follows
(4,4), (0,5), (0,0), down here we'll do
(0,0), (0,0) and (3,3), and in the middle row (5,0),
(1,1), (0,0).
We're going to assume that this game, just like we did with the
first time we did Prisoner's Dilemma, this game is going to
be played twice. It's going to be repeated,
it's going to be played twice, repeated once.
So let's just make sure we understand what the point of
this game is. In this game,
in the one shot game I hope it's clear that (A,
A) is kind of the cooperative thing to do.
We'd like to sustain play of (A, A), because then both
players get 4 and that looks pretty good for everybody.
However, in the one shot game (A, A) is not a Nash
equilibrium. Why is (A, A) not a Nash
equilibrium? Let me grab those mikes again.
Why is (A, A) not a Nash equilibrium?
Anybody? I'm even getting to know the
names at this stage of the term. This is Katie, right?
So shout out. Student: The best
response to the other guy playing A is playing B.
Professor Ben Polak: Good, so if I think the
other person's going to play A, I'm going to want to defect and
play B and obtain a gain, a gain of 1.
So basically I'll get 5 rather than 4.
I'll defect to playing B and get 5 rather than 4 for a gain
of 1, is that right? So (A, A) is not a Nash
equilibrium in the one shot game--we're sometimes going to
call it--it's fine--in the one shot game.
So now imagine we play this game twice.
Instead of just playing once, we're going to play this game
two times. I'll come back to that.
Before I do that what are the pure strategy Nash equilibria in
this game? Anybody?
So the Nash equilibria in this one shot game are (B,
B) and (C, C). There's some mixed ones as well
but this will do. So (B, B) and (C,
C) are the pure strategy Nash equilibria.
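(A minimal sketch, not part of the lecture, that just checks each cell of that stage game for a profitable unilateral deviation; the payoffs are the ones on the board:)

```python
moves = ["A", "B", "C"]
# payoffs[(row, column)] = (row player's payoff, column player's payoff)
payoffs = {
    ("A", "A"): (4, 4), ("A", "B"): (0, 5), ("A", "C"): (0, 0),
    ("B", "A"): (5, 0), ("B", "B"): (1, 1), ("B", "C"): (0, 0),
    ("C", "A"): (0, 0), ("C", "B"): (0, 0), ("C", "C"): (3, 3),
}

def is_pure_nash(r, c):
    # Nash: neither player gains by deviating on their own.
    row_ok = all(payoffs[(r, c)][0] >= payoffs[(d, c)][0] for d in moves)
    col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, d)][1] for d in moves)
    return row_ok and col_ok

print([cell for cell in payoffs if is_pure_nash(*cell)])  # [('B', 'B'), ('C', 'C')]
```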
Now consider playing this game twice.
And last time we looked at a game played twice,
it was Prisoner's Dilemma, and we noticed that we couldn't
sustain cooperation because in the last stage people weren't
going to cooperate and hence in the first stage people weren't
going to cooperate. But let's look what happens
here. If this game is played twice is
there any hope of sustaining cooperation, i.e.
A, in both stages? Could we have people play A in
the first stage and then play A again in the second stage?
So Patrick's shaking his head. So that's right to shake his
head. So let me grab the other mike.
So why is that not going to work?
Why can't we get people to cooperate and play A in both
periods? Shout out.
Student: In the second period you're still going to
defect and play B. Professor Ben Polak:
Good, so in the second period, exactly the argument
that Katie produced just now in the one shot game applies
because the second period game is a one shot game.
So we've got no hope of sustaining cooperation in both
periods. Let's call this cooperation.
We can't sustain (A, A) in period two,
in the second period. However, I claim that we may be
able to get people to cooperate in the first period of the game.
Now how are we going to do that? So to see that let's consider
the following strategy. But consider the strategy--the
strategy's going to be--play A and then play C if (A,
A) was played; and play B otherwise.
So this strategy is an instruction telling the player
how to play. Now, before we consider whether
this is an equilibrium or not, let's just check that this
actually is a strategy. So what does a strategy have to
do? It has to tell me what I should
do--it should give me an instruction--at each of my
information sets. In this two period game,
each of us, each of the players in the game has two information
sets. They have an information set at
the beginning of the game and they have another information
set at the beginning of period two.
Is that right? So it has to tell you what to
do at the first information set and at the second information
set and it does. It says play A at the first
one, and then at the beginning of period two--I said there's
only one information set there, but actually there's nine
possible information sets depending on what happened in
the first period. So each thing that happened in
the first period is associated with a different information
set. I always know what happened in
the first period. And at each of those nine
information sets it tells me what to do at the beginning of
period two. In particular,
it says if it turns out that (A, A) was played then play C
now. And otherwise,
all the other eight possible information sets I could find
myself in, play B. So this is a strategy.
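(Just to make the instruction concrete, here is a minimal sketch, not from the lecture, of that strategy written out as a rule; the function name and arguments are only for illustration:)

```python
def candidate_strategy(period, first_period_profile=None):
    """Play A; then play C if (A, A) was played, and play B otherwise."""
    if period == 1:
        return "A"
    # In period two the profile is one of the nine possible (own move, other's move) pairs.
    return "C" if first_period_profile == ("A", "A") else "B"

print(candidate_strategy(1))               # A
print(candidate_strategy(2, ("A", "A")))   # C -- the "reward" equilibrium
print(candidate_strategy(2, ("B", "A")))   # B -- the "punishment" equilibrium
```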
Now of course the big question is, is this strategy an
equilibrium, and in particular is it a sub-game perfect
equilibrium?
Let me be a bit more precise: if both players were playing
this strategy would that be a sub-game perfect equilibrium?
Well let's have a look.
(Of course I don't see it now so let's pull both these boards
down.) So to check whether this is a sub-game perfect
equilibrium, we're going to have to check
what? We're going to have to check
that it induces Nash behavior in each sub-game.
(I think the battery is going on that.
Shall I get rid of that. Okay, I'm going to shout.
Can people still hear me? People in the balcony can they
hear me? Yup, okay.) So we're going to
have to see if we can sustain Nash behavior in each sub-game.
So let's start with the sub-games associated with the
second period. Technically,
there are nine such sub-games depending on what happened in
the past, depending on what happened in the first period.
There's a sub-game following (A,A).
There's a sub-game following (A,B).
There's a sub-game following (A,C), and so on.
So for each activity in the first period,
for each profile in the first period there's a sub-game.
However, it doesn't really matter to distinguish all of
these sub-games particularly carefully here,
since the costs from the past, what happened in the past is
sunk, so we'll just look at them as a whole.
So in period two after (A,A)--so you have one in
particular of those nine sub-games--after (A,A),
this strategy induces (C, C).
If people play A, if both people play A in the
first period, then in the sub-game following
people are supposed to play (C, C).
Is that a Nash equilibrium of the sub-game?
Well was (C,C) a Nash equilibrium?
Yeah, it's one of our Nash equilibria.
Let's just look up there, we've got it listed.
There it is. So we're playing this Nash
equilibrium. So that is a Nash equilibrium,
so we're okay. After the other choices in
period one, this strategy induces (B, B).
That's good news too because (B, B), we already agreed,
was a Nash equilibrium in the one shot game.
So in all of those nine sub-games, the one after (A,A),
and the eight after everything else, we're playing Nash
behavior, so that's good. What about in the whole game?
In the whole game, starting from period one,
we have to ask do you do better to play the strategy as
designated, in particular to choose A,
or would you do better to defect?
Well let's have a look. So if I choose A--remember the
other person is playing this strategy--so if I choose A then
my payoff in this period comes from (A,A) and is 4.
If I choose A then we're both playing A in this period and I
get 4. Tomorrow, according to this
strategy--tomorrow since we both played A--both of us will now
play C. Since we're both playing C,
I'll get an additional payoff of 3.
So tomorrow (C, C) will occur and I'll get 3
for a total of 7. What about if I defect?
We could consider lots of possible defections,
but let's just consider the obvious defection.
You can check the other ones at home.
So if I defect and choose B now, then, in this period,
I will be playing B and my opponent or my pair will be
playing A. So in this period I will get 5.
And tomorrow, since (A,A) did not occur,
both of us will play B and get a continuation payoff of 1.
So the continuation payoff this time will be following from (B,
B), and I'll get 1. Why don't I do what I've been
doing before in this class and put boxes around the
continuation payoffs, just to indicate that they are,
in fact, continuation payoffs. So if I play A,
I get 4 now and a continuation payoff of 3, for a total of 7.
If I play B now, I gain something now,
I get 5 now, but tomorrow I'll only get 1
for a total of 6.
So in fact, 7 is bigger than 6, so I'm okay and I won't want to
do this defection.
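(A minimal sketch, not part of the lecture, that checks not just the deviation to B but all three first-period choices against an opponent who follows the strategy above:)

```python
# My stage payoff in period one when I play `mine` and the opponent plays A.
first_period = {"A": 4, "B": 5, "C": 0}

def total_payoff(my_first_move):
    # Tomorrow we play the reward (C, C) worth 3 only if (A, A) happened,
    # and the punishment (B, B) worth 1 otherwise.
    tomorrow = 3 if my_first_move == "A" else 1
    return first_period[my_first_move] + tomorrow

for move in ("A", "B", "C"):
    print(move, total_payoff(move))  # A: 7, B: 6, C: 1 -- following the strategy wins
```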
I just want to write this one other way because it's going to
be useful for later. So one other way to write this,
I think we've convinced ourselves that this is an
equilibrium, but one other way to write this
is, and it's a more general way in repeated games,
is to write it explicitly comparing the temptations to
cheat today with the rewards and punishments from tomorrow.
So what we want to do is, in general, we can just rewrite
this as checking that the temptation to cheat or defect
today is smaller than the value of the reward minus the value of
the punishment. But the key words here are:
defecting occurs today; rewards and punishments occur
tomorrow. If we just rewrite it this way,
we'll see exactly the same thing, just rearranging
slightly. The temptation to defect today
is I get 5 rather than 4, or if you like a gain of 1.
And the value of the reward tomorrow--the reward was to play
(C, C) tomorrow and get 3. The value of the punishment
tomorrow was to play (B, B) tomorrow and get 1,
and that difference is 2. So here the fact that the
temptation is outweighed by the difference between the value of
the reward and the value of the punishment is what enabled us to
sustain cooperation. I'm just writing that in a more
general way because this is the way that we can apply in games
from here on. We're going to compare
temptations to cheat with tomorrow's promises.
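(The same numbers rearranged the way the board does it; a minimal sketch, not part of the lecture:)

```python
temptation = 5 - 4               # gain today from playing B against A instead of A
reward_minus_punishment = 3 - 1  # (C, C) tomorrow versus (B, B) tomorrow
print(temptation < reward_minus_punishment)  # True: 1 < 2, so cooperation survives
```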
Patrick, let me get you a mike. Student: I don't
understand why it's reasonable to think you would play (B,
B) in the second period though. In the second period you have a
temptation to play C, C even if the person defected
on you. Professor Ben Polak:
Good, that's a very good point.
So what Patrick's saying is, it's all very well to say we're
sustaining cooperation in the first period here,
but the way in which we sustained cooperation was by
going along with, as it were,
the punishment tomorrow. It required me,
tomorrow, to go along with the strategy of choosing B if I
cheated in the first period. I want to answer this twice,
once disagreeing with him and once agreeing with him.
So let me just disagree with him first.
So notice tomorrow, if the other person,
the other player is going to play B then I'm going to want to
play B. So the key idea here is,
as always in Nash equilibrium, if I take the other person's
play as given and just look at my own behavior--if I think the
other person is playing this strategy and hence he's going to
play B tomorrow after I've cheated--then I want to play
B myself. So that check is just our
standard check, and actually that's the check
that makes sure that it really is a sub-game perfect
equilibrium. We're not putting some
punishments down the tree that are arising out of equilibrium.
It has to be I want to do tomorrow what I'm told to do
tomorrow. So that idea seems right and
I'm glad Patrick raised it because that was the next thing
in my notes. I want to go along with this
punishment because if the other person's playing B I want to
play B myself. Nevertheless,
I think Patrick's onto something and let me come back
to it in a minute. What I want to do before I do
that is just draw out a general lesson from this game.
The general lesson is we can sustain cooperation even in a
finitely repeated game, but to do so we need there to
be more than one Nash equilibrium in the stage game.
What we need there to be is several Nash equilibria,
one at least of which we can use as a reward and another one
which we can use as a punishment.
So even if a game is only played a finite number of times,
if there are several equilibria in the stage game,
both (B,B) and (C,C), we can use one of them as a
reward and the other one as a punishment,
and use that difference to try and get people to resist
temptations today. So that's the general idea
here, and let's just write that down.
Patrick don't let me get away with not coming back to your
point, I want to come back to it in a second.
So the lesson here is, if a stage game--a stage game
is the game that's going to be repeated--if a stage game has
more than one Nash equilibrium in it,
then we may be able to use the prospect of playing different
equilibria tomorrow to provide incentives--and we could think
of these incentives as rewards and punishments--for cooperation
today. In the game we just saw,
there were exactly two pure strategy Nash equilibria in the
sub-game. We used one of them as a reward
and the other one as a punishment, and we were able to
sustain cooperation in a sub-game perfect equilibrium.
Now, a question arises here, and I think it's behind
Patrick's question, and that is how plausible is
this? How plausible is this?
Formally, if we write down the game and do the math,
this comes out. But how plausible is this as a
model of what's going on in society?
I think the worry--I'm guessing this is worry that was behind
Patrick's question--is this. Suppose I'm playing this game
with Patrick and suppose Patrick cheats on me the first period,
so Patrick chooses B while I wanted him to choose A in the
first period. Now in the second period,
according to the equilibrium instructions,
we're supposed to play (B, B) and get payoffs of 1 rather
than (C,C) and get payoffs of 3. So let's make that visible
again.
But suppose Patrick comes to me in the meantime.
So between period one and period two, Patrick shows up at
my office hours and he says: yeah,
I know I cheated on you yesterday, but why should we
punish ourselves today? Why should we,
both of us, lose today by playing the (B,B) equilibrium?
Why don't we both switch to the (C,C) equilibrium?
After all, that's better for both of us.
Patrick's saying to me, it's true that I cheated you
yesterday, but "let bygones be bygones,"
or "why cry over spilt milk," or he'll use some other saying
plucked out of the book of platitudes,
and say to me: well why go along with the
punishment. Let's just play the good
equilibrium now. And, if I look at things and I
say well, actually, it's true I got nothing in the
first period because Patrick kind of cheated me in the first
period--so it's true I got nothing yesterday--and it's true
it was Patrick who caused me to get nothing yesterday,
but nevertheless that's a sunk cost and I'm comparing getting 1
now with getting 3 now. Why don't we just go along and
get 3? Moreover, I'm not in danger of
being cheated again because if Patrick believes I'm going to
play C, he's going to play C too.
So that kind of argument involves what?
It involves some kind of communication between stages,
but it sounds like that's going to be a problem.
Why? Well, suppose it's the case
that we are going to get communication between periods
and suppose it's the case that someone with the gift of the
gab, someone on his way to law
school like Patrick, is going to be able to persuade
me to go back to the good equilibrium for everybody in
period two, then we know we're going to
play the good equilibrium in period two and now we've lost
any incentive to cooperate in period one.
The only reason I was willing to cooperate in period one was
because the temptation to defect was outweighed by the difference
between the value of the reward and the value of the punishment.
If we're going to get the reward anyway,
I'll go ahead and defect today. So the problem here is this
notion of "renegotiation," this notion of communicating between
periods can undermine this kind of equilibrium.
There's a problem that arises if we have renegotiation.
So there may be a problem of renegotiation.
Now, this may not be such a big problem.
For example, it may be, say,
I'll be so angry at Patrick because he screwed me over in
period one that I won't go along with the renegotiation.
It may also be the case, and we'll see some examples of
this on the homework assignment, that the many equilibria in the
second stage of the game are not such that a punishment for
Patrick is also a punishment for me.
What really caused the problem here was, in trying to punish
Patrick, I had to punish myself. But you could imagine games,
or see some concrete examples on the next homework assignment,
in which punishing Patrick is rather fun for me,
and punishing me is rather fun for Patrick,
and that's going to be much harder to renegotiate our way
out of. There was a question,
let me get a mike out to the question.
Yeah? Student: If we're ruling
out renegotiation, can't we devise a strategy for
Prisoner's Dilemma as well even though it doesn't have multiple
Nash equilibriums? Professor Ben Polak:
Yeah, okay good, so the issue there is,
in Prisoner's Dilemma, we established in the first
week that if we're not allowed to make side payments,
we're not allowed to bring in outside contracts,
then no amount of communication is going to help us.
So you're right if we can rely on the courts or the mafia to
enforce the contracts that would be fine and then communication
would have bite. But you remember way back in
the first week when we tried to talk our way out of bad behavior
in the Prisoner's Dilemma it didn't help precisely because
it's a dominant strategy. Whereas, here,
Patrick's conversation, Patrick's verbal agreement to
play the other equilibrium is an agreement to play a Nash
equilibrium. That's what is getting us into
trouble. So what may help us here,
what may avoid renegotiation is simply I'm not going to go along
with that renegotiation--I'm too angry about having been cheated
on--and it may be for other reasons it may actually be that
I enjoy the punishment. Nevertheless,
this is a real problem in society and I don't think we
should pretend that this problem isn't there.
So a good example is in bankruptcy, which is one of
those words I can never spell. It seems I have too many
consonants in it, is that right?
It's approximately right anyway.
So bankruptcy law in the U.S. for the last 200 odd years has
gone through cycles. One way to view these cycles
is, they're cycles of relaxing the law and making life "easier
for borrowers" and then tightening up again.
This is not a recent phenomenon, this is not only a
recent phenomenon, this occurred throughout the
nineteenth century. So what typically happened was
there was either explicit renegotiation between parties or
renegotiation through act of Congress or sometimes through
the acts of the states, in which bankrupt debtors were
basically let off or given easier terms.
The argument was always the same.
These people are not going to pay back now.
It's clear from the nineteenth century, often if you were
bankrupt you were in jail, actually worse than that.
Sometimes in the nineteenth century in England not only if
you were bankrupt were you in jail but your creditors were
having to pay the fees to feed you in jail.
So there you were sitting in jail, you weren't paying that
money back to your creditor, and you're actually costing
money to your creditor by being in jail.
This seems like a situation that you want to renegotiate
your way out of. You say, hey let's let these
guys out of jail. Let them be productive again,
and then they'll pay back part of the loans.
So you had these waves of bankruptcy reform in which the
debtors' prisons were closed down, people were let out,
people were relieved of debt. What's the problem with doing
that? That seems like a good idea
right. After all, you don't want all
these people bankrupt, in debt, not paying money back
to their creditors anyway. That doesn't seem like a good
situation in society. It seems like a renegotiation
that's a win-win situation: it's better for everybody.
What's the problem with it though?
Let's get a mike down here. What's the problem with this?
Student: It incentivizes bankruptcy.
Professor Ben Polak: Right, it creates an
incentive for people not to repay in the first place.
It creates an incentive for people to take big risks now,
and hence, it makes bankruptcy, if you like,
or makes non-repayment of debt more likely.
So this has been going on for a while, but you see it very much
today if you read the financial pages of the papers in the last
few weeks. There's a big worry in the U.S.
right now about people failing to repay what kind of debt?
What kind of debt is the big worry about?
Mortgage debt, right, so those people who are
house owners failing to pay back mortgage debt and,
equally worrying, financial institutions that
have lent a lot of, for example,
sub-prime debt now find themselves in financial trouble.
You're going to read a lot in the papers about not letting
people out lightly out of those situations of being in debt,
or not letting people out lightly out of bankruptcy.
The term you're going to hear is "bail out."
So bail out--the argument you're going to read is,
you don't want the government or the central bank bailing out
those financial institutions who have apparently taken too large
risks on sub-prime mortgage debts,
even though we all agree it's better right now for those
financial institutions not to go under.
Why are we not going to--Even though it's better for everybody
for them not to go under, why are we not going to bail
them out? Because it undermines the
incentives for them not to make bad loans to start with.
To a lesser extent you're going to hear that on the debtor side
as well. You're going to hear some
people say we shouldn't be bailing out people who took on
bad loans, took on bad mortgages to
finance their houses, again for bail out reasons.
So this is an important trade off.
If you go on to law school, you're going to see a lot about
this kind of discussion, and this is the discussion of
trading off ex-ante efficiency and ex-post efficiency.
Sometimes, as Patrick has pointed out in the game just
now, the ex-post efficient thing to do is to go back to the good
equilibrium, or if you like to bail out
these firms who've made bad loans.
However, from an ex-ante point of view, it creates bad
incentives for people to make those loans in the first place;
and, in the ex-ante point of view, it created the incentive
for people to defect in the first period of that game.
So this theme of ex-ante versus ex-post efficiency is not one
we're going to go into anymore in this class,
but it should be there in the back of your minds when you all
end up in law school in a few years time.
Okay, so, so far what have we done?
We've been looking at repeated interaction and seeing if it can
sustain cooperation. The first thing we learned was
that if the repeated interaction is a finite interaction,
if we know when it's going to end--we know when the
interaction's going to end--then sustaining cooperation is going
to be hard because in the last period there will be an
incentive to defect. We saw we could get around that
to some extent if games have multiple equilibria,
but in a game like Prisoner's Dilemma, we're really in
trouble. Things will unravel from the
back. So now let's mix things up a
little bit by looking at a more complicated variety of repeated
interactions. Rather than just play the game
once or twice, or three times,
let's play the game under the following rules.
We'll go back to our same players, how many mikes are
still out here? I took them both back,
is that right? I'm taking both the green and
the blue mike, and I'm giving them back to our
players. So this is to Brooke and this
is to Patrick. And we're going to have Brooke
and Patrick play Prisoner's Dilemma again.
I'm hoping I haven't deleted it. Maybe I did.
It doesn't matter we know the payoffs.
We're going to have them play Prisoner's Dilemma again,
but this time, in between every play of the
game, I'm going to toss a coin. Actually I'll toss the coin
twice and if that coin comes up heads both times then the game
will end, but otherwise they'll play again.
So everyone understand what we're going to do?
We're going to play Prisoner's Dilemma.
At the end of every period I'll toss a coin twice.
I might get Jake to toss it. Jake will toss a coin twice.
If it comes up heads both times the game's over but otherwise
the game continues. So both Brooke and Patrick
should get ready to play, and the payoffs of this game
are just what we had before. So let's just remind ourselves
what the payoffs of that game are.
So we've got cooperate, defect, cooperate,
defect, (2,2), (-1,3), (3, -1) and (0,0).
And we'll keep score here: so this Brooke and Patrick.
So, putting pressure on these guys, let's write down what
you're going to do the first time.
Brooke? Student: Defect. Professor Ben Polak:
Patrick? Student: Cooperate.
Professor Ben Polak: All right.
I think we're getting some payback from earlier,
right. Round two.
Student: Are you going to toss the coin?
Professor Ben Polak: Oh I have to toss the coin,
you're absolutely right, thank you.
Now I have to find a coin. Look at that, thank you Ale.
Twice: toss it twice.
Heads, heads again, so the game is over.
That didn't last long. Just for the sake of the class,
let's pretend that it came up tails.
Okay we'll cheat a little bit. Okay, so we're playing a second
time--just with a little bit of cheating.
I need someone else, someone less honest to toss the
coin. Brooke what do you choose?
Student: Oh I'm defecting.
Professor Ben Polak: Defecting again,
Patrick? Student: Cooperate.
Professor Ben Polak: Cooperate,
Patrick seems very trusting here, all right let's toss the
coin a third time. All right, Brooke?
Student: I'm going to defect again.
Student: Defect. Professor Ben Polak:
All right, heads, heads,
so this time we'll end it. So what happened this time,
let's just talk about it a bit. So Brooke and Patrick were
playing, Patrick cooperated a bit in the beginning,
Brooke's defected throughout. Brooke why did you defect?
Shout out so everyone can hear you.
Why did you defect right from the start of the game?
Student: Because last time it didn't work so well
cooperating. Professor Ben Polak:
Last time it didn't work so well, okay.
Fair enough but even after Patrick was sort of cooperating
you went on defecting. So why then?
Student: Because I wanted to get the higher payoff,
I thought either he would continue cooperating and I could
defect, Professor Ben Polak:
All right, if he had gone on cooperating,
which in fact he did. Patrick why were you
cooperating early on here? Shout out so people can hear
you. Student: So with a two
head rule, like you have a 75% chance at having another game.
So with those payoffs, even one period the payoff of
cooperating twice is the same as defecting once,
so it's better if you can continue cooperating,
and the percentage is high enough that it would make sense
to do so. Professor Ben Polak:
All right, if you figure there's a good
enough chance of getting--even after Brooke's defected the
first period you went on cooperating,
but then after the second period you gave up and started
defecting. If it had gone on to the fourth
period what would you have done? Student: Defected.
Professor Ben Polak: You would have defected
again, all right. Fifth period?
Student: Well if she kept defecting,
I would keep defecting. Professor Ben Polak:
All right, so what Patrick's saying is he
started off cooperating but once he saw that Brooke was
defecting, he was going to switch to
defect. And basically as long as she
went on defecting, he was going to stick with
defecting. Let's try a different pair.
So why don't we switch it over to your partners there.
So Ben here and Edwina.
So why don't you stand up. I want everybody to see these
people. So stand up a second.
So these are our players, I want people at the back to
essentially know who are playing, this is Edwina and this
is Ben. So Edwina--sit down so you can
actually write things down. So Edwina and Ben,
Edwina, have you both written down a strategy?
Ben, have you written down a strategy?
Edwina what did you choose? Student: Cooperate.
Professor Ben Polak: So Edwina's cooperating,
Ben? Student: Cooperate.
Professor Ben Polak: Okay, let's toss the coin.
So we're okay, so we're still playing.
Edwina? Student: Cooperate.
Professor Ben Polak: Ben?
Student: I chose cooperate.
Professor Ben Polak: All right,
so they're cooperating. Tails again,
so you're still playing. Student: Cooperate.
Student: Cooperate. Professor Ben Polak:
All right, so they're still cooperating.
Some pain in the voice this time.
Heads and then tails, write down what you're going to
do. Edwina? Student: Defect.
Professor Ben Polak: Ben.
Student: Cooperate. Professor Ben Polak:
Things were going so nicely there.
We had such a nice class going on there--.
All right, so we're still playing.
Edwina? Student: Defect. Professor Ben Polak:
Ben? Student: Defect.
Professor Ben Polak: All right,
Jake? Tails, tails, we're still going.
Student: Defect. Student: Defect.
Professor Ben Polak: All right,
let me stop it there, we'll pretend that we had two
heads. So let's talk about this.
We had some cooperation going on here, both people started
cooperating. So Ben, why did you cooperate
at the beginning? Student: Well,
going along with Patrick's reasoning I felt that if we
could have the cooperate, cooperate in the long term with
the 75% chance of continuing playing, that it would be a
worthwhile investment. Professor Ben Polak:
All right. Student: Until I
realized that Edwina had started defecting.
Professor Ben Polak: Let's come back a second.
Let's get you guys to stand up so people can hear you.
When you stand up you shout more.
So stand up again. Edwina, so you also started
cooperating, why did you start cooperating?
Student: For the same reason.
Professor Ben Polak: Same reason,
okay. So the key thing here is why
did you start defecting? You heard the big sigh in the
class. Why did you start defecting at
this stage? Student: Because we'd
had so many, I mean the coin toss had to come to heads,
heads sometime, so I started thinking that
maybe- Professor Ben Polak: The reversion to the mean
of the coin. Student: Yeah,
I just thought that it. I thought, I mean, I don't know.
Professor Ben Polak: So what did I say about the
relationships of Economics majors that are in the class?
Anyway, all right, so Edwina defected and then Ben
you switched after, why did you switch?
Student: Because once Edwina started defecting I felt
that we'd revert back to the defect, defect equilibrium.
Professor Ben Polak: All right,
so thank you guys. So there's another good
strategy here. People started off cooperating
and I claim that at least Ben--Ben can contradict me in a
second--but I think Ben's strategy here was something like
this. I'm going to cooperate and I'm
going to go on cooperating as long as we're cooperating.
But if at some point Edwina defects--or for that matter I
defect--then this relationship's over and we're going to play
defect forever. Is that right?
That's kind of a rough description of your strategy?
Edwina was more or less playing the same thing.
In fact it was her who defected, but once she defected
she realized that it was over and she went on defecting.
So this strategy has a name. Let's just be clear what the
strategy is. This strategy says play C which
is cooperate, and then play C if no one has
played D; and play D otherwise.
So start off by cooperating. Keep cooperating as long as
nobody's cheated. But if somebody cheats,
this relationship's over: we're just going to defect
forever. Now this strategy is a famous
strategy. It has a name.
Anyone know what the name is? This is called the "Grim
Trigger Strategy." So this strategy again,
it says we're going to cooperate, but if that
cooperation breaks down ever, even if it's me who breaks it
down, then I'm just going to defect forever.
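(A minimal sketch, not part of the lecture, of the grim trigger rule written as a function of the history of play so far; the names here are just for illustration:)

```python
def grim_trigger(history):
    """Play C as long as no one has ever played D; otherwise play D forever.

    `history` is a list of (my move, their move) pairs from all earlier periods.
    """
    no_defection_yet = all(mine == "C" and theirs == "C" for mine, theirs in history)
    return "C" if no_defection_yet else "D"

print(grim_trigger([]))                        # C: start out cooperating
print(grim_trigger([("C", "C"), ("C", "C")]))  # C: keep cooperating
print(grim_trigger([("C", "C"), ("D", "C")]))  # D: someone defected, so defect forever
```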
Now, we're going to come back next time to see if this is an
equilibrium, but there's a few things to do first.
First let's just check that it actually is a strategy.
What does it mean to be a strategy again?
It has to tell us what to do at every information set I could
find myself at. And this game is potentially
infinite, so potentially there's an infinite number of
information sets I could reach. So you might think that writing
down a strategy that gives me an instruction at every single
information set is going to be incredibly complicated once we
go to games that are potentially infinite,
because there needs to be an infinite number of instructions.
But it turns out, actually it's possible to write
down such strategies rather simply, at least if they're
simple strategies. This example is one.
This tells me what to do at the first information set,
it says play C. It then tells me for every
information set I find myself at, in which only cooperation
has ever occurred in the history of the game,
I'm going to go on cooperating: play C.
And it says for all other histories, for all other
information sets I might find myself at, play D.
So it really is a strategy. Now this is very different
behavior--we played with the same players--this kind of
behavior is very different, in both games actually,
is very different than the behavior we saw in the game that
ended, the game with two periods or
three periods. What is it essentially that
made this different? What's different about this way
of playing Prisoner's Dilemma, where we had Jake toss the coin
versus the way we played before and we just played for five
periods and then stopped? What's different about it?
Somebody? Let's talk to our players,
Patrick why is this different? Student: We don't know when
the game is going to end or if it's going to end,
so there's no last period. Professor Ben Polak:
Good, so our analysis of the game before,
the analysis of the Prisoner's Dilemma when we knew it was
going to end after two periods, after five periods,
whatever it was, was we all knew it was going to
end. There was a clearly defined
last period. When people are going to
retire, we know the month in which they're going to retire.
When presidents are going to step down, we know they're going to step down that period. When CEO's are going to go, we know they're going to go--or actually we don't always know
they're going to go but let's just pretend we do.
So what's different about this game is, every time we play the
game, there is a probability, in this case a .75 probability
that the game is going to continue to the next period.
Every time we play the game, with probability of .75 there's
going to be a future. There's no obvious last period
from which we can unravel the game in the way we did before.
Just to remind ourselves, the way in which
cooperation--our analysis of cooperation--broke down in the
finitely repeated Prisoner's Dilemma,
was when we looked at the last period, we know people are going
to defect. And once that thread is loose
we can unravel it all the way back to the beginning.
But here, since there is no last period that unraveling
argument never gets to hold. Now instead we're able to see
strategies emerge like the Grim Trigger Strategy,
and notice that the Grim Trigger Strategy has a pretty
good chance of actually sustaining cooperation.
So in particular, as long as people play this
strategy they are cooperating. It turns out that Edwina
eventually gave up that strategy, but had she gone on
playing it, they would have gone on cooperating forever.
But of course there's a question here,
and the question is: is this in fact an equilibrium?
We know that if people play this way, we get cooperation,
but the question--the thousand dollar question or whatever--is:
is this an equilibrium? So what do we have to do check
whether this is an equilibrium or not?
We have to mimic the argument we had before.
We have to compare the temptation to defect today and
compare that with the value of the reward (to cooperating) and
the value of the punishment (from defecting) tomorrow.
So this basic idea is going to re-emerge.
Having said that, let me now delete it so I have
some room.
To show this is an equilibrium, we need to show that the
temptation to defect--the temptation to cheat in the short
term--is outweighed by the difference between the value of
the reward and the value of the punishment.
All right, so let's set that up.
Let's put the temptation here first.
So the temptation in Prisoner's Dilemma, the temptation to cheat
today is what? I'll get 3 rather than 2,
is that right? So if I defect--when Edwina
defected: here's Edwina defecting in this period--she
got a payoff of 3 rather than the payoff of 2 she would have
got from cooperating. So the temptation here is just
3 - 2 and let's be clear, this is a temptation today and
we want to compare this with the value of the reward minus the
value of the punishment, but the key observation is that
these occur tomorrow. So since they occur tomorrow we
have to weight them a little bit lower.
So in general, the way in which we're going to
weight them tomorrow is we're going to discount them just like
we did in our bargaining game. We're going to weight
tomorrow's payments by δ, where δ
< 1. Now why is δ < 1?
Why are we weighting payments tomorrow less than payments today?
Why are payments tomorrow worth less than payments today?
Because tomorrow might not happen.
There are other reasons why, by the way.
It might be that we are impatient to get the money
today. Edwina just wanted the payoff
in a hurry, or it might be that she wanted to take the payment
today and put it in the bank and earn interest.
There are other reasons why money today might be more
valuable than money tomorrow, but, in games,
the most important reason is: tomorrow may not happen.
By tomorrow you might be dead, or, if not dead,
at least Jake's thrown two heads with the coin.
So δ is less than 1 because the game may end.
Now, what's the value of the reward?
The value of the reward is going to be the value of C "for
ever," but you want to be careful about "for ever."
It's C for ever, but of course it isn't really
for ever because the game may end.
So by "for ever" I mean until the game ends.
Let me be a bit more careful actually, it's (C,
C) isn't it? The value of (C,
C)--(cooperate, cooperate)--for ever.
Here we're going to have the value of (D, D) for ever.
And once again, the for ever here means until
the game ends. So this is the calculation
we're going to have to do. We're going to have to compare
the temptation, that was easy,
that was just 1 with the discounted difference between
the value of cooperation and the value of defecting.
Let's do the easy bits now, and then we'll leave you in
suspense until Wednesday. So let's do all the easy bits.
So what's this δ in this case?
In this particular game what was the probability that the
game was going to continue? What was the probability that
the game was going to end? The probability of it ending
was .25, so δ here was .75,
that's easy. The second bit that's
relatively easy is what's the value of playing (D,
D) until the game ends? Once people have cheated you're
going to play D for ever--here we are: Edwina's cheating here.
You're going to get (D, D) in this period,
(D, D) in this period, and so on and so forth until
the game ends. In each of those periods you're
going to earn 0, so this is just 0.
Which leaves us with a messy bit: what's the value of
cooperating forever? Let's try and do it.
We've got one minute. Let's do it.
So in every period in which we both cooperate what do we earn?
Throughout the beginning of the game: we cooperated in the first
period; now in the second period,
we cooperate again. What payoff do we get from
cooperating again? We get 2 and then Jake tosses
his coin and with probability δ we continue and we're
going to cooperate again. So with probability δ
we cooperate again and get what payoff the next period?
2 again, and then Jake tosses the coin again,
so now he's tossed the coin twice,
so with probability δ² we're still playing
and we get 2, and then Jake tosses the coin again and it comes up something other than heads, heads again, so with probability δ³ we get 2, and so on.
So your exercise between now and Wednesday is to figure out
what the value of cooperation forever is: figure out this
equation and find whether in fact it was an equilibrium for
people to cooperate. We'll pick it up on Wednesday.
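(If you want to check your answer numerically before Wednesday, here is a minimal sketch, not part of the lecture, of the series set up on the board: 2 now, plus 2 with probability δ, plus 2 with probability δ², and so on, with δ = .75 coming from the two-coin rule. Whether that value, weighed against the temptation of 1 and the punishment value of 0, makes this an equilibrium is the exercise left for Wednesday.)

```python
delta = 1 - 0.5 ** 2  # probability the game continues: 1 - P(heads, heads) = 0.75

# Partial sum of 2 + 2*delta + 2*delta**2 + ... :
# the value of playing (C, C) until the game ends.
value_of_cooperating = sum(2 * delta ** t for t in range(1000))
print(round(value_of_cooperating, 4))  # approaches 2 / (1 - delta)
```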