Tip:
Highlight text to annotate it
X
Here we are in week five of Social Network Analysis.
This week we're going to be looking at small world networks.
The concepts this week are going to be combining what we learned about random
graphs, in particular, error differing you random graphs.
What we found interesting about those is that those can really grow.
You can get networks of millions of nodes and yet, the average shortest path, which
scales just as the logarithm of the number of nodes will then be, you know, something
on the order of ten and that seems quite remarkable.
Then we looked at some real world networks and discovered that they had interesting
community structure that is on somewhat more, on a somewhat more localized scale,
their, the structure is not random. But we haven't yet really delved into the
very local structure. No, the structure of the network around
the particular node. So what we're going to be doing this week
is filling the gap. You know, is it possible for real world
networks to have small world properties, like random networks do and at the same
time, can we recognized the interesting structure they have at the local, you
know, at the local level. So just to outline, I'm going to start
with Milgram's famous small world experiment which established that it's a
small world, when you look at social networks.
Then I'm going to be looking at some local structure and metrics, starting with the
clustering co-efficient and then delving into motif analysis.
I'm going to then present some models, the very famous small world model by Watts &
Strogatz, that reconciles networks having clustering and at the same time having
short path lengths. And then, we're going to look at a few
other small world, models, having to do with geography or having to do with
hierarchical groups. And finally, we are going to look at, you
know, now just take these small world networks for granted, but say, where does
the structure arise from? You know, what, why do we get small world
networks? And then, next week, we'll be looking at
some of the consequences. What does small world structure meant for
diffusion, for learning and for coordination?
So, you know, that's still to come. Okay. On to Milgram.
Stanley Milgram is a Harvard sociologist who wanted to know how many hops is it
between any two individuals and the United States.
So he recruited, I think he put some,
An add in the newspaper. Individuals in Nebraska and actually in,
in Massachusetts as well, but the Nebraskans are the more interesting sudd
and he gave them a task. He said here's this stockbroker in
Massachusetts. I want you to mail this pamphlet to him,
But you can't do it directly, you can't just you know, stick it in the, in the
mail. You're supposed to mail it to someone that
you know, whom you know on a first name basis.
And they're going to mail it to someone who they know on a first name basis.
And we're going to see how many steps it's going to take.
Back then, when he asked people, how many steps they thought it would take, most of
them guessed hundreds, Because they were thinking of their local
network and most of the people they knew were rather local.
And so, Masschusetts seemed awfully far way and they just imagined that it will
take lots and lots of steps. That turn out to not to be the case at The
average number of steps was 6.5 and this how the expression Six degrees of
separation was coined. This experiment was repeated in 2003 with
e-mail and the task was very similar but you had eighteen targets instead of one
target, thirteen different countries and you had far larger initial participation.
So they got 60,000 participants into the experiments.
24,000 message chains, but only fewer than 400 actually made it to the target versus
Milgram had about a, you know, a twenty percent completion rate.
And here, the average risk path which span the globe was really short; it was just
four steps. So, let's think about whether six or, or
four. We'll see how four is not the right,
number really, how you can correct for it, Whether that's surprising?
So, in order to figure that out, first we need to know on average how many
individuals does some one have in their network.
Pool and Kochen in 1978, they did a study and estimated that the number's around 500
to 1500 acquaintances. So let's be conservative, and let's say
it's 500. And let's also imagine that we're in a
random network and so your friends' friends are just randomly chosen.
So they're very unlikely to overlap with your friends.
Now, if everyone has 500 friends, the average person would have how many friends
of friends? Okay.
So, if you have 500 friends and each of them has 500 friends in turn.
You would expect 250,000 friends of friends.
Pretty neat, right? As long as the network is random.
Now, let's extend this to friends of friends of friends.
Okay. In this case we have 500 to the third power and that, it means that, you
know, once you're three hops out, you have 125,000,000 friends of friends of friends
and that starts to resemble a significant portion of the U.S.
Population. So, the conclusion is that if networks are
random, Then six is not a surprising number at
all. In fact [laugh], maybe it's a little bit
on the, on the large side. What if, on the, the other hand, networks
were completely cliquish, right? So, all of my friend's friends are my
friends as well. They're not allowed to have any other
friends outside of my friends. Well, in that case, what would be true?
So, in that case, you actually can't reach any other part of the network, because,
the, you're in a complete clique and no one is allowed to have any edges outside
of that clique. So, your distance one, to anyone within
the clique, but you can't reach, its like infinite distance to anywhere else.
So now the question is, The small world phenomenon, how does it
arise, right? It seems very natural in a random graph,
but in an entirely cliquish graph, we're not sure, oh, oh, whether it can arise.
But first, let's talk about the accuracy of the number six or four, depending on
which experiment you're talking about. One reason why this isn't necessarily an
accurate number is that not all the chains are completed.
And the question is, are shorter chains more likely to be completed than longer
ones? And one consideration is, what is the
attrition rate? And also, is that attrition rate uniform,
with respect to, you know, whether that person is the next one in the chain from
the starting person. They might approach it with greater
enthusiasm because they were recruited. So the initial person was recruited.
They might be interested. They might tell, you know, they might
write the next person and say, this is such a cool experiment, you know, try it
out, but then that person may be passing on with less enthusiasm.
Similarly, maybe the people further down in the chain are actually getting closer
to target so maybe they are less likely to drop the message,
Because you know, the goal might be near. However, neither of those turns out to be
true. You have pretty stable attrition, no
matter, you know, which step in the chain you are.
So now, we can start to calculate the probability that a message gets through
given a constant attrition rate. So let's say it's, it's a half probability
that someone's going to pass the message versus just not do anything with it.
And so, if you have a chain of legnth two, so one, two, You have only this, this
person's in question, right, this, the initial center sends for sure, otherwise
there's no chain and we don't count it. And the recipient receives it so they
don't have to take any further action. So here we're just wondering, you know, is
this person going to pass it on. So let's calculate the probability that a
chain of length two gets through and then a chain of length five.
So, you should see that a chain of length two is much more likely to pass through
than a chain of length five. And so, you can then go back and try and
infer what the true distance distribution is.
So for example, for the email experiment, their average number of hops was four, But
once they corrected and said well the longer chains were less likely to get
through, They had an estimate of a median chain
length somewhere between five and seven, sort of recovering that six degrees of
separation again. A second factor in how accurate that
number is, is that individuals actually may not be finding the shortest paths,
right? [laugh] So, they don't know what the
global structure of the network. All they're doing is, they're passing the
message on to someone they think is more likely to know the target.
And there was this nice study in 2005 that mapped out completely a network within an
organization, So they, they had that map and then they
had those individuals. So the was who talks to whom,
Right? And then, they ran Milgram's] experiment,
which said, you know, past this to someone you normally talk to and they saw that
roughly half the time the individuals routing the message choose the next node
to be on the shortest path and half the time they goofed in away.
They, they didn't choose the shortest path.
So. Even for the successful chains they may
not actually lie on the shortest path. So, six might be an overestimate.
Now, recently, there was a study done at Facebook showing that any two Facebook
users are, you know, 4.7 hops apart. And the question is, what, what does that
really mean? You know, if you look at these different
social networking services, like Facebook, like Twitter, like LinkedIn.
And I would definitely recommend listening to the interview with LinkedIn,
Because this, this question is addressed in there or, or Google+, you know?
What direct and indirect benefit do you get from someone who's one hop away, two
hops, three hops, etcetera? So just something for you to think about.