5 1 5a Small World Experiments 1228

Here we are in week five of Social Network Analysis. This week we're going to be looking at small world networks. The concepts this week are going to be combining what we learned about random graphs, in particular, error differing you random graphs. What we found interesting about those is that those can really grow. You can get networks of millions of nodes and yet, the average shortest path, which scales just as the logarithm of the number of nodes will then be, you know, something on the order of ten and that seems quite remarkable. Then we looked at some real world networks and discovered that they had interesting community structure that is on somewhat more, on a somewhat more localized scale, their, the structure is not random. But we haven't yet really delved into the very local structure. No, the structure of the network around the particular node. So what we're going to be doing this week is filling the gap. You know, is it possible for real world networks to have small world properties, like random networks do and at the same time, can we recognized the interesting structure they have at the local, you know, at the local level. So just to outline, I'm going to start with Milgram's famous small world experiment which established that it's a small world, when you look at social networks. Then I'm going to be looking at some local structure and metrics, starting with the clustering co-efficient and then delving into motif analysis. I'm going to then present some models, the very famous small world model by Watts & Strogatz, that reconciles networks having clustering and at the same time having short path lengths. And then, we're going to look at a few other small world, models, having to do with geography or having to do with hierarchical groups. And finally, we are going to look at, you know, now just take these small world networks for granted, but say, where does the structure arise from? You know, what, why do we get small world networks? And then, next week, we'll be looking at some of the consequences. What does small world structure meant for diffusion, for learning and for coordination? So, you know, that's still to come. Okay. On to Milgram. Stanley Milgram is a Harvard sociologist who wanted to know how many hops is it between any two individuals and the United States. So he recruited, I think he put some, An add in the newspaper. Individuals in Nebraska and actually in, in Massachusetts as well, but the Nebraskans are the more interesting sudd and he gave them a task. He said here's this stockbroker in Massachusetts. I want you to mail this pamphlet to him, But you can't do it directly, you can't just you know, stick it in the, in the mail. You're supposed to mail it to someone that you know, whom you know on a first name basis. And they're going to mail it to someone who they know on a first name basis. And we're going to see how many steps it's going to take. Back then, when he asked people, how many steps they thought it would take, most of them guessed hundreds, Because they were thinking of their local network and most of the people they knew were rather local. And so, Masschusetts seemed awfully far way and they just imagined that it will take lots and lots of steps. That turn out to not to be the case at The average number of steps was 6.5 and this how the expression Six degrees of separation was coined. This experiment was repeated in 2003 with e-mail and the task was very similar but you had eighteen targets instead of one target, thirteen different countries and you had far larger initial participation. So they got 60,000 participants into the experiments. 24,000 message chains, but only fewer than 400 actually made it to the target versus Milgram had about a, you know, a twenty percent completion rate. And here, the average risk path which span the globe was really short; it was just four steps. So, let's think about whether six or, or four. We'll see how four is not the right, number really, how you can correct for it, Whether that's surprising? So, in order to figure that out, first we need to know on average how many individuals does some one have in their network. Pool and Kochen in 1978, they did a study and estimated that the number's around 500 to 1500 acquaintances. So let's be conservative, and let's say it's 500. And let's also imagine that we're in a random network and so your friends' friends are just randomly chosen. So they're very unlikely to overlap with your friends. Now, if everyone has 500 friends, the average person would have how many friends of friends? Okay. So, if you have 500 friends and each of them has 500 friends in turn. You would expect 250,000 friends of friends. Pretty neat, right? As long as the network is random. Now, let's extend this to friends of friends of friends. Okay. In this case we have 500 to the third power and that, it means that, you know, once you're three hops out, you have 125,000,000 friends of friends of friends and that starts to resemble a significant portion of the U.S. Population. So, the conclusion is that if networks are random, Then six is not a surprising number at all. In fact [laugh], maybe it's a little bit on the, on the large side. What if, on the, the other hand, networks were completely cliquish, right? So, all of my friend's friends are my friends as well. They're not allowed to have any other friends outside of my friends. Well, in that case, what would be true? So, in that case, you actually can't reach any other part of the network, because, the, you're in a complete clique and no one is allowed to have any edges outside of that clique. So, your distance one, to anyone within the clique, but you can't reach, its like infinite distance to anywhere else. So now the question is, The small world phenomenon, how does it arise, right? It seems very natural in a random graph, but in an entirely cliquish graph, we're not sure, oh, oh, whether it can arise. But first, let's talk about the accuracy of the number six or four, depending on which experiment you're talking about. One reason why this isn't necessarily an accurate number is that not all the chains are completed. And the question is, are shorter chains more likely to be completed than longer ones? And one consideration is, what is the attrition rate? And also, is that attrition rate uniform, with respect to, you know, whether that person is the next one in the chain from the starting person. They might approach it with greater enthusiasm because they were recruited. So the initial person was recruited. They might be interested. They might tell, you know, they might write the next person and say, this is such a cool experiment, you know, try it out, but then that person may be passing on with less enthusiasm. Similarly, maybe the people further down in the chain are actually getting closer to target so maybe they are less likely to drop the message, Because you know, the goal might be near. However, neither of those turns out to be true. You have pretty stable attrition, no matter, you know, which step in the chain you are. So now, we can start to calculate the probability that a message gets through given a constant attrition rate. So let's say it's, it's a half probability that someone's going to pass the message versus just not do anything with it. And so, if you have a chain of legnth two, so one, two, You have only this, this person's in question, right, this, the initial center sends for sure, otherwise there's no chain and we don't count it. And the recipient receives it so they don't have to take any further action. So here we're just wondering, you know, is this person going to pass it on. So let's calculate the probability that a chain of length two gets through and then a chain of length five. So, you should see that a chain of length two is much more likely to pass through than a chain of length five. And so, you can then go back and try and infer what the true distance distribution is. So for example, for the email experiment, their average number of hops was four, But once they corrected and said well the longer chains were less likely to get through, They had an estimate of a median chain length somewhere between five and seven, sort of recovering that six degrees of separation again. A second factor in how accurate that number is, is that individuals actually may not be finding the shortest paths, right? [laugh] So, they don't know what the global structure of the network. All they're doing is, they're passing the message on to someone they think is more likely to know the target. And there was this nice study in 2005 that mapped out completely a network within an organization, So they, they had that map and then they had those individuals. So the was who talks to whom, Right? And then, they ran Milgram's] experiment, which said, you know, past this to someone you normally talk to and they saw that roughly half the time the individuals routing the message choose the next node to be on the shortest path and half the time they goofed in away. They, they didn't choose the shortest path. So. Even for the successful chains they may not actually lie on the shortest path. So, six might be an overestimate. Now, recently, there was a study done at Facebook showing that any two Facebook users are, you know, 4.7 hops apart. And the question is, what, what does that really mean? You know, if you look at these different social networking services, like Facebook, like Twitter, like LinkedIn. And I would definitely recommend listening to the interview with LinkedIn, Because this, this question is addressed in there or, or Google+, you know? What direct and indirect benefit do you get from someone who's one hop away, two hops, three hops, etcetera? So just something for you to think about.