ELLEN SPERTUS: One of the main problems in computer science
is that there's too much information out there.
The amount of information grows exponentially, at least,
but our ability to handle information doesn't.
And the specific problem I looked at is how we help
users of a social website find information
that interests them.
Specifically, how can users of Orkut find communities--
that's the term used for discussion groups, like
Usenet groups or Yahoo groups--and how can people find other
users they might want to meet?
So I was able to start working with Orkut shortly after it
launched, and this chart shows that in the first ten
months, there was exponential growth in members,
and a lower, but still exponential, growth in
these communities.
Anybody could create them.
They weren't very organized.
So the question is, how do you help people find communities
of interest to them?
I wanted to have related-community
recommendations, where if you were viewing a certain
community, you would be told what other communities you
might be interested in.
And here's an example of some recommendations.
Any guess of what community this is for?
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: I see some nods.
Anyone want to guess?
AUDIENCE: Unix programmers.
ELLEN SPERTUS: OK, not UNIX programmers.
This is for the geeks community.
And the way these similarities were figured was with implicit
collaborative filtering.
What collaborative filtering means is that we take a collective
approach, where we look at lots of people's behavior.
And it was implicit, because we weren't asking people to
rate anything explicitly.
We were just observing their behavior, specifically which
communities they joined.
And we worked from the premise that two communities would be
similar if they had lots of members in common.
So here's some terminology I'll be using.
We consider each community to be a set of members.
We might talk about a base community B, like Wine, for wine
lovers, and a related community R, like Linux.
You'll see later why I use those examples.
And what we want is a measure of similarity.
How similar?
How good a recommendation is R for B? And one thing you want
to look at is how the membership of the two
groups overlaps.
Are there a lot of people in the wine community who are
also in the Linux community?
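To put that in notation (my notation, not from the slides), every measure below is some function of the overlap and the two community sizes:

$$\mathrm{sim}(B, R) = f\big(|B \cap R|,\ |B|,\ |R|\big)$$

where $|B \cap R|$ counts the users who belong to both communities.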
So, here's an example. We have the pizza community,
and two other communities.
I'm not showing the names, just
these different overlaps.
Which one's a better recommendation?
AUDIENCE: I would recommend pizza because it's [INAUDIBLE]
ELLEN SPERTUS: Ah, OK, so someone in the audience said
she'd recommend pizza to the people in the upper left.
AUDIENCE: Upper Right.
ELLEN SPERTUS: Oh, upper right.
Excuse me.
I was actually asking the reverse question.
So if the base community is pizza, should we recommend the
one on the right or the one on the left?
Yeah?
AUDIENCE: I'd say the one on the right, because even though
it's a smaller group, you have a larger portion of the
population.
ELLEN SPERTUS: OK, so--
AUDIENCE: With me, it's a circular group.
ELLEN SPERTUS: Yes.
Someone very clever in the audience said he'd recommend
the group on the right, because even though it's a
smaller overlap, it's a bigger percentage.
So the obvious answer, which this audience didn't give, would
be the one with the bigger overlap, which would have been Linux.
At the time I did this research, Linux was the most
popular community, so it had the biggest overlaps with just
about everything.
I told this to a friend of mine who worked at Amazon, and
he said, oh, yes, the Harry Potter problem.
So whatever book someone looks at, the book most people
purchase along with it is Harry Potter, but it's not very
useful to always be recommending that.
So group size is also a factor.
I've shown that here as the base and related community.
Also, these relations are asymmetric.
There are some similarity measures in computer science
that work both ways--
A is as similar to B as B is to A--but that's
not the case here.
So, if you look at the Stanford community, the
Stanford class of 2006 community isn't a very good
recommendation.
Let me show you my terminology.
The numbers in parentheses are the size of the groups.
And what you see here is that there's an overlap of 47
members who belong to both of the communities.
So for the Stanford class of 2006, if someone belongs to
that group, but not the Stanford one, that might be a
good recommendation.
Or you could argue, that you could say no,
that's just too obvious.
So that starts the question of how do we tell which
recommendations are best?
So the relationship is possibly asymmetric.
I didn't have a very theoretical computer science
education, so I walked over to Google Labs and Search and
asked--actually, I asked UW grads--given the overlap
and the sizes of two groups, what formula do I use to
figure out the similarity?
And everybody gave me a different function.
We got into some arguments about which ones would be best,
but then decided, let's implement them all and see.
So the simplest one is L1 normalization.
You can think of this in terms of vectors, where each member
is a dimension, and there's a 1 in a user's dimension if
that user belongs to that community, or, I find it
easier to think of it in set notation.
Just take the overlap between the two communities, and
divide it by the product of their sizes.
And something to notice is that this heavily penalizes
large communities.
If the community is twice as big, you'd better have twice as
big an overlap.
Another measure is L2 normalization.
In the vector space, this would be the same as the
cosine distance.
And this is the same as L1 except for this square root
sign in the denominator.
So you take the overlap and you divide it by the square
root of the product.
So this penalizes large communities less heavily.
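Here's a minimal sketch of those two measures in Python. Representing each community as a set of user IDs is my simplification, and the example numbers echo the I Love Wine and Ice Wine figures that come up later in the talk.

```python
import math

def l1_similarity(base: set, related: set) -> float:
    """Overlap divided by the product of the sizes: heavily
    penalizes large communities."""
    return len(base & related) / (len(base) * len(related))

def l2_similarity(base: set, related: set) -> float:
    """Overlap divided by the square root of the product (the cosine
    measure, in vector terms): penalizes size less heavily."""
    return len(base & related) / math.sqrt(len(base) * len(related))

# Hypothetical data: a 2,400-member wine community overlapping
# 33 members with a 51-member ice wine community.
wine = set(range(2400))
ice_wine = set(range(33)) | {9000 + i for i in range(18)}
print(l1_similarity(wine, ice_wine))  # ~0.00027
print(l2_similarity(wine, ice_wine))  # ~0.094
```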
Another measure that is used is mutual information.
Specifically, I'm looking here at positive correlation, which
is the upper left corner of this matrix.
What this formula shows is how well membership in the base
community predicts membership in the related community.
And then there's another version, where we have
negative correlation, in the blue, which is how well
nonmembership in the base community predicts
nonmembership in the related community.
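The slide's formulas aren't reproduced in the transcript; a standard way to write the positive-correlation term of pointwise mutual information, with probabilities estimated from membership counts over $N$ total users, would be

$$\mathrm{MI}^{+}(B, R) = P(B, R)\,\log\frac{P(B, R)}{P(B)\,P(R)}, \qquad P(B, R) = \frac{|B \cap R|}{N},\quad P(B) = \frac{|B|}{N},\quad P(R) = \frac{|R|}{N},$$

and the negative-correlation version replaces each set with its complement.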
Some of you will be familiar with Salton's term frequency-
inverse document frequency measure, and that's used to
say that two documents are similar if they have the same
words in common.
And we can use that but instead say two communities
are similar if they have the same users in common.
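The talk doesn't spell out the adaptation, but one plausible reading (an assumption on my part, not the paper's exact formula) treats communities as documents and users as terms, down-weighting users who join many communities:

```python
import math

def idf_similarity(base: set, related: set,
                   num_communities_of: dict, total_communities: int) -> float:
    # Each shared user contributes log(total / #communities they belong to),
    # so a user who joins everything adds little evidence of similarity.
    # Hypothetical Salton-style weighting; the names are illustrative.
    return sum(math.log(total_communities / num_communities_of[user])
               for user in base & related)
```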
Another one--this one made the most sense to me
intuitively--was LogOdds, which looks at how much likelier a
member of the base community is to belong to R than a
nonmember of the base community.
So, switching back to Harry Potter: how much likelier is a
purchaser of The Lion, the Witch and the Wardrobe to buy
Harry Potter than people who haven't bought The Lion, the
Witch and the Wardrobe?
That tells you whether to recommend it.
This actually yielded the same rankings as L1.
So, just for fun, we decided to invert it
and see what we got.
So we used this version.
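As a sketch of the LogOdds idea in the same Python setting (the exact formulation, any smoothing, and the inverted variant she mentions are in the paper; the epsilon here is my assumption to keep the log defined):

```python
import math

def log_odds_similarity(base: set, related: set, all_users: set) -> float:
    # How much likelier is a member of the base community to belong
    # to R than a nonmember of the base community?
    eps = 1e-9
    p_given_member = len(base & related) / len(base)          # P(R | B)
    nonmembers = all_users - base
    p_given_nonmember = len(related & nonmembers) / len(nonmembers)  # P(R | not B)
    return math.log((p_given_member + eps) / (p_given_nonmember + eps))
```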
So going into this, we didn't know would there be
significant differences among the measures, or would they
all give similar rankings?
Which one would users prefer?
Which measures would be best?
And would there be a partial or total
ordering of the measures.
So here's some recommendations for the I Love Wine community.
You can see the I Love Wine community has 2,400 members.
And in the upper left corner, you can see, it overlaps 33
members with Ice Wine, out of 51 members of
the Ice Wine Community.
And something to notice.
Remember, I said that L1 heavily penalized large
communities?
And you can see that here, because L1 is recommending
small communities, and L2 is recommending larger ones.
That's the one with the square root in the denominator.
And then the other algorithms recommend bigger communities still.
So what's the best recommendation?
I saw some people taking wine before, so I know some of you
are wine lovers.
AUDIENCE: [INAUDIBLE]
It looks like Japanese food.
ELLEN SPERTUS: OK, so someone said Japanese food.
AUDIENCE: Red wine, if they don't already know about it.
ELLEN SPERTUS: OK someone said red wine if they don't already
know about it.
AUDIENCE: Must be Linux, or you wouldn't mention it.
ELLEN SPERTUS: Professor Lazowska thinks it's Linux.
OK, and there's a lot of different things you can say.
You could say the small groups, certainly, because you might
not have heard of them.
And, of course, the only way to find out is to do an
empirical test, to see which users prefer.
So for an experiment, we precomputed the top twelve
recommendations for each of the six similarity measures.
And then we set up an experiment--
the exact details are in the paper--
so that when a user views a community page, we do a hash on
the community ID and user ID, and select a pair of measures,
maybe L1 and LogOdds, to compare, and we interleave
those, and then we track what the person clicks on.
And we only look at new users, because we had other
recommendations in the past, and we didn't want those to
influence it.
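A sketch of how that bucketing and interleaving might look; the hash scheme, pair selection, and deduplication details here are my assumptions, and the exact protocol is in the paper.

```python
import hashlib
import itertools

MEASURES = ["L1", "L2", "MI+", "MI+-", "IDF", "LogOdds"]
PAIRS = list(itertools.combinations(MEASURES, 2))  # 15 possible pairs

def pick_pair(community_id: int, user_id: int) -> tuple:
    """Deterministically assign each (community, user) view to one
    pair of measures via a hash, so a user always sees the same pair
    on the same page."""
    h = int(hashlib.md5(f"{community_id}:{user_id}".encode()).hexdigest(), 16)
    return PAIRS[h % len(PAIRS)]

def interleave(ranked_a: list, ranked_b: list, k: int = 12) -> list:
    """Alternate between the two measures' precomputed top lists,
    skipping duplicates, to build the displayed recommendations."""
    out, seen = [], set()
    for a, b in zip(ranked_a, ranked_b):
        for c in (a, b):
            if c not in seen:
                seen.add(c)
                out.append(c)
    return out[:k]
```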
OK, so the next question is how do you
interpret the results?
There were six possible cases.
On the left I have somebody's status in the base community.
You can be viewing the Geeks community even if you're not a
member of it.
So I divided the member and nonmember cases.
That's the left column.
Across the top, I have their relation to the
community they click on.
So the M is for if they were already a
member of that community.
N is for if they didn't belong to that community and they
don't join it after clicking on it.
J is if they didn't belong to it but they click and join it.
So there are six possible cases.
I have the big N's and little n's to make them easier to
tell apart.
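Laid out as the grid she's describing, with rows for the viewer's status in the base community and columns for their relation to the clicked community:

```
                    M: already member   N: clicks, doesn't join   J: clicks and joins
Member of base      member-M            member-N                  member-J
Nonmember of base   nonmember-M         nonmember-N               nonmember-J
```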
So, which of these measures, which of these squares do we
care most about?
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: OK, I'm hearing J.
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Member to J?
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: That's what we thought, too.
That if somebody is a member of the base community, and they
click on a community they didn't belong to and join it,
that's the sign of a good recommendation.
And it's really unclear how to interpret the other ones.
If somebody is a member of the community and they clicked on
something and don't join it, does it mean it was a useful
link because they clicked on it?
Or was it just a distraction--they clicked on it, and
it wasn't what they hoped it would be?
And you can ask similar questions about
each of these squares.
So we measured all of those, but we decided the primary
metric would be member-to-join conversion.
So we ran this experiment and we generated 4 million of
these recommendation pages and we got 900,000
clicks on those links.
And here I'm featuring the conversion rate.
You see if somebody belongs to a community, and clicks on a
recommendation, there's a 54% chance that they join the
community that they click on.
If they are viewing a community they don't belong to when they
click on a link, there's only a 17% chance
that they join it.
Overall it's a 34% chance.
So for the analysis, we want to take each click, and
remember, each click was choosing between two
different algorithms, and if L1 rated the item that
the user clicked on more highly, we give a point to L1.
If LogOdds rated it more highly, we
give a point to LogOdds.
And doing that, we actually got a total
order on our results.
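In the same sketch style, the per-click scoring might look like this; how ties were handled isn't stated in the talk, so crediting neither measure is my assumption.

```python
from collections import Counter

def score_click(clicked, ranking_a: list, ranking_b: list,
                name_a: str, name_b: str, points: Counter) -> None:
    """Credit whichever measure ranked the clicked community higher."""
    rank_a = ranking_a.index(clicked) if clicked in ranking_a else float("inf")
    rank_b = ranking_b.index(clicked) if clicked in ranking_b else float("inf")
    if rank_a < rank_b:
        points[name_a] += 1
    elif rank_b < rank_a:
        points[name_b] += 1  # ties credit neither measure

points = Counter()
score_click("IceWine", ["IceWine", "PinotNoir"], ["RedWine", "IceWine"],
            "L1", "L2", points)
print(points)  # Counter({'L1': 1})
```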
So for clicks leading to joins, L2 gave the best results.
L1 was next to worst. And all of these differences were
statistically significant, except for the one shown with a
single arrow, as opposed to a double arrow.
AUDIENCE: So these are member or overall?
ELLEN SPERTUS: These are just looking at a member of the base
community joining a community--the top row.
Excuse me, clicks leading to joins.
That's ambiguous.
I'll have to check my numbers. If you look at all
clicks, L2 wins.
The order's the same, except that L1
jumps forward.
So, after this we were wondering
about positional effects.
When we show things in a grid, are there positions that
people are more likely to click on?
Our first experiment couldn't tell us anything about that
because we were putting the best recommendations first.
So what we did is we generated new recommendations, just
using L1, and we showed them to different users in
different orders.
And then we tracked a million and a third clicks, to see
where people were more likely to click.
So we show one, two or three rows of recommendations.
When we showed a single row, anyone want to guess where
people were most likely to click?
Don't pay too much attention to these pictures because
someone else would see a different order.
So any guesses?
AUDIENCE: Center.
ELLEN SPERTUS: The center?
AUDIENCE: First.
ELLEN SPERTUS: First. As for left to right--we have a lot
of Iranian users, who read right to left.
Basically--
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Let's see.
OK.
Yeah, the center got the most. And the leftmost got next,
although this was not statistically significant.
AUDIENCE: Did you ever test for a size of image?
ELLEN SPERTUS: I did not test for size of image.
Nor did I separate out the different language speakers.
There's all sorts of interesting
things that can be done.
But since we were doing random orders, the images would
cancel out.
OK, what about for two rows, and here's another question.
What are these the recommendations for?
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Sorry.
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Of UW students, no?
AUDIENCE: [INAUDIBLE]
You took computer science.
ELLEN SPERTUS: Actually Washington State.
These are the Washington State recommendations.
And what position do you think people are most
likely to click on?
Again don't pay too much attention to these pictures.
AUDIENCE: Bottom
ELLEN SPERTUS: I guess the bottom center.
AUDIENCE: Upper right.
ELLEN SPERTUS: Upper right.
Upper left.
Top center.
OK, the top row got more.
The top right actually got more than the top center.
And this was highly significant statistically.
Going to three rows.
This is for a fantasy and science fiction book club.
Any guesses what people did here, where they clicked the
most? OK, someone said middle.
AUDIENCE: Upper.
ELLEN SPERTUS: Upper.
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Middle right.
OK, here's what we found, that upper left actually got the
most.
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Now again, it doesn't matter what those
pictures are.
Yeah, so someone theorized that when there's so much
information, people just stop at the first one.
Although I can't tell you why,
I think that's probably what happened.
We got a lot of reactions from users about these related
community recommendations.
We got hundreds of emails a day.
We had something that they could click on requesting that
we add certain recommendations.
And we got a lot of angry emails from people who created
communities.
AUDIENCE: [LAUGHTER].
ELLEN SPERTUS: So some were angry because here we were
adding something to their community page that they
didn't have any control over.
And then people were angry about some specific
recommendations--
recommendations for communities that they did not
think were similar or tasteful or appropriate.
And I sympathized with them.
We heard from them about amusing recommendations.
So for the C++ community, we recommend a community for
people who don't understand women.
AUDIENCE: [LAUGHTER]
ELLEN SPERTUS: And for chocolate, we recommended PMS.
AUDIENCE: [LAUGHTER]
ELLEN SPERTUS: Not all the recommendations were amusing.
I created one of these groups--see if you can guess which
link I didn't like.
AUDIENCE: LINUX chicks.
ELLEN SPERTUS: So, it was this one.
So I actually implemented the feature to allow community
owners to delete recommendations.
At Google, we do not do things by hand.
If a community owner's unhappy, she shouldn't need us to fix
it for everybody.
Of course, she can fix it for her own community.
So we had a feature where a community
owner could remove recommendations on their own.
And evidently, I wasn't the only person to want this.
This was the first one that was removed.
In just over a week after it was released, 60,000
recommendations were deleted and 260,000 were added.
So this was a very popular feature.
And an open question is how do these compare with automatic
recommendations?
Is it more useful to the users?
It makes the community owners much happier, but is this
benefiting the users?
In some cases, people deleted recommendations for communities
that they saw as competing with them, or that were very
similar to them.
So that would be interesting to find out.
There are many possible future research areas.
One idea would be just flipping things around, and
instead of determining similar communities based on common
users, figuring out similar users based on common
communities.
And one question would be, is it useful?
So, for example, there were a total of nine users who
belonged to these three communities that I belong to.
So how could that be used?
It can tell me, you might want to meet these people.
They share some of your interests.
Or for someone looking at my page--
Orkut is mostly used for dating--
interested in Ellen?
Sorry, she's married.
Try Jessica.
And even though it was based on a social network, I didn't
really make use of that information.
You can imagine, taking into account distances in the
social network.
We have demographic information.
Should we count people from the same country? Should we
use that more in recommendations?
Should we use it less?
What about different ages?
Brazil was, and continues to be, the most popular country.
So you'd often get recommendations for
communities in languages you might not speak.
So I'd like to acknowledge my co-authors, Mehran Sahami and
Orkut Buyukkokten, and the Orkut team, pictured there.
It's grown a little bit larger now, but they kept the site
running while I got to do this fun experiment.
And for more information, I have the paper up here and
I'll be making it available.
You can also get it and other papers from
labs.google.com/papers.
I wrote a kind of silly article about it for Orkut Media.
It's referenced in the paper.
The URL is there.
I wrote a number of silly columns on Orkut data mining,
because we had the data I wanted to test.
Are blondes really--
on Orkut, you can rate how sexy, trusty, and cool people are.
So we had the data.
I wanted to find out
scientifically, are blondes sexier?
Are gay people more cool?
So you can find out my results about that.
And, also Google is hiring, including at
this Kirkland office.
They wanted [INAUDIBLE] a whole lot.
So you can find out more about that by talking to any Googler
here, or at google.com/jobs.
And I just want to let you know about next week's talk,
which Alma Whitten will be giving, and
that's a week from today.
And now I'd be happy to answer any questions.
Yes.
AUDIENCE: So yes, when you were considering the problem
of common membership in two different sets or two
different groups, rather, basically, two users who had
common membership in two groups would contribute the
same no matter what.
Did you consider, by any chance, giving more
points to one who was more active in
both groups, for example?
ELLEN SPERTUS: That's a very good question.
The question was you treated two users belonging to the
same two groups the same.
Did you think about weighting it by which
users were more active?
That would count as a stronger link and a
stronger recommendation.
Something else you could consider is how many communities
someone belongs to. If there's someone who belongs
to a thousand communities, maybe you should weight each of
those connections less than for someone who just belongs to
four communities and participates in all of them.
We did not do that.
Our hypothesis was we'd be able to get good enough
recommendations doing things this way.
And I think that was the case.
But, I think that would be valuable.
Yes.
AUDIENCE: How about, going back a little bit.
As a user, the user experience-- one of the
difficulties of getting a recommendations is it takes up
a fair amount of real estate and time to review them, and
they're potentially useful, but perhaps if there was a
prefiltering ahead of time, so that the user could say, these
are areas I'd like to see recommendations in, instead of
arbitrarily selecting 2 or 9 out of 10,000. Have you ever
thought about doing it that way, where the user has a little
bit more [INAUDIBLE]?
ELLEN SPERTUS: OK, so the question was about giving the
user more ability to specify what they'd like
recommendations about.
And that would certainly be a good thing to do. Since this
was expensive to compute, we needed to precompute
everything.
And the total number of communities times the total
number of communities is already a pretty big number,
so we left it at that.
Something that would be very interesting to do would be per
user recommendations.
In fact, that's something that Amazon has.
They have these per item recommendations.
When you're looking at one book page, you might like this
other book.
But they also have recommendations
personalized for users.
And if one of my counterparts at Amazon wanted to give a
talk about that--
how to compute it efficiently--
I'd find that very interesting.
I'm not sure I've fully addressed your question.
AUDIENCE: No, I was thinking, instead of having
10,000, you divide those 10,000 user groups into maybe
30 or 40 general areas, and then you match the one user group
against 20 out of your total population.
ELLEN SPERTUS: OK so the question was about--
AUDIENCE: You could have technical.
You could have consumer.
You could have like popular media.
You can have like political views and discussion.
And then the user can say, well, I really don't care
about popular media or music but I would like to see what
other technical recommendations you have.
ELLEN SPERTUS: OK, so the question was to be able to
group the recommendations by category.
AUDIENCE: So the user can select which category.
ELLEN SPERTUS: So the user can select what categories
interest them, so they might not be interested in politics,
but maybe they're interested in food.
As you can see, some of those recommendations crossed
categories. You know, someone likes pizza and you recommend
Linux. That could be a bad recommendation.
One thing we considered was clustering, where we divide
things up into groups.
And that would be a perfectly good thing to do, valuable.
We just didn't do it.
AUDIENCE: I worry a little bit about when you recommend here,
you're going to be merging the membership of the groups.
And so you won't have the distinct difference of why you
want two different groups, when they then have all the
same set of members and their conversations are going to
tend to be the same, you've lost the advantage.
Did you think about a possible disadvantage to the community
from these recommendations?
ELLEN SPERTUS: OK, the question was, with these
recommendations, this will make communities have more
similar memberships with each other, so it
might reduce diversity.
That different communities would become more similar.
Well, there will also be a feedback effect with
the recommendations.
Because if you start by saying these two communities are kind
of similar because they have some members in common, then
they're both featured on each other's pages.
They're going to grow more and more alike.
So that is something that could happen.
And it's not something that we've measured.
In the back.
AUDIENCE: So if someone is in one of these groups, would
they find a very similar group on [INAUDIBLE].
Why bother generating it, [INAUDIBLE]?
ELLEN SPERTUS: OK, so the comment was maybe you don't
want to recommend something too similar, because people
should be recommended something different from what
they already have access to.
And I'd say--
and we were measuring empirically, so it could be
that similar is the wrong word.
What we measured was that we made a recommendation and
somebody clicked through and joined.
Maybe it's not because it's similar; maybe it's because it's
something different.
One example would be people who go to the Wine community
thinking it's about wines, not an
emulator, the Linux program.
And if they go there for the wrong reason, then they might
be happy to see a link to Linux or to Linux Wine, which
is a subject that isn't similar, but
it's what they wanted.
So that's hard to measure.
That's an interesting question.
Other question.
AUDIENCE: Like the Harry Potter problem.
It's like you might sell a lot of Harry Potter books, but
everybody already knows about Harry Potter.
There's no point in telling them.
So these smaller groups would be just more interesting to
receive information.
These other groups, you don't even know who's online so you
know [INAUDIBLE].
ELLEN SPERTUS: OK so the comment is that even though
you might sell books by recommending Harry Potter for
everything, it's something people know about.
It might be more valuable to recommend things they didn't
know about, so I'm going to go back to the recommendations.
So, with this I Love Wine community, where we have the
small, quirky communities that you might not know exist--Ice
Wine, Pinot Noir--those are the smaller communities, and
what we found empirically was that L1 produced some of the
least popular results.
But you could imagine a knob you'd be able to crank
to ask for small communities or big
communities.
There's a sort of trade-off between the likelihood that
somebody will like something, and novelty.
Are you going to present something that'll really
excite them and is different from things
that they knew about?
AUDIENCE: All this, also is--
so this is somewhat of a tangent.
But different communities behave differently at
different sizes, so I was just wondering, when you herd
people to these smaller communities, it's actually
going to change them.
And I was wondering if you saw any of that or had any way of
measuring that at all?
ELLEN SPERTUS: OK.
So the point was that a community's behavior
depends on its size.
If you start with a small community and you have a lot
of people joining it, then that
might change the community.
And we did have problems with that.
We had a user who was very unhappy because her feminine
sexuality group, Cliturgy, was listed as a recommendation for
a bunch of groups that she didn't think were appropriate,
that she thought were merely obscene.
So I don't know if Google will put this online.
So instead of having her feminist membership, she was
having these people from a bigger group, with different
values from hers, join the community, and she shut down
the community because of that.
And when I found out about that, it may
have been too late.
She may have deleted it.
But I talked with her, and that's when I had it manually
removed until we could fix it, because it was changing
the community--it was ruining the community,
from her point of view.
Of course, should we count the creator of the community as
being more important than the group's users,
voting with their clicks?
Yes.
AUDIENCE: So you said you added a control for community
owners to control
what was recommended to the people viewing their page.
Would there be an example of the opposite permission,
basically, to control which pages your group
is recommended on?
ELLEN SPERTUS: OK, so the comment was, we allowed
community owners to control the recommendations appearing
on their page; why not let them control which pages their
community was recommended on?
And we did not do that, although it's something that
the owner of Cliturgy would have liked.
But for that, you'd just have to rely on people emailing
community owners, and there would be link swaps.
So, my guess would be that if someone doesn't want to be
listed, there are plenty of people who want to be in that
position who'll be asking for it.
Something else we could have done would be let community
members make suggestions about what should be related, but we
decided to centralize it with the community owner, and users
could email them.
Yes?
AUDIENCE: So the fundamental motivator to include this
feature in the site, was that to--
what was the goal behind that?
Why include this feature at all?
ELLEN SPERTUS: OK, the question was why include this
feature at all?
What was our goal?
And I'd say the goal was to help the users find the
information of interest to them.
AUDIENCE: Why?
ELLEN SPERTUS: Why?
Because we want them to use the site and
to enjoy using it.
So right now, people might create a community identical
to one that already exists.
At first, we made it so you couldn't create communities with
the same name.
That was done to prevent a land grab.
But I think we decided it wasn't a good decision.
But we thought our users would be happier, get more benefit
out of the site, if they were able to find communities of
interest to them.
And, at that time, there wasn't
a good search mechanism.
When someone created a community, they put it into
one of 29 different categories.
But when you have hundreds of thousands, and then millions
of communities, that's not a good enough hierarchy.
AUDIENCE: I want to know, after you got back your multiple
answers regarding the algorithm to use, and once you made a
decision to cut and go,
how long did this whole process take?
ELLEN SPERTUS: OK, the question was how long did this
process take?
It was pretty fast. It was over a period of months.
The dates are in the paper.
This was done during Orkut's first year, 2004.
And I was working on other projects, and we had to wait
for people to click, so I was swapping
between different projects.
But Orkut, the team, was kind of in a start-up mode, so we
were able to push things quite quickly.
AUDIENCE: So on that chart, where would you see the effect
of this project?
Would it show up, at which point in that chart?
Would this have an effect--
this went out to everybody, right?
ELLEN SPERTUS: Right.
The recommendations went out to everyone.
The question is where does it show up in the chart.
We did the first experiment in July.
It's not clear to me that this had a significant effect. I
guess I mentioned the possibility that if people
could find related communities, they might not
create new ones.
And the data does not bear that out.
AUDIENCE: Size of community, though, might be affected.
ELLEN SPERTUS: OK.
The comment was that the size of
community might be affected.
Yes.
AUDIENCE: You mentioned you tweak the recommendations.
How often do you do that?
ELLEN SPERTUS: Well, for the experiment, we did it once.
It would have been too difficult to measure if things
were changing.
And actually I'm not responsible for that part of
the system anymore, so I can't tell you how
often it's done now.
But since community owners have been able to edit the
recommendations, that sort of fulfills that niche.
What we implemented was something where we actually
generated more recommendations than could fit, and we showed
them to the community owner, and they could get rid of
some, and get more suggestions.
In back.
AUDIENCE: You were talking about [INAUDIBLE]
making the changes, getting [INAUDIBLE]
how it acts now [INAUDIBLE] very different than when you
first started [INAUDIBLE].
ELLEN SPERTUS: OK, the point was that even if it didn't
affect community creation then, the site was very different
than it is years later, and it could be that there'd be a
different effect now.
That's true.
And it's just wonderful, the different
things that can be measured.
It was great for me to be able to--
all these different researchers were claiming that
this is the similarity measure that you should
use, or that one is.
So there's a whole bunch of questions I wasn't able to
answer, but I was able to do some comparisons.
But you're right.
There's all sorts more that can be done here.
And I'm leaving you with more questions than I'm answering.
But as I said, Google's hiring.
AUDIENCE: [LAUGHTER].
ELLEN SPERTUS: Yes.
AUDIENCE: So to test these similarity measures, you
exposed your customers to an early set of
the different results.
Did you give any thought as to whether or not there can be a
way to evaluate these offline, without actually doing a
lengthy user-facing experiment?
ELLEN SPERTUS: OK, the question was we used our users
to do this evaluation.
Is there a way we could have done it offline?
Do you mean like with hired evaluators?
AUDIENCE: Right, is there a way to evaluate the goodness
of a recommendation without [INAUDIBLE].
ELLEN SPERTUS: OK, is there a way to evaluate the goodness
of a recommendation without subjecting users to this.
I think no.
I'm not an expert in evaluation.
But I think the best measure of what's good for users is
seeing what users do, what they click on.
And maybe we could have done it for a
shorter period of time.
I don't know if we needed to run it as long, in order to get
statistically significant results.
But I think doing it as an actual user test with many
clicks was valuable here.
I think there was enough benefit to our users to
justify it.
All of them were getting recommendations.
And we worked at finding out which measure we think gives
the best recommendations for our users, and then implemented
that and published it.
AUDIENCE: I find it interesting, because there's a
lot of research on [INAUDIBLE] systems that have
various metrics of how good their results [INAUDIBLE]
popular with actual users.
So, I was curious, what's your opinion as to the actual
value of the end-user interaction versus more
academic approaches to evaluating recommender
systems?
ELLEN SPERTUS: The question was, for some evaluations of
recommender systems, they don't look at what the
actual users do.
They have some independent evaluation.
And I just reviewed a paper, and criticized it for having
separate evaluators.
The evaluators can't know the intent of the person
initiating the query.
So I think it skews the results to hire people to do
the evaluation.
And I've done that, and I had someone giving evaluations for
pages in a language they didn't speak.
But even when the people are competent and trying their
hardest, they just can't know what's most relevant.
You know, we looked at those, which was better,
ice wine or red wine.
The only person who can decide that is the actual user.
And that's my opinion.
AUDIENCE: What about an approach like sampling some of
the people that have actually joined the communities based
on the recommendations and asking them more sort of
qualitative questions about is this a community that you
thought was good?
Or are you actually participating in this
community that you joined?
Just to kind of get more of that.
In privileging just joining as the measure that says
the recommendation is successful--
and that's true in a certain sense, but in the long term, is
that really useful?
Is that person really just joining
lots and lots of groups?
ELLEN SPERTUS: OK, that was a very good question.
It was should we just be looking at
communities people join?
Wouldn't it be better if we could get some qualitative
information from them about why they joined it, or see
whether they keep using it.
And I agree, those would be valuable.
We didn't want to impose on the users, requiring
more work from them.
And while it would have been interesting to see whether those
people kept using the communities, the team was so busy
keeping up with the exponential growth, or trying
to, that we didn't measure it.
That would actually be interesting.
And when we look closer at the data, we do see some
interesting effects, mentioned in the paper.
There were some types of communities that people
clicked through to a lot, but didn't join.
And I'll give a hint.
If you join a community, its picture gets
listed on your page.
OK, so all of the communities with the lowest conversion
rates were what's called adult.
And none of the communities with the highest
conversion rates were.
So this is a case where people were clicking through.
They may have been clicking through to the same
community every day.
They weren't joining it.
So it was a good recommendation, but by just
measuring joins, we weren't capturing that.
Let's see, how're we doing on time?
MALE VOICE: Do you want to take a couple more questions?
ELLEN SPERTUS: Yes.
AUDIENCE: As she was asking about the motivation behind
this, and you stated it was to get people to use the site
more, did you find that this was accomplished?
In terms of usage per user,
did that increase after this?
ELLEN SPERTUS: OK.
The question was, our goal was to help users get more benefit
from the site, or use the site more, and did they, in fact,
use it more.
And we couldn't really measure it, because--well, I taught
about exponential growth, but I didn't understand it until I
worked at Orkut, and it really grew exponentially.
The site would work great with 1,000 users, and
then we'd get 10,000.
And we'd have to fix some bottleneck, and then it would
work great, and then we'd have 50,000 users. So there were
enough problems accessing the site that how many page views
users had wasn't under our control.
OK.
Yes.
AUDIENCE: Have you considered any factor other than position
on the page that affects the click-through rate, other than
the overlap of the two memberships?
What affects behavior, as far as the size of the icon or
the colors used or anything like that? Did you test
anything other than position?
ELLEN SPERTUS: The question was, did we look at any
factors other than position that affect click-through, like
the size of the icon or the colors that were used.
We did not do that.
We could have done that.
Some communities have no picture, and it
just says no picture.
And I didn't include any on the slides, because they're not
very interesting to look at.
We didn't break those out separately
and see how much likelier someone is to click if there's
a picture, if there's a good picture, if there's a big
picture, if it's nicely colored--
that's a good question.
Not only are there openings at Google, I think there's
actually openings on the Orkut team.
AUDIENCE: [LAUGHTER].
MALE VOICE: All right, well, let's thank Ellen.